You must be an AI Tinkerers active member to view these talks and demos.
nanoVLM: Minimal PyTorch VLM
Build a Vision-Language Model in pure PyTorch in under 750 lines, train it for six hours on a single H100, and run it in a free Google Colab notebook.
I’ll be presenting nanoVLM, a minimal, open-source PyTorch library for training Vision-Language Models (VLMs) from scratch in about 750 lines of code. Inspired by nanoGPT, nanoVLM is simple, readable, and efficient: it reaches 35.3% accuracy on the MMStar benchmark after just 6 hours of training on a single H100 GPU. It pairs a SigLIP ViT image encoder with a LLaMA-style language decoder and is light enough to run in a free Google Colab notebook.
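To make the encoder-plus-decoder design concrete, here is a minimal PyTorch sketch of how a ViT-style image encoder can feed a causal language decoder. It is not nanoVLM's code. The class name `TinyVLM`, all sizes, the strided-conv patch embedding, and the single `nn.Linear` projector are illustrative assumptions. Generic `nn.TransformerEncoderLayer` blocks stand in for both the SigLIP encoder and the LLaMA-style decoder, which in practice differ in normalization, positional encoding, and other details.

```python
import torch
import torch.nn as nn


class TinyVLM(nn.Module):
    """Hypothetical toy VLM: ViT-style encoder -> projector -> causal decoder.

    Generic nn.TransformerEncoderLayer blocks stand in for SigLIP and the
    LLaMA-style decoder; all sizes are illustrative assumptions.
    """

    def __init__(self, img_size=32, patch=8, vit_dim=64, lm_dim=96,
                 vocab=100, max_len=128, n_layers=2):
        super().__init__()
        n_patches = (img_size // patch) ** 2
        # Patch embedding: a strided conv turns each patch x patch tile into one token.
        self.patch_embed = nn.Conv2d(3, vit_dim, kernel_size=patch, stride=patch)
        self.vit_pos = nn.Parameter(torch.zeros(1, n_patches, vit_dim))
        self.vit = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(vit_dim, nhead=4, dim_feedforward=128,
                                       batch_first=True),
            num_layers=n_layers)
        # Projector: maps vision features into the language model's embedding space.
        self.proj = nn.Linear(vit_dim, lm_dim)
        self.tok_embed = nn.Embedding(vocab, lm_dim)
        self.lm_pos = nn.Parameter(torch.zeros(1, max_len, lm_dim))
        self.decoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(lm_dim, nhead=4, dim_feedforward=192,
                                       batch_first=True),
            num_layers=n_layers)
        self.lm_head = nn.Linear(lm_dim, vocab)

    def forward(self, images, input_ids):
        x = self.patch_embed(images)              # (B, vit_dim, H/p, W/p)
        x = x.flatten(2).transpose(1, 2)          # (B, n_patches, vit_dim)
        x = self.vit(x + self.vit_pos)            # bidirectional attention over patches
        img_tokens = self.proj(x)                 # (B, n_patches, lm_dim)
        txt_tokens = self.tok_embed(input_ids)    # (B, T, lm_dim)
        seq = torch.cat([img_tokens, txt_tokens], dim=1)  # image tokens first
        L = seq.size(1)
        seq = seq + self.lm_pos[:, :L]
        # Causal mask: -inf above the diagonal blocks attention to future positions.
        causal = torch.triu(torch.full((L, L), float("-inf")), diagonal=1)
        h = self.decoder(seq, mask=causal)
        return self.lm_head(h)                    # (B, n_patches + T, vocab)


torch.manual_seed(0)
model = TinyVLM().eval()
images = torch.randn(2, 3, 32, 32)
input_ids = torch.randint(0, 100, (2, 5))
with torch.no_grad():
    logits = model(images, input_ids)
print(logits.shape)  # torch.Size([2, 21, 100]): 16 image tokens + 5 text tokens
```

Placing the projected image tokens before the text means every text position can attend to the whole image even under the causal mask. In a training loop of this shape, the next-token loss would typically be computed only on the text positions.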
Lightweight PyTorch repository for fine-tuning small VLMs built on SigLIP (vision) and SmolLM2 (language) backbones.