"I have X GB of VRAM -- what's the best model I can run?" LocoBench answers that question by benchmarking LLM inference across every consumer GPU VRAM tier, from 2 GB to 24 GB -- measuring the floor, not the ceiling.
Each VRAM tier is benchmarked on the worst-in-class GPU for that tier. If it runs here, it runs on your card.
Within my VRAM budget, which combination of model size and precision gives the best results?
A BF16 SmolLM2-1.7B and a Q4_K_M Qwen3-4B both fit in 4 GB. Which one actually wins? A full-precision 3B model and a quantized 7B model both fit in 8 GB. Which should you run?
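To see why both contenders fit the same budget, here is a minimal back-of-the-envelope sketch of weight memory per quantization level. The bits-per-weight figures are approximate community numbers (e.g. llama.cpp's Q4_K_M averages roughly 4.85 bpw), not LocoBench measurements, and the estimate ignores KV cache, activations, and runtime overhead:

```python
# Rough VRAM estimate for model weights at a given quantization.
# Bits-per-weight values are approximations, not LocoBench's own numbers.
BITS_PER_WEIGHT = {
    "BF16": 16.0,
    "Q8_0": 8.5,      # llama.cpp 8-bit, includes per-block scales
    "Q4_K_M": 4.85,   # llama.cpp average; varies per tensor
}

def weights_gb(params_billions: float, quant: str) -> float:
    """Approximate size of the weights alone, in GB.

    Ignores KV cache, activations, and runtime overhead, which
    typically add several hundred MB to over a GB on top of this.
    """
    bits = BITS_PER_WEIGHT[quant]
    return params_billions * bits / 8  # 1e9 params and 1e9 bytes/GB cancel

# The 4 GB matchup above:
print(f"SmolLM2-1.7B @ BF16:   {weights_gb(1.7, 'BF16'):.2f} GB")
print(f"Qwen3-4B     @ Q4_K_M: {weights_gb(4.0, 'Q4_K_M'):.2f} GB")
```

Both land comfortably under 4 GB of weights, which is exactly why the question of which one produces better output has to be settled empirically, not arithmetically.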
Most benchmarks compare models under ideal conditions. LocoBench compares everything that fits within your actual hardware constraint -- full-precision small models against quantized larger models, head to head.
Part of LocoLab -- frontier AI on a budget.