Benchmarking · Consumer GPUs · Open Data

LocoBench

I have X GB of VRAM -- what's the best model I can run? LocoBench benchmarks LLM inference across every consumer GPU VRAM tier, from 4 GB to 24 GB. The floor, not the ceiling.

See the Results · GitHub
VRAM Tiers

Every tier. Floor cards. Honest baselines.

Each VRAM tier is benchmarked on the worst-in-class GPU for that tier. If it runs here, it runs on your card.

4 GB · GTX 1050 Ti · 112 GB/s · Pascal
6 GB · GTX 1060 · 192 GB/s · Pascal
8 GB · RTX 2060 Super · 448 GB/s · Turing
12 GB · RTX 3060 · 360 GB/s · Ampere
24 GB · RTX 3090 · 936 GB/s · Reference ceiling

Nobody is answering this honestly.

Within my VRAM budget, which combination of model size and precision gives the best results?

A BF16 SmolLM2-1.7B and a Q4_K_M Qwen3-4B both fit in 4 GB. Which one actually wins? A full-precision 3B model and a quantized 7B model both fit in 8 GB. Which should you run?

Most benchmarks compare models under ideal conditions. LocoBench compares everything that fits within your actual hardware constraint -- full-precision small models against quantized larger models, head to head.
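The tradeoff above comes down to bytes per weight. A rough sketch of the arithmetic (illustrative only, not LocoBench's methodology; the bits-per-weight figures for the quantized formats are approximations, and real usage adds KV cache, activations, and runtime overhead on top of the weights):

```python
# Back-of-envelope VRAM estimate for model weights only.
# Bytes-per-parameter values are approximate assumptions, not exact format specs.
BYTES_PER_PARAM = {
    "bf16": 2.0,     # 16-bit brain float
    "q8_0": 1.06,    # ~8.5 bits/weight (approximate)
    "q4_k_m": 0.57,  # ~4.5 bits/weight (approximate)
}

def weight_gb(params_billion: float, fmt: str) -> float:
    """Approximate weight footprint in GiB for a parameter count and format."""
    return params_billion * 1e9 * BYTES_PER_PARAM[fmt] / 1024**3

# The 4 GB example from the text: both candidates fit, counting weights alone.
print(f"SmolLM2-1.7B @ BF16:   {weight_gb(1.7, 'bf16'):.1f} GB")
print(f"Qwen3-4B     @ Q4_K_M: {weight_gb(4.0, 'q4_k_m'):.1f} GB")
```

Both land under 4 GB on paper, which is exactly why the question needs a benchmark rather than arithmetic: the math says what fits, not which one answers better.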

Part of LocoLab -- frontier AI on a budget.