Benchmarking · Affordable GPUs · Open Data

LocoBench

I have X GB of VRAM -- what's the best model I can run? LocoBench benchmarks LLM inference across every GPU VRAM tier, from 2 GB to 32 GB. Consumer cards, affordable server GPUs, and everything in between.

VRAM Tiers

Every tier. Floor cards. Honest baselines.

Each VRAM tier is benchmarked on its floor card -- the weakest GPU in that class. If it runs here, it runs on your card. A tier-lookup sketch follows the list below.

2 GB
  • GTX 950 · 105 GB/s
3 GB
  • GTX 1060 3GB · 192 GB/s
4 GB
  • GTX 1050 Ti · 112 GB/s
  • GTX 960 · 112 GB/s
6 GB
  • GTX 1060 6GB · 192 GB/s
  • GTX 980 Ti · 336 GB/s
8 GB
  • RTX 2060 Super · 448 GB/s
  • GTX 1070 · 256 GB/s
  • RTX 3050 · 224 GB/s
12 GB
  • RTX 3060 · 360 GB/s
  • GTX Titan X · 336 GB/s
  • 2× GTX 1060 · pooled
16 GB
  • RTX 4060 Ti · 288 GB/s
  • Tesla P100 · 732 GB/s
  • Tesla V100 · 900 GB/s
  • 2× RTX 2060 Super · pooled
18 GB
  • 3× GTX 1060 · pooled
24 GB
  • RTX 3090 · 936 GB/s
  • 3× RTX 2060 Super · pooled
32 GB
  • Tesla V100 · 900 GB/s
  • 4× RTX 2060 Super · pooled
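To see which tier's results apply to your own card, round your VRAM down to the largest benchmarked tier: every tier is measured on its floor card, so its numbers are a conservative baseline for anything with at least that much memory. A minimal sketch, with the tier list above hard-coded as an assumption (not an API from the LocoBench repo):

```python
# Minimal sketch: map a GPU's VRAM to the LocoBench tier whose results apply.
# The tier sizes mirror the floor-card list above; this is illustrative only.

TIERS_GB = [2, 3, 4, 6, 8, 12, 16, 18, 24, 32]  # benchmarked VRAM tiers

def applicable_tier(vram_gb: float) -> int:
    """Return the largest benchmarked tier that fits within vram_gb.

    Each tier is measured on its floor card, so its results are a
    conservative baseline for any card with at least that much VRAM.
    """
    fitting = [t for t in TIERS_GB if t <= vram_gb]
    if not fitting:
        raise ValueError(f"No benchmarked tier fits in {vram_gb} GB")
    return max(fitting)

if __name__ == "__main__":
    print(applicable_tier(11))   # -> 8  (an 11 GB card reads the 8 GB tier)
    print(applicable_tier(24))   # -> 24 (e.g. an RTX 3090)
```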

Nobody is answering this honestly.

Within my VRAM budget, which combination of model size and precision gives the best results?

A BF16 SmolLM2-1.7B and a Q4_K_M Qwen3-4B both fit in 4 GB. Which one actually wins? A full-precision 3B model and a quantized 7B model both fit in 8 GB. Which should you run?

Most benchmarks compare models under ideal conditions. LocoBench compares everything that fits within your actual hardware constraint -- full-precision small models against quantized larger models, head to head.
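As a rough sanity check on why those pairings both fit, here is a back-of-the-envelope weight-size estimate. The bytes-per-weight figures are assumptions (Q4_K_M averages roughly 4.5 bits per weight in llama.cpp-style quantization), and the sketch deliberately ignores KV cache, activations, and runtime overhead, which eat into the budget in practice:

```python
# Back-of-the-envelope weight footprint: parameters x bytes per parameter.
# Bytes-per-weight values are assumptions; KV cache and overhead are not counted.

BYTES_PER_WEIGHT = {
    "fp16/bf16": 2.0,
    "q8_0":      1.0,
    "q4_k_m":    4.5 / 8,   # ~0.56 bytes per weight
}

def weights_gb(params_billion: float, precision: str) -> float:
    """Approximate size of the model weights alone, in GB."""
    return params_billion * 1e9 * BYTES_PER_WEIGHT[precision] / 1e9

# The 4 GB example from the text: weights alone leave room in the budget.
print(f"SmolLM2-1.7B @ BF16   ~ {weights_gb(1.7, 'fp16/bf16'):.1f} GB")  # ~3.4 GB
print(f"Qwen3-4B     @ Q4_K_M ~ {weights_gb(4.0, 'q4_k_m'):.1f} GB")     # ~2.2 GB
```

Weight size alone says which candidates fit; it says nothing about which one answers better. That second question is what the benchmark measures.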

Part of LocoLab -- frontier AI on a budget.

Part of LocoLab

Six projects. Three layers. One lab.

⛩️ LocoPuente