llamaperf

VRAM Calculator

Pick your GPU to see which open-weight models fit in VRAM, at which quantization, and roughly how fast they run. Tokens/sec figures come from community-reported benchmarks when we have them.

1 fits fully · 3 with offload · 0 won't run
All 4 reports for this GPU →
Fits · Qwen3.6 · 3B / 35B · Alibaba · FP16 · 11.9 GB · 72.9 t/s
Offload · Gemma 4 · 30.7B params · Google DeepMind · Q8_0 · 38.0 GB
Offload · Gemma 3 · 27B params · Google · Q8_0 · 34.0 GB
Offload · Qwen3.5 · 27B params · Alibaba · Q8_0 · 33.5 GB
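
The fit and offload labels above can be approximated with a rough weights-only estimate. The sketch below is not llamaperf's actual method; the function names `weight_gb` and `classify`, the 24 GB VRAM figure, and the 64 GB system RAM figure are all hypothetical, chosen only for illustration. The bits-per-weight values assume the llama.cpp block layout, where Q8_0 stores 32 int8 weights plus one FP16 scale per block (8.5 bits per weight).

```python
# Approximate bits per weight for two quantization formats.
# Q8_0: 32 int8 weights + one fp16 scale per block = 34 bytes / 32 weights.
BITS_PER_WEIGHT = {"FP16": 16.0, "Q8_0": 8.5}


def weight_gb(params_billion: float, quant: str) -> float:
    """Memory for the weights alone, in GB (10^9 bytes).

    Ignores KV cache, activations, and runtime overhead.
    """
    return params_billion * BITS_PER_WEIGHT[quant] / 8


def classify(required_gb: float, vram_gb: float, system_ram_gb: float) -> str:
    """Hypothetical rule: fits in VRAM, spills to system RAM, or neither."""
    if required_gb <= vram_gb:
        return "Fits"
    if required_gb <= vram_gb + system_ram_gb:
        return "Offload"
    return "Won't run"


if __name__ == "__main__":
    # Assumed hardware for illustration only: 24 GB VRAM, 64 GB system RAM.
    vram, ram = 24.0, 64.0
    for name, params, quant in [("27B model", 27.0, "Q8_0"),
                                ("30.7B model", 30.7, "Q8_0")]:
        need = weight_gb(params, quant)
        print(f"{name} at {quant}: {need:.1f} GB weights -> {classify(need, vram, ram)}")
```

For a 27B model at Q8_0 this gives about 28.7 GB for the weights alone, which is below the 34.0 GB listed for Gemma 3. The gap is presumably context cache and runtime overhead, which this sketch ignores.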