llamaperf

VRAM Calculator

Pick your GPU — see which open-weight models fit in VRAM, at which quantization, and roughly how fast they run. Tokens/sec comes from community-reported benchmarks when we have them.

3 fits fully·1 with offload·0 won't run
All 4 reports for this GPU →
Fits
Gemma 3
27B params·Google·FP16
61.0 GB
Fits
Qwen3.5
27B params·Alibaba·FP16
60.5 GB
Fits
Qwen3.6
3B / 35B·Alibaba·FP16
11.9 GB
Offload
Gemma 4
30.7B params·Google DeepMind·FP16
68.7 GB