VRAM Calculator

Pick your GPU — see which open-weight models fit in VRAM, at which quantization, and roughly how fast they run. Tokens/sec comes from community-reported benchmarks when we have them.

GPU

System RAM

Context length

Quality

1 fits fully·3 with offload·0 won't run

All 4 reports for this GPU →

Fits

Qwen3.6

3B / 35B·Alibaba·FP16

11.9 GB

72.9 t/s

Offload

Gemma 4

30.7B params·Google DeepMind·Q8_0

38.0 GB

—

Offload

Gemma 3

27B params·Google·Q8_0

34.0 GB

—

Offload

Qwen3.5

27B params·Alibaba·Q8_0

33.5 GB

—