llamaperf

RTX 5080

NVIDIA · 16GB · 1 report


Qwen3.6 35B (3B active)

RTX 5080 · llama.cpp · 131,072 ctx

throughput: 56.0 t/s gen · 1584.0 t/s pp
quant: Q4_K_XL (gguf)
kv: Q8
flash attention: on
mtp (multi-token prediction): off
coding · agentic

Best config for the 35B Q4_K_XL at 128k context: MTP off, --fit-target 1536. MTP gives no speedup at 128k. By comparison, the 27B IQ3 fits fully on the GPU and does benefit from MTP (73 t/s).
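To put the throughput figures in wall-clock terms, here is a rough sketch that converts the reported 1584.0 t/s pp and 56.0 t/s gen into an estimated request time. The function name `estimate_seconds` and the example token counts are illustrative, not part of the report. It also assumes both rates stay constant at every context depth, which is unlikely to hold near 131,072 tokens, so treat the result as a lower bound rather than a measurement.

```python
# Rough latency estimate from the throughput figures in this report.
# Assumption: prompt-processing and generation rates are constant at all
# context depths. Real rates usually drop as the context fills.

PP_TOKENS_PER_S = 1584.0   # prompt processing, from the report
GEN_TOKENS_PER_S = 56.0    # generation, from the report


def estimate_seconds(prompt_tokens: int, output_tokens: int,
                     pp_tps: float = PP_TOKENS_PER_S,
                     gen_tps: float = GEN_TOKENS_PER_S) -> float:
    """Return prompt time plus generation time, in seconds."""
    if pp_tps <= 0 or gen_tps <= 0:
        raise ValueError("throughput values must be positive")
    if prompt_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    return prompt_tokens / pp_tps + output_tokens / gen_tps


if __name__ == "__main__":
    # Hypothetical workloads: a short prompt and a full 131,072-token prompt.
    for prompt, output in [(8_192, 512), (131_072, 512)]:
        total = estimate_seconds(prompt, output)
        print(f"{prompt:>7} prompt + {output} output tokens: ~{total:.1f} s")
```

Under these assumptions, a prompt that fills the whole context takes over a minute to process before the first generated token, which matters for the coding and agentic workloads this report is tagged with.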