RTX 5080

Name: RTX 5080 local LLM performance reports
Creator: llamaperf
License: https://creativecommons.org/licenses/by/4.0/

NVIDIA · 16GB · 1 report




Qwen3.6 35B (3B active)

RTX 5080 · llama.cpp · 131,072 ctx

throughput:: 56.0 t/s generation · 1584.0 t/s prompt processing
quant:: Q4_K_XL (gguf)
kv cache:: Q8
flash attention:: on
mtp (multi-token prediction):: off
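The two throughput figures can be turned into a rough per-request latency estimate: prefill time is prompt tokens divided by the pp rate, and decode time is output tokens divided by the gen rate. This is a minimal sketch under one loud assumption: both rates stay constant regardless of prompt length. The report does not say at what prompt length pp was measured, and prompt processing usually slows as context grows, so treat the numbers as a lower bound. The function name `estimate_latency` and the example token counts are illustrative only.

```python
# Rough latency estimate from the reported throughput figures.
# ASSUMPTION: both rates are constant regardless of prompt length;
# in practice prompt processing usually slows as the context fills.

PP_TPS = 1584.0   # reported prompt-processing throughput (tokens/s)
GEN_TPS = 56.0    # reported generation throughput (tokens/s)


def estimate_latency(prompt_tokens: int, output_tokens: int,
                     pp_tps: float = PP_TPS, gen_tps: float = GEN_TPS):
    """Return (prefill_s, decode_s, total_s) for a single request."""
    prefill_s = prompt_tokens / pp_tps   # time to ingest the prompt
    decode_s = output_tokens / gen_tps   # time to generate the reply
    return prefill_s, decode_s, prefill_s + decode_s


if __name__ == "__main__":
    # Illustrative workloads: a mid-sized prompt and a full 131,072-token one.
    for prompt, output in [(8_000, 500), (131_072, 500)]:
        pre, dec, total = estimate_latency(prompt, output)
        print(f"{prompt:>7} prompt + {output} output tokens: "
              f"prefill {pre:.1f} s, decode {dec:.1f} s, total {total:.1f} s")
```

Under the constant-rate assumption, filling the whole 131,072-token context takes roughly 83 s of prefill before the first generated token, which is why prompt-processing speed matters as much as generation speed for long-context agentic work.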

coding · agentic

Best config for the 35B at Q4_K_XL with 128k context: MTP off, --fit-target 1536. MTP doesn't help at 128k. By contrast, the 27B at IQ3 fits entirely on the GPU and does benefit from MTP (73 tok/s).
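For reproducing the 35B settings, here is a sketch that assembles a `llama-server` command line from the reported values. It assumes a recent llama.cpp build that accepts `-fa on` (older builds take a bare `-fa`) and `--fit-target`, which we understand to set the VRAM headroom in MiB left free when llama.cpp fits the model to device memory. Reading "Q8" as the `q8_0` cache type is our interpretation. The helper `build_server_args` and the model path are hypothetical placeholders; the report does not name the GGUF file, and no MTP flag is passed because MTP was off.

```python
import shlex

# HYPOTHETICAL placeholder: the report does not name the GGUF file.
MODEL_PATH = "path/to/35b-Q4_K_XL.gguf"


def build_server_args(model_path: str, ctx: int = 131_072,
                      fit_target_mib: int = 1536) -> list[str]:
    """Assemble a llama-server argv mirroring the reported 35B settings."""
    return [
        "llama-server",
        "-m", model_path,
        "-c", str(ctx),            # context size: 131,072 tokens
        "-fa", "on",               # flash attention on (newer on/off/auto syntax)
        "-ctk", "q8_0",            # K cache at Q8 (assumed to mean q8_0)
        "-ctv", "q8_0",            # V cache at Q8 (assumed to mean q8_0)
        "--fit-target", str(fit_target_mib),  # headroom for auto-fitting (assumed MiB)
        # MTP was off in the report, so no MTP-related flag is passed.
    ]


if __name__ == "__main__":
    # Print a shell-quoted command line that can be pasted into a terminal.
    print(shlex.join(build_server_args(MODEL_PATH)))
```

Keeping the settings in one function makes it easy to vary only `fit_target_mib` or `ctx` when testing other context sizes on the same 16 GB card.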