llamaperf

RTX 5060 Ti 16GB

NVIDIA · 16GB · 1 report


Qwen3.6 27B

RTX 5060 Ti 16GB · llama.cpp · 75,000 ctx

Tone: positive
throughput: 22.0 t/s gen · 760.0 t/s pp
quant: IQ4_XS (gguf)
kv: Q8
flash attention: on

One user tested Qwen3.6 27B at IQ4_XS on an RTX 5060 Ti 16GB using llama.cpp (TheTom's TurboQuant fork). Prompt processing ran at 760 t/s and generation at 22 t/s, with the context window limited to 75k tokens. The KV cache was quantized with turbo4/turbo2. The same user also compared BF16, Q8_0, Q6_K, Q5_K_XL, Q4_K_XL, IQ4_XS, IQ3_XXS, Q3_K_XL, Q3_K_M, and Q2_K_XL, run on either an L40S or the RTX 5060 Ti, and judged quality with a chess board SVG generation task. The user recommends IQ4_XS as the minimum quant.
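To put the reported throughput in wall-clock terms, the sketch below converts tokens per second into seconds for a full 75k-token prompt and for a reply. It assumes throughput stays constant across the whole context, which is an idealization (speed usually drops as the context fills), and the 1,000-token reply length is a hypothetical value chosen only for illustration. The helper name `seconds_for` is likewise made up for this example.

```python
# Rough wall-clock estimate from the reported throughput figures.
# Assumption: throughput is constant at any context depth (idealized).

PP_TPS = 760.0        # reported prompt-processing speed, tokens/s
GEN_TPS = 22.0        # reported generation speed, tokens/s
CTX_TOKENS = 75_000   # reported context limit
REPLY_TOKENS = 1_000  # hypothetical reply length, not from the report


def seconds_for(tokens: int, tokens_per_second: float) -> float:
    """Return the time in seconds to process `tokens` at a fixed rate."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return tokens / tokens_per_second


prefill_s = seconds_for(CTX_TOKENS, PP_TPS)   # time to ingest a full prompt
reply_s = seconds_for(REPLY_TOKENS, GEN_TPS)  # time to generate the reply

print(f"prefill of {CTX_TOKENS} tokens: {prefill_s:.1f} s")
print(f"reply of {REPLY_TOKENS} tokens: {reply_s:.1f} s")
```

Under these assumptions, ingesting a full 75k-token prompt takes a little over a minute and a half, and a 1,000-token reply takes about 45 seconds. Treat both as lower bounds rather than measurements.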