RTX 5060 Ti 16GB

Name: RTX 5060 Ti 16GB local LLM performance reports
Creator: llamaperf
License: https://creativecommons.org/licenses/by/4.0/

NVIDIA · 16GB · 1 report




Qwen3.6 27B

RTX 5060 Ti 16GB · llama.cpp · 75,000 ctx

throughput:: 22.0 t/s gen · 760.0 t/s pp
quant:: IQ4_XS (gguf)
kv:: Q8
flash attention:: on
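The throughput figures above can be turned into a rough wall-clock estimate. This Python sketch assumes both speeds stay constant no matter how full the context is. The report does not say at what context depth they were measured, and llama.cpp speeds typically fall as the context grows, so treat the result as a best case. The helper name `estimate_seconds` is illustrative only.

```python
# Rough wall-clock estimate from the reported throughput figures.
# Assumption: prompt-processing and generation speeds are constant across
# the whole context (optimistic; real speeds usually drop as context fills).

PP_TPS = 760.0   # reported prompt-processing speed, tokens/s
GEN_TPS = 22.0   # reported generation speed, tokens/s


def estimate_seconds(prompt_tokens: int, output_tokens: int) -> float:
    """Hypothetical helper: seconds to ingest the prompt plus generate the reply."""
    return prompt_tokens / PP_TPS + output_tokens / GEN_TPS


# A 70,000-token prompt plus a 1,000-token reply stays inside the 75k context.
total = estimate_seconds(70_000, 1_000)
print(f"{total:.1f} s")
```

Under these assumptions that is roughly 92 s of prompt processing plus 45 s of generation, about 138 s in total.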

A user tested Qwen3.6 27B at IQ4_XS on the RTX 5060 Ti 16GB using llama.cpp (TheTom's TurboQuant fork), measuring 760 t/s prompt processing and 22 t/s generation. The context window was limited to 75k tokens, and the KV cache was quantized with turbo4/turbo2. The user also tested BF16, Q8_0, Q6_K, Q5_K_XL, Q4_K_XL, IQ4_XS, IQ3_XXS, Q3_K_XL, Q3_K_M and Q2_K_XL on either an L40S or the RTX 5060 Ti, comparing output quality on a chess-board SVG generation task. They recommend IQ4_XS as the minimum quant.
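A back-of-envelope weight-size estimate shows how tight a 27B model is on a 16 GB card. This sketch assumes IQ4_XS averages about 4.25 bits per weight (the nominal figure llama.cpp lists for this type) and treats "27B" as exactly 27 billion parameters. Real GGUF files deviate somewhat because some tensors are stored at other precisions. The function name `weight_gib` is hypothetical.

```python
# Approximate quantized weight size.
# Assumptions: IQ4_XS ~= 4.25 bits per weight on average (nominal llama.cpp
# figure), and "27B" means exactly 27e9 parameters. Ignores KV cache,
# compute buffers and driver overhead.


def weight_gib(params: float, bits_per_weight: float) -> float:
    """Hypothetical helper: approximate weight size in GiB."""
    total_bytes = params * bits_per_weight / 8
    return total_bytes / 2**30


size = weight_gib(27e9, 4.25)
print(f"~{size:.1f} GiB of weights")
```

That comes to roughly 13.4 GiB, leaving under 3 GiB of the card's memory for the KV cache, compute buffers and overhead. This is consistent with the report's capped 75k context and quantized KV cache.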