Qwen3.6 27B
RTX 5060 Ti 16GB · llama.cpp · 75,000 ctx
- throughput: 22.0 t/s gen · 760.0 t/s pp
- quant: IQ4_XS (gguf)
- kv: Q8
- flash attention: on
The user tested Qwen3.6 27B at IQ4_XS on an RTX 5060 Ti 16GB with llama.cpp (TheTom's TurboQuant fork). Prompt processing ran at 760 t/s and generation at 22 t/s, with the context window limited to 75k tokens. KV cache quantization used turbo4/turbo2. The full set of quants tested, on either an L40S or the RTX 5060 Ti, was BF16, Q8_0, Q6_K, Q5_K_XL, Q4_K_XL, IQ4_XS, IQ3_XXS, Q3_K_XL, Q3_K_M, and Q2_K_XL. Quality was compared using a chessboard SVG generation task. The user recommends IQ4_XS as the minimum quant.
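
To put the reported rates in wall-clock terms, here is a rough sketch of the arithmetic. It assumes both rates stay constant across the whole context, which is optimistic since throughput usually drops as the context fills. The 1,000-token reply length is a hypothetical value chosen for illustration, not something from the test.

```python
def seconds_for(tokens: int, tokens_per_second: float) -> float:
    """Time to process `tokens` at a steady rate (assumed constant)."""
    return tokens / tokens_per_second


PP_RATE = 760.0       # prompt processing, t/s (reported)
GEN_RATE = 22.0       # generation, t/s (reported)
CONTEXT = 75_000      # context limit in tokens (reported)
REPLY_TOKENS = 1_000  # hypothetical reply length, for illustration only

prefill_s = seconds_for(CONTEXT, PP_RATE)
reply_s = seconds_for(REPLY_TOKENS, GEN_RATE)

print(f"prefill of {CONTEXT} tokens: {prefill_s:.1f} s")
print(f"reply of {REPLY_TOKENS} tokens: {reply_s:.1f} s")
```

Under these assumptions, filling the full 75k context takes a bit over a minute and a half of prompt processing before the first generated token, so prompt caching matters more than generation speed for long-context use on this card.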