Qwen3.6 27B
2× RX 9070 · llama.cpp · 131,072 ctx
- throughput: 46.9 t/s gen · 398.4 t/s pp
- quant: UD-Q5_K_XL (gguf)
- flash attention: on
- mtp (multi-token prediction): on
coding · agentic
The user runs two RX 9070 XTs on ROCm with MTP enabled (spec-type = draft-mtp, spec-draft-n-max = 2). Prompt-processing speed varies; generation runs at around 45-52 t/s, with a draft acceptance rate of roughly 0.8-0.99. The user praises the model's speed, intelligence, and steerability on agentic coding tasks. The quant is UD-Q5_K_XL (Unsloth GGUF).
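To give a rough sense of what the reported acceptance rate means with spec-draft-n-max = 2, the sketch below estimates how many tokens each verification pass yields. It rests on two assumptions that are not from the report: that the acceptance rate can be treated as an independent per-token probability, and that a draft token is kept only if every draft token before it was also accepted. The function name `expected_tokens_per_step` is hypothetical, and the result is an idealized estimate, not a measurement of llama.cpp's behavior.

```python
def expected_tokens_per_step(acceptance_rate: float, draft_n_max: int) -> float:
    """Idealized expected tokens emitted per verification pass.

    Assumption (not from the report): each draft token is accepted
    independently with probability `acceptance_rate`, and draft token k
    only counts if all k draft tokens up to it were accepted.
    """
    if not 0.0 <= acceptance_rate <= 1.0:
        raise ValueError("acceptance_rate must be between 0 and 1")
    if draft_n_max < 0:
        raise ValueError("draft_n_max must be non-negative")
    # The k = 0 term is the one token the main model always produces;
    # each further term is the chance that k drafts in a row were accepted.
    return sum(acceptance_rate ** k for k in range(draft_n_max + 1))


# Acceptance rates at both ends of the reported 0.8-0.99 range.
for rate in (0.8, 0.99):
    tokens = expected_tokens_per_step(rate, draft_n_max=2)
    print(f"acceptance {rate:.2f}: ~{tokens:.2f} tokens per pass")
```

Under these assumptions, two draft tokens yield about 2.44 tokens per pass at an acceptance rate of 0.8 and about 2.97 at 0.99, against a ceiling of 3. Actual speedup depends on drafting and verification overhead, which this estimate ignores.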