Qwen3.6 27B
AMD Strix Halo 128GB · llama.cpp
- throughput: 21.2 tok/s gen
- quant: Q4_K_M (GGUF)
- mtp (multi-token prediction): on
MTP enabled with --spec-type draft-mtp --spec-draft-n-max 3. Q4_K_M baseline without MTP: 11.7 tok/s, so 11.7 → 21.2 tok/s (1.81×). Also tested Q8_0: 7.4 → 18.1 tok/s (2.44×).
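
For anyone wanting to sanity-check the ratios, here is a minimal sketch that recomputes the MTP speedup from the tok/s figures quoted above. The `results` dict and the `speedup` helper are illustrative names, not part of llama.cpp, and the inputs are the rounded numbers from this post rather than raw measurements.

```python
# Generation throughput in tokens/s, copied from the figures above.
# "baseline" = MTP off, "mtp" = MTP on.
results = {
    "Q4_K_M": {"baseline": 11.7, "mtp": 21.2},
    "Q8_0": {"baseline": 7.4, "mtp": 18.1},
}


def speedup(baseline: float, mtp: float) -> float:
    """Return the throughput ratio of the MTP run over the baseline run."""
    return mtp / baseline


for quant, r in results.items():
    ratio = speedup(r["baseline"], r["mtp"])
    print(f"{quant}: {r['baseline']} -> {r['mtp']} tok/s ({ratio:.2f}x)")
```

Because the inputs are already rounded to one decimal, the recomputed Q8_0 ratio can differ from the quoted 2.44× in the last digit.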