Qwen3.6 27B
2× AMD MI50 32GB · llama.cpp
- quant:
- Q8_0 (gguf)
- kv:
- Q8
- mtp (multi-token prediction):
- on
Benchmark comparing MTP KV cache quantization (Q8_0) vs no quantization on Qwen3.6-27B-Q8_0 with llama.cpp. Results show negligible wall time difference (~0.14s) with quantized draft KV cache. Also tested with tensor parallelism on 2xMI50 32GB. Aggregate accept rate ~0.735-0.741.