Qwen3.6 27B
M2 Max 96GB · llama.cpp · 256,000 ctx
- throughput: 8.0 t/s gen
- quant: F16 (GGUF)
- mtp (multi-token prediction): on
coding · agentic
A user benchmarked Qwen3.6 27B at F16 on an M2 Max (96GB) using llama.cpp with MTP speculative decoding. Generation speed ranged from 8 to 18 tok/s depending on the task, compared with 6.6 tok/s without MTP. The workload was agentic coding: building a Pacman game. The user also tested a Q8 quant, but results were worse. Context lengths beyond 150k tokens remained usable. Fixing the chat template was critical.
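
To put the throughput figures in proportion, here is a minimal Python sketch that computes the speedup implied by the reported numbers (6.6 tok/s without MTP, 8 to 18 tok/s with it). The 1,000-token response length is a hypothetical value chosen only for illustration, and the helper names are not taken from llama.cpp or the original report.

```python
# Figures quoted in the summary above, in tokens per second.
BASELINE_TPS = 6.6            # generation without MTP
MTP_TPS_RANGE = (8.0, 18.0)   # generation with MTP: low and high ends

# Hypothetical response length, used only to illustrate wall-clock time.
EXAMPLE_TOKENS = 1000


def speedup(with_mtp: float, baseline: float) -> float:
    """Return the throughput ratio of MTP decoding to plain decoding."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return with_mtp / baseline


def seconds_for(tokens: int, tps: float) -> float:
    """Return the time in seconds to generate `tokens` at `tps` tokens/s."""
    if tps <= 0:
        raise ValueError("tps must be positive")
    return tokens / tps


low, high = (speedup(tps, BASELINE_TPS) for tps in MTP_TPS_RANGE)
print(f"MTP speedup: {low:.2f}x to {high:.2f}x")

for label, tps in (("no MTP", BASELINE_TPS),
                   ("MTP low", MTP_TPS_RANGE[0]),
                   ("MTP high", MTP_TPS_RANGE[1])):
    print(f"{label}: {seconds_for(EXAMPLE_TOKENS, tps):.1f} s "
          f"per {EXAMPLE_TOKENS} tokens")
```

On the reported numbers this works out to roughly a 1.2x to 2.7x gain, so the benefit of MTP depends heavily on the task.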