Qwen3.6 35B (3B active)
RTX 4070 Ti Super · ik_llama.cpp · 131,072 ctx
- throughput: 110.2 t/s (generation)
- quant: IQ4_XS, 4.19 bpw (GGUF)
- kv cache: Q8
- mtp (multi-token prediction): on
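The bpw figure converts directly into an approximate weight size. Below is a minimal sketch that assumes exactly 35e9 parameters at a uniform 4.19 bits per weight. Real GGUF files also carry metadata and may store some tensors at other precisions, so treat the result as an estimate. The helper `weight_size_bytes` is hypothetical and only for illustration.

```python
def weight_size_bytes(n_params: float, bits_per_weight: float) -> float:
    """Estimate weight storage: parameter count times average bits, in bytes."""
    return n_params * bits_per_weight / 8


# Assumption: 35e9 parameters (the "35B" total, not the 3B active)
# stored at a uniform 4.19 bpw; real files differ slightly.
size = weight_size_bytes(35e9, 4.19)
print(f"{size / 1e9:.2f} GB = {size / 2**30:.2f} GiB")
```

With these assumptions the weights come to roughly 18.3 GB (about 17.1 GiB), before counting the KV cache for the 131,072-token context.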
coding · summarization · math
Benchmark comparing llama.cpp (89.76 t/s) and ik_llama.cpp (110.24 t/s) running Qwen3.6-35B-A3B at the IQ4_XS quant with MTP: ik_llama.cpp is about 23% faster. CPU: Ryzen 7 9700X. OS: CachyOS. The RTX 4070 Ti Super runs as a secondary GPU, with the iGPU driving the display.
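The 23% figure is the relative gain of ik_llama.cpp's throughput over the llama.cpp baseline. This minimal sketch reproduces it from the two reported generation speeds. The function name `speedup_percent` is hypothetical.

```python
def speedup_percent(baseline_tps: float, new_tps: float) -> float:
    """Relative throughput gain of new_tps over baseline_tps, in percent."""
    return (new_tps / baseline_tps - 1.0) * 100.0


llama_cpp_tps = 89.76      # reported llama.cpp generation speed
ik_llama_cpp_tps = 110.24  # reported ik_llama.cpp generation speed

gain = speedup_percent(llama_cpp_tps, ik_llama_cpp_tps)
print(f"ik_llama.cpp is {gain:.1f}% faster")  # prints "ik_llama.cpp is 22.8% faster"
```

The exact ratio is about 1.228, which rounds to the 23% quoted above.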