Gemma 4 26B assistant
M5 Max 64GB · llama.cpp
- throughput: 97.0 t/s gen
coding
The Multi-Token Prediction (MTP) implementation yields a 40% speedup (138 t/s with MTP).
APPLE · 64GB unified memory · 3 reports
M5 Max 64GB · llama.cpp
The Multi-Token Prediction (MTP) implementation yields a 40% speedup (138 t/s with MTP).
M5 Max 64GB · MLX
The MTPLX engine reaches 63 tok/s on Qwen3.6-27B (4-bit MLX) on an M5 Max 64GB, up from a 28 tok/s baseline (2.25x). It uses the model's native MTP heads, sampling with temperature 0.6, top_p 0.95, top_k 20. Optimal depth: D3. Runs on a custom patched MLX fork with Metal kernels.
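The speedup figures in these reports can be sanity-checked with a few lines of Python. This is only a sketch: the `speedup` helper is hypothetical, and pairing the 97.0 t/s card figure with the 138 t/s MTP figure assumes 97.0 t/s is the non-MTP baseline, which the report does not state outright.

```python
def speedup(baseline_tps: float, accelerated_tps: float) -> float:
    """Return the throughput ratio accelerated / baseline (hypothetical helper)."""
    if baseline_tps <= 0:
        raise ValueError("baseline_tps must be positive")
    return accelerated_tps / baseline_tps


# MLX report: 28 tok/s baseline, 63 tok/s with MTPLX (both figures from the report).
mlx_ratio = speedup(28.0, 63.0)
print(f"MTPLX on MLX: {mlx_ratio:.2f}x ({(mlx_ratio - 1) * 100:.0f}% faster)")

# llama.cpp report: ASSUMES the 97.0 t/s card figure is the non-MTP baseline.
llama_ratio = speedup(97.0, 138.0)
print(f"MTP on llama.cpp: {llama_ratio:.2f}x ({(llama_ratio - 1) * 100:.0f}% faster)")
```

Under that assumption the llama.cpp ratio works out to about 1.42x, in line with the reported 40% speedup, while the MLX figures give 2.25x.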
Qwen 3.6 27B on a MacBook Pro M5 Max 64GB: 32 tokens/sec, 18m04s, 33946 tokens, compared with Gemma 4 31B at 27 t/s, 3m51s, 6209 tokens. Qwen showed more creativity, but Gemma won on game logic.
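As a rough cross-check, the reported rates can be recomputed from the token counts and wall-clock times. The sketch below uses hypothetical helper names (`parse_duration`, `tokens_per_second`) and assumes the durations cover generation only; the report does not say whether prompt processing is included.

```python
import re


def parse_duration(text: str) -> int:
    """Convert a string like '18m04s' into whole seconds (hypothetical helper)."""
    match = re.fullmatch(r"(\d+)m(\d+)s", text)
    if match is None:
        raise ValueError(f"unrecognized duration: {text!r}")
    minutes, seconds = (int(part) for part in match.groups())
    return minutes * 60 + seconds


def tokens_per_second(tokens: int, duration: str) -> float:
    """Average throughput over the whole run, ASSUMING duration is generation time."""
    return tokens / parse_duration(duration)


# Figures taken from the report above.
qwen_tps = tokens_per_second(33946, "18m04s")
gemma_tps = tokens_per_second(6209, "3m51s")
print(f"Qwen 3.6 27B: {qwen_tps:.1f} t/s")
print(f"Gemma 4 31B:  {gemma_tps:.1f} t/s")
```

The recomputed averages come out at roughly 31.3 and 26.9 t/s, close to the reported 32 and 27; the small gap may be down to rounding or to how the run was timed.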