M5 Max 128GB

APPLE · 128GB unified memory · 5 reports

Gemma 4 31B

throughput:: 7.5 t/s gen
flash attention:: on

User reports poor performance with Gemma 4 31B (7.5 tok/s) and Qwen3.6-27B (locking up), while Qwen3.6-35B-A3B is fast. Suspects a bug affecting dense models.

Qwen3.6 27B

M5 Max 128GB · MLX · 290,000 ctx

throughput:: 5.5 t/s gen · 160.0 t/s pp
quant:: Q8 (mlx)

long-context

User reports 160 tok/s prefill and 5-6 tok/s generation (later dropping to 4-5 tok/s) on M5 Max 128GB with Qwen 3.6 27B Q8 MLX at 290k context. GPU utilization is 36-50%. User feels performance is lower than expected and asks others for comparison numbers.

Gemma 4 31B

M5 Max 128GB

throughput:: 7.5 t/s gen
flash attention:: on

User reports poor performance with dense models (Gemma4-31B ~7.5 t/s, Qwen3.6-27B locking up) on M5 Max 128GB, while Qwen3.6-35B-A3B MoE is fast. Mentions using DFLASH (likely flash attention).

Qwen3.6 27B

M5 Max 128GB · MLX · 290,000 ctx

throughput:: 5.5 t/s gen · 160.0 t/s pp
quant:: Q8 (mlx)

long-context

User reports 160 tok/s prefill and 5-6 tok/s generation on M5 Max 128GB with Qwen 3.6 27B Q8 MLX at 290k context. GPU utilization is only 36-50%, which feels off to the user, who expected 8-14 tok/s generation. Asks others for comparison numbers.
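
To put the figures in this report in perspective, here is a minimal arithmetic sketch using only the reported numbers (290k context, 160 tok/s prefill, 5.5 tok/s generation, and the user's expected 8-14 tok/s). The function names are hypothetical helpers for illustration, not part of MLX or any benchmark tool, and the sketch assumes the prefill rate holds constant across the whole context, which may not be true in practice.

```python
def prefill_seconds(context_tokens: int, prefill_tps: float) -> float:
    """Time to process the prompt, assuming a constant prefill rate."""
    return context_tokens / prefill_tps


def fraction_of_expected(observed_tps: float, expected_tps: float) -> float:
    """Observed generation speed as a fraction of an expected speed."""
    return observed_tps / expected_tps


# Figures taken from the report above.
context_tokens = 290_000
prefill_tps = 160.0
observed_gen_tps = 5.5
expected_low_tps, expected_high_tps = 8.0, 14.0

minutes = prefill_seconds(context_tokens, prefill_tps) / 60
vs_low = fraction_of_expected(observed_gen_tps, expected_low_tps)
vs_high = fraction_of_expected(observed_gen_tps, expected_high_tps)

print(f"prefill time: {minutes:.1f} min")
print(f"generation vs expected: {vs_high:.0%} to {vs_low:.0%}")
```

Under these assumptions, filling the full 290k context takes roughly 30 minutes, and the observed generation speed is about 39% to 69% of what the user expected.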

Gemma 4 31B

M5 Max 128GB

throughput:: 7.5 t/s gen
flash attention:: on

User reports poor performance with Gemma4-31B (7.5 tok/s) and Qwen3.6-27B (locking up) on M5 Max 128GB, while Qwen3.6-35B-A3B is fast. Mentions using DFLASH.