M2 Max 96GB

APPLE · 96GB unified memory · 2 reports




Qwen3.6 27B

M2 Max 96GB · llama.cpp · 256,000 ctx

throughput:: 8.0 t/s gen
quant:: F16 (gguf)
mtp (multi-token prediction):: on

coding · agentic

User benchmarked Qwen3.6 27B at F16 on an M2 Max 96GB using llama.cpp with MTP speculative decoding. Generation speed varied from 8 to 18 tok/s depending on the task; without MTP it was 6.6 tok/s. Used for agentic coding to build a Pac-Man game. Also tested a Q8 quant, but results were worse. Contexts of 150k+ tokens were usable. Fixes to the chat template were critical.
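To put the reported MTP figures in perspective, here is a small sketch that turns them into speedup ratios. It uses only the numbers stated in this report (6.6 tok/s without MTP, 8 to 18 tok/s with it); the `speedup` helper is hypothetical and not part of llama.cpp.

```python
# Reported generation throughput (tok/s) from the F16 report.
BASELINE_NO_MTP = 6.6          # without MTP
MTP_RANGE = (8.0, 18.0)        # with MTP, varied by task


def speedup(with_mtp: float, without_mtp: float) -> float:
    """Ratio of generation throughput with MTP to throughput without it."""
    return with_mtp / without_mtp


# Speedup at the low and high ends of the reported MTP range.
low, high = (speedup(x, BASELINE_NO_MTP) for x in MTP_RANGE)
print(f"MTP speedup: {low:.2f}x to {high:.2f}x")
```

This works out to roughly 1.2x to 2.7x, so the gain from MTP in this report depended heavily on the task.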

Qwen3.6 27B

M2 Max 96GB · llama.cpp · 262,144 ctx

throughput:: 28.0 t/s gen
quant:: Q5_K_M (gguf)
kv:: Q4

coding · agentic

MTP speculative decoding gave a 2.5x speedup. Tested on an M2 Max 96GB with a Q5_K_M quant and a q4_0 KV cache. The report also includes hardware recommendations for various Apple Silicon and NVIDIA GPUs.
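The quants mentioned in these reports (F16, Q8, Q5_K_M) trade memory for quality. A rough way to see how much of the 96GB of unified memory the weights alone occupy is parameters × bits per weight ÷ 8. The sketch below assumes 27B parameters (from the model name) and that "Q8" means llama.cpp's Q8_0 format. F16 and Q8_0 bit widths follow from their storage formats, while the Q5_K_M figure is an assumed average, since that format mixes quant types per tensor. KV cache and runtime buffers are not included.

```python
PARAMS = 27e9  # assumption: 27B parameters, taken from the model name

# Approximate bits per weight for each quant.
BITS_PER_WEIGHT = {
    "F16": 16.0,
    "Q8_0": 8.5,    # 32 int8 values plus one fp16 scale per block of 32
    "Q5_K_M": 5.7,  # assumption: typical average; the real value varies by model
}


def weight_gib(params: float, bpw: float) -> float:
    """Approximate size of the weights alone in GiB (excludes KV cache and buffers)."""
    return params * bpw / 8 / 2**30


for quant, bpw in BITS_PER_WEIGHT.items():
    print(f"{quant:7s} ~{weight_gib(PARAMS, bpw):5.1f} GiB")
```

Under these assumptions the F16 weights take about 50 GiB and Q5_K_M about 18 GiB, which helps explain why the Q5_K_M run, with a quantized KV cache on top, leaves far more headroom for a 262,144-token context.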