llamaperf

M2 Max 96GB

APPLE · 96GB unified memory · 2 reports

Qwen3.6 27B

M2 Max 96GB · llama.cpp · 256,000 ctx

Tone: positive
Throughput: 8.0 t/s gen
Quant: F16 (gguf)
MTP (multi-token prediction): on
coding · agentic

The user benchmarked Qwen3.6 27B at F16 on an M2 Max 96GB using llama.cpp with MTP speculative decoding. Generation speed varied from 8 to 18 tok/s depending on the task; without MTP it reached 6.6 tok/s. The setup was used for agentic coding to build a Pac-Man game. A Q8 quant was also tested, but its results were worse. Contexts of 150k+ tokens were usable. Chat template fixes were critical.
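A rough sketch of what these numbers imply, assuming "27B" means about 27e9 parameters (the exact count is not given here) and that F16 stores 2 bytes per weight. It estimates the weight footprint and the speedup of the reported MTP range over the 6.6 tok/s no-MTP figure. It ignores KV cache and runtime overhead, so treat it as a lower bound on memory use.

```python
# Back-of-envelope numbers for the F16 report.
# Assumption: ~27e9 parameters; KV cache and runtime buffers are ignored.

PARAMS = 27e9               # approximate parameter count (assumed)
F16_BYTES_PER_PARAM = 2     # F16 = 16 bits per weight
BASELINE_TPS = 6.6          # reported generation speed without MTP


def weight_gib(params: float, bytes_per_param: float) -> float:
    """Approximate size of the model weights in GiB."""
    return params * bytes_per_param / 2**30


def speedup(with_mtp: float, without_mtp: float) -> float:
    """Ratio of generation speed with MTP to speed without it."""
    return with_mtp / without_mtp


if __name__ == "__main__":
    print(f"F16 weights: ~{weight_gib(PARAMS, F16_BYTES_PER_PARAM):.1f} GiB")
    # The report gives a range of 8-18 tok/s with MTP, depending on the task.
    for tps in (8.0, 18.0):
        print(f"{tps:.1f} tok/s with MTP -> {speedup(tps, BASELINE_TPS):.2f}x")
```

Under these assumptions the weights alone take roughly 50 GiB, and the MTP gain spans roughly 1.2x to 2.7x depending on the task.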

Qwen3.6 27B

M2 Max 96GB · llama.cpp · 262,144 ctx

Tone: positive
Throughput: 28.0 t/s gen
Quant: Q5_K_M (gguf)
KV cache: Q4
coding · agentic

MTP speculative decoding gave a 2.5x speedup. The test ran on an M2 Max 96GB with a Q5_K_M quant and a q4_0 KV cache. The report also provides hardware recommendations for various Apple Silicon and NVIDIA GPUs.
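A small sketch of two figures that follow from this report, assuming the 2.5x speedup is measured against the same configuration with MTP off (the report does not state the baseline). It derives the implied no-MTP speed and the per-value cost of a q4_0 KV cache, using ggml's q4_0 block layout: 32 values stored as 16 bytes of 4-bit quants plus one 2-byte FP16 scale.

```python
# What the second report's numbers imply.
# Assumption: the 2.5x speedup is relative to the same setup with MTP off.

REPORTED_TPS = 28.0
REPORTED_SPEEDUP = 2.5

# ggml q4_0 block: 32 values = 16 bytes of 4-bit quants + 2-byte FP16 scale.
Q4_0_BLOCK_VALUES = 32
Q4_0_BLOCK_BYTES = 16 + 2


def implied_baseline(tps_with_mtp: float, speedup: float) -> float:
    """Generation speed without MTP implied by a speed and a speedup factor."""
    return tps_with_mtp / speedup


def bits_per_value(block_bytes: int, block_values: int) -> float:
    """Average storage cost per value for a block-quantized format."""
    return block_bytes * 8 / block_values


if __name__ == "__main__":
    print(f"Implied no-MTP speed: {implied_baseline(REPORTED_TPS, REPORTED_SPEEDUP):.1f} t/s")
    q4 = bits_per_value(Q4_0_BLOCK_BYTES, Q4_0_BLOCK_VALUES)
    print(f"q4_0 KV cache: {q4} bits/value, {16 / q4:.2f}x smaller than F16")
```

Under that assumption the baseline would be about 11.2 t/s, and a q4_0 KV cache uses 4.5 bits per value, roughly 3.6x less than F16, which is what makes a 262,144-token context practical alongside the weights.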