llamaperf

AMD MI50 32GB

AMD · 32GB · 2 reports


Qwen3.6 27B

AMD MI50 32GB · llama.cpp

quant:
Q8_0 (gguf)
kv:
Q8
mtp (multi-token prediction):
on

Benchmark comparing a Q8_0-quantized MTP draft KV cache against an unquantized one on Qwen3.6-27B-Q8_0 with llama.cpp. Quantizing the draft KV cache changed wall time by only about 0.14 s, which is negligible. Also tested with tensor parallelism on 2x MI50 32GB. Aggregate accept rate was roughly 0.735 to 0.741.

throughput:
9.7 t/s gen · 264.0 t/s pp

32x AMD MI50 32GB GPUs. Prompt processing 264 t/s, generation 9.7 t/s. Model: Kimi K2.6.
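
To put the two throughput figures in perspective, here is a rough sketch that turns them into an end-to-end time estimate for a single request. It assumes both rates stay constant over the whole request and ignores model load, sampling, and any other overhead, which real runs will not match exactly. The function name `estimate_wall_time` and the example token counts are hypothetical and not part of the report.

```python
def estimate_wall_time(prompt_tokens: int, output_tokens: int,
                       pp_tps: float = 264.0, gen_tps: float = 9.7) -> float:
    """Estimate seconds for one request: prompt processing plus generation.

    Defaults are the figures reported above (264.0 t/s pp, 9.7 t/s gen).
    Assumes constant rates and no extra overhead (a simplification).
    """
    if pp_tps <= 0 or gen_tps <= 0:
        raise ValueError("throughput values must be positive")
    prompt_seconds = prompt_tokens / pp_tps   # time spent reading the prompt
    gen_seconds = output_tokens / gen_tps     # time spent producing output
    return prompt_seconds + gen_seconds


if __name__ == "__main__":
    # Hypothetical request: 2048-token prompt, 512-token reply.
    seconds = estimate_wall_time(2048, 512)
    print(f"estimated wall time: {seconds:.1f} s")
```

With these assumed token counts the estimate comes out to about a minute, and nearly all of it is generation time, so at these rates the 9.7 t/s figure dominates interactive latency.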