llamaperf

RTX Pro 6000 Blackwell

NVIDIA · 96GB · 3 reports

See what fits on this GPU →
throughput:
3500.0 t/s gen · 30000.0 t/s pp

Two benchmarks: Qwen3.6 27B BF16 and Qwen3.6 35B BF16. For 35B, best gen tps 3500 at 128 concurrency with MTP off, prompt tps 30000. Also tested 27B with MTP on/off.

Tone: positive

FastDMS implementation of DMS KV-cache compression. Benchmarks show 1.5-2x faster decoding than vLLM BF16/FP8 with 5-8x less KV memory. Quality metrics (KLD, token match) comparable or better than vLLM's FP8/TurboQuant. Tested on Llama-3.2-1B and Qwen3-8B DMS checkpoints. Training took ~20 min on RTX Pro 6000 Blackwell.

Tone: mixed
codingsummarization

Compared Qwen3.6-27B (with and without thinking) against Coder-Next. 27B with thinking disabled was most consistent (95.8% ship rate). 27B and Coder-Next statistically tied overall. Also mentions Qwen3.6-35B-A3B performed poorly and was dropped.