Llama 3.3

Name: Llama 3.3 local performance reports
Creator: llamaperf
License: https://creativecommons.org/licenses/by/4.0/

Meta · 1 report


Llama 3.3 1B

RTX Pro 6000 Blackwell · vLLM · 8,192 ctx

FastDMS implementation of DMS KV-cache compression. In these benchmarks, decoding is 1.5 to 2x faster than vLLM running BF16 or FP8, while the KV cache uses 5 to 8x less memory. Quality metrics (KL divergence and token match) are comparable to or better than those of vLLM's FP8 and TurboQuant options. Tested on the Llama-3.2-1B and Qwen3-8B DMS checkpoints; training took about 20 minutes on an RTX Pro 6000 Blackwell.
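The report does not say exactly how FastDMS computes its KLD and token-match numbers. A common convention, assumed here, is to run the uncompressed reference model and the compressed model on the same token sequence. KLD is then the per-position KL(P_ref || P_test) between their next-token distributions, averaged over positions. Token match is the fraction of positions where both models choose the same greedy (argmax) token. The minimal sketch below implements that assumed convention in plain Python. The function names are hypothetical and are not part of FastDMS or vLLM.

```python
import math
from typing import Sequence

Logits = Sequence[float]  # one row of vocabulary logits for a single position


def log_softmax(logits: Logits) -> list[float]:
    # Log-sum-exp with the max subtracted, so large logits do not overflow.
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def kl_divergence(ref_logits: Logits, test_logits: Logits) -> float:
    """KL(P_ref || P_test) in nats for one position, computed in log space."""
    logp = log_softmax(ref_logits)
    logq = log_softmax(test_logits)
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(logp, logq))


def mean_kld(ref_steps: Sequence[Logits], test_steps: Sequence[Logits]) -> float:
    # Average the per-position KL divergence over all scored positions.
    total = sum(kl_divergence(r, t) for r, t in zip(ref_steps, test_steps))
    return total / len(ref_steps)


def token_match(ref_steps: Sequence[Logits], test_steps: Sequence[Logits]) -> float:
    # Fraction of positions where both models pick the same argmax token.
    def argmax(xs: Logits) -> int:
        return max(range(len(xs)), key=lambda i: xs[i])

    matches = sum(argmax(r) == argmax(t) for r, t in zip(ref_steps, test_steps))
    return matches / len(ref_steps)


# Toy logits for two positions over a three-token vocabulary (illustrative only).
ref = [[2.0, 1.0, 0.1], [0.5, 2.5, 0.0]]
test = [[2.0, 1.0, 0.1], [0.4, 2.4, 0.2]]
print(mean_kld(ref, test), token_match(ref, test))
```

With this convention, lower KLD and higher token match both mean the compressed cache stays closer to the reference model. A mean KLD of 0 together with a token match of 1.0 would indicate identical outputs.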