llamaperf

Llama 3.1

Meta · 1 report

Llama 3.1 405B

Unknown GPU

Tone: positive
Throughput: 1.2 t/s (generation)
Quant: Q4_K_M (GGUF)

The post recalls running Llama 3.1 405B at Q4_K_M at 1.2 t/s two years ago and contrasts that with the 30-100 t/s now seen with newer models such as Kimi K2.6, DeepSeek V4 Flash, MiniMax 2.7, Step 3.5 Flash, and Qwen3.5-397B. It also mentions running Qwen3.6-36B at 50 t/s for a few hundred dollars.
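
To put the reported figures side by side, here is a minimal Python sketch of the arithmetic. The throughput numbers come from the report; the 500-token response length and the helper names `speedup` and `seconds_for` are assumptions made for illustration. The comparison is across different models and unknown hardware, so treat the ratios as a rough sense of scale rather than a benchmark.

```python
def speedup(old_tps: float, new_tps: float) -> float:
    """Return how many times faster new_tps is than old_tps."""
    if old_tps <= 0:
        raise ValueError("old_tps must be positive")
    return new_tps / old_tps


def seconds_for(tokens: int, tps: float) -> float:
    """Return the time in seconds to generate `tokens` at `tps` tokens/second."""
    if tps <= 0:
        raise ValueError("tps must be positive")
    return tokens / tps


# Figures taken from the report.
OLD_TPS = 1.2                    # Llama 3.1 405B, Q4_K_M
NEW_TPS_RANGE = (30.0, 100.0)    # newer models named in the post
QWEN_TPS = 50.0                  # Qwen3.6-36B

# Assumption: a 500-token response, chosen only to make the times concrete.
RESPONSE_TOKENS = 500

low, high = (speedup(OLD_TPS, tps) for tps in NEW_TPS_RANGE)
print(f"speedup over 1.2 t/s: {low:.0f}x to {high:.0f}x")
print(f"{RESPONSE_TOKENS} tokens at {OLD_TPS} t/s: "
      f"{seconds_for(RESPONSE_TOKENS, OLD_TPS) / 60:.1f} min")
print(f"{RESPONSE_TOKENS} tokens at {QWEN_TPS} t/s: "
      f"{seconds_for(RESPONSE_TOKENS, QWEN_TPS):.0f} s")
```

Under these assumptions, the 30-100 t/s range is roughly 25 to 83 times the 1.2 t/s figure, and a 500-token reply drops from about seven minutes to about ten seconds at 50 t/s.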