Llama 3.1

Name: Llama 3.1 local performance reports
Creator: llamaperf
License: https://creativecommons.org/licenses/by/4.0/

Meta · 1 report

Llama 3.1 405B

Unknown GPU

throughput:: 1.2 t/s gen
quant:: Q4_K_M (gguf)

The post recalls running Llama 405B at Q4_K_M at 1.2 t/s two years ago and contrasts that with current speeds of 30-100 t/s for newer models such as Kimi K2.6, DeepSeek V4 Flash, MiniMax 2.7, Step 3.5 Flash, and Qwen3.5-397B. It also mentions running Qwen3.6-36B at 50 t/s for a few hundred dollars.
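
To put the reported figures side by side, the short Python sketch below computes the speedup implied by going from 1.2 t/s to the quoted 30-100 t/s range, plus the wall-clock time for one response at each rate. The throughput numbers come from the report above; the helper names and the 500-token response length are assumptions chosen for illustration, not measurements.

```python
def speedup(old_tps: float, new_tps: float) -> float:
    """Return how many times faster new_tps is than old_tps."""
    if old_tps <= 0:
        raise ValueError("old_tps must be positive")
    return new_tps / old_tps


def seconds_for(tokens: int, tps: float) -> float:
    """Return the seconds needed to generate `tokens` at `tps` tokens/second."""
    if tps <= 0:
        raise ValueError("tps must be positive")
    return tokens / tps


OLD_TPS = 1.2                   # reported: Llama 405B, Q4_K_M
NEW_TPS_RANGE = (30.0, 100.0)   # reported range for the newer models
RESPONSE_TOKENS = 500           # hypothetical response length (assumption)

low = speedup(OLD_TPS, NEW_TPS_RANGE[0])
high = speedup(OLD_TPS, NEW_TPS_RANGE[1])
print(f"speedup: {low:.0f}x to {high:.0f}x")

for tps in (OLD_TPS, *NEW_TPS_RANGE):
    secs = seconds_for(RESPONSE_TOKENS, tps)
    print(f"{tps:6.1f} t/s -> {secs:6.1f} s for {RESPONSE_TOKENS} tokens")
```

Under these assumptions the gap works out to roughly 25x to 83x: a 500-token reply that took about seven minutes at 1.2 t/s would take between 5 and 17 seconds at the newer rates.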