Llama 3.1 405B
Unknown GPU
- throughput:
  - 1.2 t/s gen
- quant:
  - Q4_K_M (gguf)
The post recalls running Llama 405B at Q4_K_M at 1.2 t/s two years ago. It contrasts that with current speeds of 30-100 t/s for newer models such as Kimi K2.6, DeepSeek V4 Flash, MiniMax 2.7, Step 3.5 Flash, and Qwen3.5-397B. It also mentions running Qwen3.6-36B at 50 t/s for a few hundred dollars.
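To put the quoted speeds in perspective, here is a small sketch of the arithmetic. The 1.2 t/s and 30-100 t/s figures come from the post. The 1,000-token response length and the helper names `seconds_to_generate` and `speedup` are hypothetical. The sketch also assumes a constant decode rate and ignores prompt processing time.

```python
# Sketch: compare the generation speeds quoted in the post.
# Speeds are from the post; RESPONSE_TOKENS is a hypothetical example length.

OLD_TPS = 1.2                   # Llama 405B at Q4_K_M, two years ago
NEW_TPS_RANGE = (30.0, 100.0)   # range quoted for newer models
RESPONSE_TOKENS = 1_000         # hypothetical response length


def seconds_to_generate(tokens: int, tokens_per_second: float) -> float:
    """Wall-clock decode time at a constant rate (prompt processing ignored)."""
    return tokens / tokens_per_second


def speedup(new_tps: float, old_tps: float = OLD_TPS) -> float:
    """How many times faster new_tps is than old_tps."""
    return new_tps / old_tps


if __name__ == "__main__":
    old_minutes = seconds_to_generate(RESPONSE_TOKENS, OLD_TPS) / 60
    print(f"{OLD_TPS:g} t/s: {old_minutes:.1f} min")
    for tps in NEW_TPS_RANGE:
        secs = seconds_to_generate(RESPONSE_TOKENS, tps)
        print(f"{tps:g} t/s: {secs:.1f} s, {speedup(tps):.0f}x faster")
```

Running it prints roughly 13.9 min for 1,000 tokens at 1.2 t/s, versus 33.3 s (25x faster) at 30 t/s and 10.0 s (83x faster) at 100 t/s.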