Gemma 4 8B E4B Instruct
RTX 4070 · Ollama
- throughput: 55.0 t/s gen
- quant: Q4_K_M (GGUF)
text-generation
~55 tok/s generation on RTX 4070 12GB, reflecting Ada Lovelace efficiency. Source: estimated from compute-market tiers.
NVIDIA · 12GB · 1 report
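The figure above is an estimate, not a measurement. As a rough sketch of how a measured value could be derived, the snippet below computes tokens per second from the `eval_count` and `eval_duration` (nanoseconds) counters that Ollama reports in its generate response. The function name `tokens_per_second` and the sample values are hypothetical, chosen only to illustrate the arithmetic.

```python
def tokens_per_second(eval_count: int, eval_duration_ns: int) -> float:
    """Generation throughput from Ollama-style response counters.

    eval_count: number of tokens generated.
    eval_duration_ns: time spent generating them, in nanoseconds.
    """
    if eval_duration_ns <= 0:
        raise ValueError("eval_duration_ns must be positive")
    return eval_count / (eval_duration_ns / 1e9)


# Hypothetical sample values for illustration only, not a real benchmark run.
sample = {"eval_count": 550, "eval_duration": 10_000_000_000}

rate = tokens_per_second(sample["eval_count"], sample["eval_duration"])
print(f"{rate:.1f} tok/s")
```

Averaging this over several prompts of similar length would give a number more comparable to the 55.0 t/s listed here.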