Gemma 4 5.1B E2B Instruct
RTX A6000 48GB · llama.cpp
- throughput: 16.9 t/s (generation)
- quant: bf16 (safetensors)
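A rough way to turn the 16.9 t/s generation rate and the 61 ms TTFT listed in this report's notes into a per-request latency is sketched below. It assumes decode speed stays constant at the reported rate and that prompt processing is fully captured by the TTFT. Real runs will vary with prompt length, context size, and sampling settings. The function name `estimate_latency_s` is hypothetical and used only for illustration.

```python
# Rough end-to-end latency for one request, using the figures on this card.
# Assumptions (not from the source): decode speed is constant at the reported
# rate, and all prompt processing is included in the reported TTFT.

TTFT_S = 0.061           # time to first token, seconds (61 ms)
GEN_TOKENS_PER_S = 16.9  # generation throughput, tokens per second


def estimate_latency_s(output_tokens: int) -> float:
    """Hypothetical helper: TTFT plus time to decode the remaining tokens."""
    if output_tokens <= 0:
        return 0.0
    # The first token arrives at TTFT; the rest are decoded at the steady rate.
    return TTFT_S + (output_tokens - 1) / GEN_TOKENS_PER_S


for n in (64, 256, 1024):
    print(f"{n:5d} tokens -> {estimate_latency_s(n):6.2f} s")
```

At this rate, decoding time dominates almost immediately. For example, a 256-token reply spends roughly 15 s in generation against 0.061 s of TTFT.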
text-generation
bf16, no quantization. 10.25 GB VRAM, 61 ms TTFT (time to first token). Source: Gaurav Vij, dev.to.
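As a sanity check on the 10.25 GB VRAM figure, the sketch below computes the memory needed for the weights alone at bf16, which uses 2 bytes per parameter. It assumes that "5.1B" is the total stored parameter count and that GB means 10^9 bytes. Both are assumptions, not statements from the report. Under those assumptions, the weights account for about 10.2 GB. The small remainder is presumably KV cache, activations, and runtime buffers. The `weight_gb` helper is hypothetical.

```python
# Back-of-envelope weight memory for an unquantized checkpoint.
# Assumptions (not from the source): "5.1B" is the total stored parameter
# count, and GB means decimal gigabytes (10**9 bytes).

BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "fp16": 2}


def weight_gb(n_params: float, dtype: str = "bf16") -> float:
    """Hypothetical helper: memory for the weights alone, in decimal GB."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e9


est = weight_gb(5.1e9)  # 5.1e9 params * 2 bytes = 10.2e9 bytes
reported = 10.25        # VRAM figure from this report, GB
print(f"weights ~ {est:.2f} GB, reported {reported} GB, "
      f"remainder {reported - est:.2f} GB")
```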
NVIDIA · 48GB · 1 report