RTX 3060 12GB

NVIDIA · 12GB · 5 reports

Qwen3.6 27B

2× RTX 3060 12GB · llama.cpp · 64,000 ctx

throughput:: 43.3 t/s gen · 456.1 t/s pp
quant:: Q4_K_S (gguf)

Dual RTX 3060 setup with tensor parallelism and MTP (multi-token prediction) enabled at 64k context: prefill 456 t/s, generation 43.26 t/s at 12k context. Without MTP, context was 96k and generation 31 t/s. User praises the value of the cards and the stability of CUDA.
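To put the prefill and generation rates above into wall-clock terms, here is a rough sketch that estimates the time for one request. It assumes both rates stay constant across the whole prompt and response, which real runs do not guarantee (throughput usually drops as context grows). The function name and the 500-token response length are illustrative, not from the report.

```python
def estimate_request_seconds(prompt_tokens, output_tokens, pp_tps, gen_tps):
    """Rough latency estimate: prompt processing time plus generation time.

    Assumes constant throughput, which is a simplification.
    """
    if pp_tps <= 0 or gen_tps <= 0:
        raise ValueError("throughput values must be positive")
    prefill_seconds = prompt_tokens / pp_tps
    generation_seconds = output_tokens / gen_tps
    return prefill_seconds + generation_seconds


# Figures from the report: 456.1 t/s prefill, 43.26 t/s generation at 12k context.
# The 500-token response length is a hypothetical value for illustration.
total = estimate_request_seconds(12_000, 500, pp_tps=456.1, gen_tps=43.26)
print(f"estimated time: {total:.1f} s")
```

Under these assumptions most of the time goes to processing the 12k-token prompt, not to generating the reply.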

Qwen3.6 27B unsloth

RTX 3060 12GB · llama.cpp · 32,000 ctx

throughput:: 70.0 t/s gen · 780.0 t/s pp
quant:: Q2-XS (gguf)
kv:: Q8
flash attention:: off
rating:: 4/5

coding · creative-writing · tool-use · summarization · vision · agentic · multilingual

Definitely want to try a higher quant. I took this one because the GGUF size is 11.9 GB, and it barely offloads this dense model to the CPU.

Qwen3.6 35B (3B active)

RTX 3060 12GB · llama.cpp · 32,768 ctx

throughput:: 46.8 t/s gen · 914.0 t/s pp
quant:: IQ4_XS (gguf)
kv:: Q8
flash attention:: on

coding

Best plain llama-bench: pp512 ~914 t/s, tg128 ~46.8 t/s. Practical coding profile: 32k context, ~43.4 t/s generation. MTP gave ~47.7 t/s (about 2% over the plain tg128 figure).
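The size of the MTP gain depends on which baseline it is measured against. A minimal sketch of the arithmetic, using only the three generation figures quoted in this report (the helper name is illustrative):

```python
def percent_gain(baseline_tps, new_tps):
    """Relative throughput change, in percent, of new_tps over baseline_tps."""
    return (new_tps - baseline_tps) / baseline_tps * 100.0


plain_tg128 = 46.8      # best plain llama-bench generation rate
practical_32k = 43.4    # practical coding profile at 32k context
with_mtp = 47.7         # generation rate reported with MTP

print(f"vs plain tg128:   {percent_gain(plain_tg128, with_mtp):.1f}%")
print(f"vs 32k practical: {percent_gain(practical_32k, with_mtp):.1f}%")
```

Against the plain tg128 number the gain is just under 2%, while against the 32k practical profile it is closer to 10%. The report does not say which run conditions the MTP figure was measured under, so either comparison is only indicative.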

Gemma 4 5.1B E2B Instruct

RTX 3060 12GB · Ollama

throughput:: 60.0 t/s gen
quant:: Q4_K_M (gguf)

text-generation

~60 tok/s on RTX 3060 12GB. E2B runs effortlessly. Source: estimated from compute-market tiers

Gemma 4 8B E4B Instruct

RTX 3060 12GB · Ollama

throughput:: 45.0 t/s gen
quant:: Q4_K_M (gguf)

text-generation

~45 tok/s on RTX 3060 12GB. E4B fits easily. Source: compute-market.com