llamaperf

RTX 3060 12GB

NVIDIA · 12GB · 5 reports

See what fits on this GPU →

Qwen3.6 27B

RTX 3060 12GB · llama.cpp · 64,000 ctx

Tone: positive
throughput:
43.3 t/s gen · 456.1 t/s pp
quant:
Q4_K_S (gguf)

Dual RTX 3060 setup with tensor parallel. MTP enabled. Context 64k. Prefill 456 t/s, generation 43.26 t/s at 12k context. Without MTP, context 96k, generation 31 t/s. User praises value and stability of CUDA.

Qwen3.6 27B unsloth

RTX 3060 12GB · llama.cpp · 32,000 ctx

Tone: positive
throughput:
70.0 t/s gen · 780.0 t/s pp
quant:
Q2-XS (gguf)
kv:
Q8
flash attention:
off
rating:
4/5
codingcreative-writingtool-usesummarizationvisionagenticmultilingual

defnitly want to try an higher qwant. I vé took that one beacause gguf sise 11,9 go, ans barely offload this dense modèle to cpu

Qwen3.6 35B (3B active)

RTX 3060 12GB · llama.cpp · 32,768 ctx

Tone: positive
throughput:
46.8 t/s gen · 914.0 t/s pp
quant:
IQ4_XS (gguf)
kv:
Q8
flash attention:
on
coding

Best plain llama-bench: pp512 ~914 t/s, tg128 ~46.8 t/s. Practical coding profile: 32k context, ~43.4 t/s generation. MTP gave ~47.7 t/s (2% improvement).