Qwen3.6 27B
2× RTX 3060 12GB · llama.cpp · 64,000 ctx
- throughput:
- 43.3 t/s gen · 456.1 t/s pp
- quant:
- Q4_K_S (gguf)
Dual RTX 3060 setup with tensor parallel. MTP enabled. Context 64k. Prefill 456 t/s, generation 43.26 t/s at 12k context. Without MTP, context 96k, generation 31 t/s. User praises value and stability of CUDA.