llamaperf

RTX 4070 Ti Super

NVIDIA · 16GB · 1 report

Qwen3.6 35B (3B active)

RTX 4070 Ti Super · ik_llama.cpp · 131,072 ctx

Tone: positive
throughput: 110.2 t/s (generation)
quant: IQ4_XS, 4.19 bpw (GGUF)
KV cache: Q8
MTP (multi-token prediction): on
coding, summarization, math

Benchmark comparing llama.cpp (89.76 t/s) and ik_llama.cpp (110.24 t/s) with MTP on Qwen3.6-35B-A3B at IQ4_XS quantization: ik_llama.cpp generates about 23% faster. CPU: Ryzen 7 9700X. OS: CachyOS. The RTX 4070 Ti Super ran as a secondary GPU, with the iGPU driving the display.
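The 23% figure is the relative gain in generation throughput of ik_llama.cpp over llama.cpp. The short Python sketch below reproduces it from the two reported numbers only. The function name `speedup_percent` is illustrative and not part of either project.

```python
# Generation throughput (tokens/s) reported for this run
LLAMA_CPP_TPS = 89.76
IK_LLAMA_CPP_TPS = 110.24


def speedup_percent(baseline: float, candidate: float) -> float:
    """Relative throughput gain of `candidate` over `baseline`, in percent."""
    return (candidate / baseline - 1.0) * 100.0


gain = speedup_percent(LLAMA_CPP_TPS, IK_LLAMA_CPP_TPS)
print(f"{gain:.1f}%")  # → 22.8%, i.e. roughly 23%
```

The exact value is about 22.8%, which the report rounds to 23%.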