RX 9070

Name: RX 9070 local LLM performance reports
Creator: llamaperf
License: https://creativecommons.org/licenses/by/4.0/

AMD · 16GB · 1 report

See what fits on this GPU →

This page is thin (1 of 3 reports needed for indexing). Help fill it in.

Latest Most reported Fastest t/s

Qwen3.6 27B

2× RX 9070 · llama.cpp · 131,072 ctx

throughput:: 46.9 t/s gen · 398.4 t/s pp
quant:: UD-Q5_K_XL (gguf)
flash attention:: on
mtp (multi-token prediction):: on

codingagentic

User runs two RX 9070 XTs with ROCm, uses MTP (spec-type = draft-mtp, spec-draft-n-max = 2). Prompt t/s varies; generation t/s around 45-52. Draft acceptance rate ~0.8-0.99. User praises speed, smarts, steerability for agentic coding tasks. Quant is UD-Q5_K_XL (unsloth GGUF).