So I've got a 7800 XT 16GB, 32GB of RAM, and a 10th-gen i5 on a Windows PC, and I want to set it up as a local LLM server with LM Studio to plug into OpenCode for coding work. Mostly programming, refactoring, commenting code, that kind of thing.
Problem is there's been an absolute flood of models in the last few weeks and I'm a bit lost.
I want a model that's capable but also not slow.
I've already tried a few models. Qwen 3.5 35b a3b (4-bit quant) only gets me 13 tps, which is painfully slow for agent use like OpenCode.
Qwen 3.5 9b gets 50 tps, which isn't bad, but it stops handling requests properly after 1 or 2 prompts and doesn't even try to do what was asked.
I ran both of these with ROCm.
What choices do I have?