u/darkeagle03

r/Vllm

Request for vLLM settings / setup to use with Claude Code on 16 GB VRAM + 32 GB RAM

Wondering if anyone here can help me out with any settings / starting points / a reality check.

My goal is to use Claude Code for some hobby apps, primarily with local LLMs, on my 3080 (16 GB) with 32 GB RAM and Windows 11. Does anyone have a setup working relatively smoothly with tool calling on similar specs?

----

I got CC working with Ollama easily, but it's very slow. I was told vLLM might work better, and I managed to get a vLLM + LiteLLM setup running, but I'm struggling to get tool calls working without it becoming even slower than Ollama. It's OK-ish without tool calling, but that doesn't work for what I want to do. I suspect some settings tweaks could get it working, but I've tried a bunch of things with no joy yet. I don't have much more time to stumble through setup, which is why I'm reaching out.
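For anyone answering, here is a minimal sketch of the kind of vLLM launch being tuned, built as a Python argument list. The flags `--enable-auto-tool-choice`, `--tool-call-parser`, `--max-model-len`, `--gpu-memory-utilization` and `--max-num-seqs` are real `vllm serve` options; the model name, the `hermes` parser choice and every numeric value are placeholder assumptions, not a tested config for this card.

```python
import shlex


def build_vllm_command(
    model: str = "Qwen/Qwen2.5-Coder-7B-Instruct-AWQ",  # placeholder model (assumption)
    tool_parser: str = "hermes",  # must match the model's tool-call format (assumption)
    max_model_len: int = 16384,  # caps context length; must fit in the KV cache left after weights
    gpu_memory_utilization: float = 0.90,  # fraction of VRAM vLLM may reserve
    max_num_seqs: int = 1,  # single user, so few concurrent sequences are needed
) -> list[str]:
    """Return the argv for `vllm serve` with OpenAI-style tool calling enabled."""
    return [
        "vllm", "serve", model,
        "--enable-auto-tool-choice",
        "--tool-call-parser", tool_parser,
        "--max-model-len", str(max_model_len),
        "--gpu-memory-utilization", str(gpu_memory_utilization),
        "--max-num-seqs", str(max_num_seqs),
    ]


if __name__ == "__main__":
    # Print a copy-pasteable shell command; pass the list to subprocess.run to launch it.
    print(shlex.join(build_vllm_command()))
```

Keeping the command as a list rather than one string means it can be handed straight to `subprocess.run` without shell-quoting issues, and each knob can be varied one at a time while testing.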

I know my machine will never be fast, and I'll struggle to run any model above about 12B (though Ollama seems to handle them). I'm not concerned about that. I just need it to be a little faster than the 2+ hours it takes now to create CRUD stored procedures for 4 tables. Most of what I want can probably be handled by a basic model under 12B.
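A rough reality check on the ~12B ceiling, as a back-of-the-envelope sketch: weights need about parameters × bits-per-weight / 8 bytes, and the KV cache grows linearly with context length. The layer and head counts below are hypothetical round numbers, not any specific model's config, and activations and runtime overhead are ignored, so real usage is higher.

```python
GIB = 2**30


def weight_gib(params_billion: float, bits_per_weight: float) -> float:
    """Memory for model weights only, in GiB (ignores activations and runtime overhead)."""
    return params_billion * 1e9 * bits_per_weight / 8 / GIB


def kv_cache_gib(tokens: int, layers: int, kv_heads: int, head_dim: int,
                 bytes_per_value: int = 2) -> float:
    """KV cache size: one key and one value vector per layer, per KV head, per token."""
    per_token = 2 * layers * kv_heads * head_dim * bytes_per_value
    return tokens * per_token / GIB


if __name__ == "__main__":
    for bits in (16, 8, 4):
        print(f"12B weights at {bits}-bit: {weight_gib(12, bits):.1f} GiB")
    # Hypothetical shape: 40 layers, 8 KV heads, head_dim 128, FP16 cache.
    print(f"KV cache for 32k tokens: {kv_cache_gib(32768, 40, 8, 128):.1f} GiB")
```

At 16-bit, the weights of a 12B model alone come to about 22 GiB, more than the card holds, so anything in that range has to be quantized (or partly offloaded to system RAM, which is slow), and whatever is left over bounds how much context fits.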

I'm cool with moving away from Claude Code to something lighter weight (maybe Pi?), but I need something with similar code management, tool management, and execution capabilities. I also don't have much time to play around with setup or to build out capabilities or custom guides, skills, personalities, etc. just to reach relatively basic functionality: managing the LLM and implementing what it suggests.

reddit.com
u/darkeagle03 — 5 days ago