u/Any_Primary2646

Hi everyone,

I’m trying to run local LLMs using Ollana, currently:

Main use case is coding/code assistant workflows.

Current setup:

According to the docs this setup should work, but in practice I’m seeing weird issues:

I followed the Ollama documentation and configured everything manually because the Ollama server itself runs in Docker.

I know tools like Contonue seem to work better with local models, but I’d really like to make my existing tooling work if possible.

Has anyone run into similar issues?

Things I’m especially curious about:

do I need specific tool-use/function-calling model variants?
how stable is Ollama’s OpenAI compatibility layer in real-world coding workflows?
could this be a Docker networking / streaming issue?
how well does Claude CLI tolerate non-Claude models?
are there recommended flags/env vars for coding use cases?
could context window limits or VRAM exhaustion cause these “hangs”?

If anyone has a stable setup using Ollama + Claude CLI or Copilot Chat, I’d really appreciate example configs or recommendations.

I have similiar problem like this live issue: https://github.com/ollama/ollama/issues/13949

Thanks!