u/Any_Primary2646

Ollama + Claude CLI / Copilot Chat Issues with Local Models (Qwen, Llama, Gemma)

Ollama + Claude CLI / Copilot Chat Issues with Local Models (Qwen, Llama, Gemma)

Hi everyone,

I’m trying to run local LLMs using Ollana, currently:

  • Qwen 3.5
  • Meta Llama 3.1
  • Google Gemma 4

Main use case is coding/code assistant workflows.

Current setup:

  • Ollama runs inside Docker on a separate server
  • using Claude Code CLI (latest via npm)
  • and GH Copilot Chat for vscode
  • connected through OpenAI-compatible endpoints

According to the docs this setup should work, but in practice I’m seeing weird issues:

  • tooling/function calling does not work at all (for both claude cli and vscode)
  • with Claude CLI the generation occasionally just “hangs” mid-response
  • sometimes I get partial responses and then it appears to stall/timeout
  • other times tool calls are completely ignored

I followed the Ollama documentation and configured everything manually because the Ollama server itself runs in Docker.

I know tools like Contonue seem to work better with local models, but I’d really like to make my existing tooling work if possible.

Has anyone run into similar issues?

Things I’m especially curious about:

  • do I need specific tool-use/function-calling model variants?
  • how stable is Ollama’s OpenAI compatibility layer in real-world coding workflows?
  • could this be a Docker networking / streaming issue?
  • how well does Claude CLI tolerate non-Claude models?
  • are there recommended flags/env vars for coding use cases?
  • could context window limits or VRAM exhaustion cause these “hangs”?

If anyone has a stable setup using Ollama + Claude CLI or Copilot Chat, I’d really appreciate example configs or recommendations.

I have similiar problem like this live issue: https://github.com/ollama/ollama/issues/13949

Thanks!

u/Any_Primary2646 — 6 days ago