Spring AI with local model through LM Studio
Couple of days ago I shared what I learned about Spring AI's chat memory. Today, here's what happened when I swapped the model behind it entirely.
Same Spring AI app. Same Java code. Same ChatClient, same @Tool annotations, same BeanOutputConverter for structured output. The only thing that changed: which model handled the requests.
OpenAI (GPT-4o) → Anthropic Claude Opus 4→ local Gemma 4 2B running through LM Studio.
The OpenAI → Claude switch was expected to work. Swap the starter dependency, update the config block, ship. Spring AI's provider abstraction is designed for this.
The local Gemma 4 2B switch was the interesting part. Same Anthropic starter dependency, just pointed at localhost:1234:
spring:
application:
name: spring-ai
ai:
anthropic:
api-key: ${LM_STUDIO_API_KEY}
base-url: http://127.0.0.1:1234
chat:
options:
model: google/gemma-4-e2b
memory:
repository:
jdbc:
initialize-schema: always
That's the entire config delta. LM Studio implements the Anthropic protocol, so Spring AI treats it as just another Anthropic-compatible endpoint. No separate "spring-ai-local" starter. No conditional Java code paths.
What I didn't expect — the 2B local model handled:
- Chat with memory (the same ChatMemoryAdvisor + JDBC repository setup from yesterday's post)
- Structured JSON output matching strict schemas
- Tool calling with proper parameter dispatch
- Code review (correctly identified a == vs .equals() bug in a real Java example)
Quality wasn't quite GPT-4o level, but it was meaningful enough that for what's probably 70% of business AI use cases — classification, summarization, structured extraction, simple agent loops — this would work in production. With zero per-request cost and full offline operation.
Recorded a walkthrough showing all three providers running the same demos (chat, memory, structured output, tool calling, code review) if you prefer video: https://youtu.be/lW0FMjDUzik
Repo with code: https://github.com/DmitryFinashkin/spring-ai
Has anyone here shipped multi-provider Spring AI in production yet? Curious how teams are handling provider routing — cost-based, latency-based, quality fallback, regional compliance — and what failure modes you're watching for.