u/Better-Platypus-3420

Hey everyone,

I wanted to share a project I've been working on called Glia. It is a 100% offline, local-first RAG and memory layer designed to connect your AI web chats (Claude, ChatGPT, DeepSeek) with your local developer tools (Claude Code, Cursor, Windsurf) using a unified local database.

I wanted something lightweight that did not require pulling heavy Docker containers or subscribing to third-party memory APIs. I settled on a Node.js + SQLite architecture running sqlite-vec (for 768-dim float32 embeddings) alongside SQLite FTS5 for hybrid search, powered completely by local Ollama instances.

We just launched a live website that outlines the details and demonstrates the features in action:

Website: https://glia-ai.vercel.app/
Codebase: https://github.com/Eshaan-Nair/Glia-AI

Technical Stack & Features:

Hybrid Search Retrieval: sqlite-vec vector search (embeddings from nomic-embed-text, run locally) + FTS5 keyword prefix matching (porter stemmer).
Surgical Sentence-level Trimming: Chunks are sliced into sentences. When a prompt is intercepted, only the sentences that match are pulled out of the vector store instead of the whole paragraph. This cuts the context injected into the LLM prompt by ~90-95% in my benchmarks.
Knowledge Graph Extraction: An offline task queue uses a local LLM (llama3.1:8b via Ollama) to extract entity triples (subject-relation-object). These are stored in a SQLite facts table (or Neo4j if you run the full Docker compose profile) and fused with the vector retrieval score.
HyDE (Hypothetical Document Embeddings): Queries are pre-processed to generate a hypothetical answer, which is embedded together with the original query to bridge semantic gaps.
Concurrency: SQLite runs in WAL (Write-Ahead Logging) mode, so the browser extension dashboard and active MCP sessions can keep reading while a write is in progress instead of blocking each other. Writes are still serialized, since SQLite allows one writer at a time.
PII Redaction: Aggressive scrubbing of JWTs, API keys, emails, and IPs in the extension before data is saved.
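
To give a rough idea of what the PII redaction step involves, here is a minimal Python sketch. It is an illustration only, not the extension's actual code (the extension is JavaScript): the patterns, the placeholder tokens, and the `redact` function name are all assumptions, and the API key pattern only covers a few well-known prefixes.

```python
import re

# Illustrative patterns only; these are assumptions, not Glia's real rules.
PATTERNS = [
    # JWT: three base64url segments separated by dots; the header starts with "eyJ"
    ("[JWT]", re.compile(r"\beyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+")),
    # Email address
    ("[EMAIL]", re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b")),
    # IPv4 address with each octet limited to 0-255
    ("[IP]", re.compile(
        r"\b(?:(?:25[0-5]|2[0-4]\d|1?\d?\d)\.){3}(?:25[0-5]|2[0-4]\d|1?\d?\d)\b")),
    # API keys with a few common prefixes (hypothetical, far from exhaustive)
    ("[API_KEY]", re.compile(r"\b(?:sk|pk|ghp|xoxb)[-_][A-Za-z0-9_-]{16,}")),
]


def redact(text: str) -> str:
    """Replace every match of each pattern with its placeholder token."""
    for placeholder, pattern in PATTERNS:
        text = pattern.sub(placeholder, text)
    return text


sample = "mail alice@example.com from 192.168.1.10 with key sk-abcdefghijklmnop1234"
print(redact(sample))
```

Regex scrubbing like this trades precision for safety: a dotted version string such as 1.2.3.4 would also be replaced by the IP pattern, which is usually acceptable when the goal is to keep secrets out of the database.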

The extension works on Claude.ai, ChatGPT, DeepSeek, Gemini, Grok, and Mistral. The MCP server runs against the same backend database, so your terminal agent or Cursor sees the same memory.

You can set it up with a single command: npx glia-ai-setup

Glia is completely open-source (MIT). If you like the local-first approach or want to contribute to the SQLite vector pipeline, PRs are very welcome, and a star on GitHub helps the project get discovered!

I would appreciate any feedback on the SQLite hybrid search scaling, the scoring fusion algorithm (RAG pipeline details are in RAG_PIPELINE.md), or local graph extraction performance!
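
For anyone who wants a concrete reference point for the fusion discussion, here is a minimal Python sketch of reciprocal rank fusion (RRF), a common way to merge a vector ranking with a keyword ranking. This is a stand-in technique for illustration and not a description of the algorithm in RAG_PIPELINE.md; the chunk ids, the function name, and the constant `k=60` are assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of chunk ids into a single ranking.

    Each id scores sum(1 / (k + rank)) over the lists it appears in,
    where rank starts at 1 for the best hit in a list.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by id for deterministic output.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


vector_hits = ["c3", "c1", "c7"]   # hypothetical ids from vector search
keyword_hits = ["c1", "c9", "c3"]  # hypothetical ids from FTS5 keyword search
fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
print([chunk_id for chunk_id, _ in fused])
```

RRF only looks at ranks, so it avoids having to normalize cosine distances against BM25 scores. A weighted sum of normalized scores is the usual alternative when one retriever should count for more than the other.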

I built a system that uses Ollama's local embeddings to give ChatGPT, Claude, and Gemini persistent memory across chats.

Why local embeddings matter:

Instead of relying on OpenAI's embedding API, I use nomic-embed-text via Ollama. This means:

Zero API costs
No embedding data leaves your machine, so your conversations stay private
Low-latency inference with no network round trip (runs locally on your GPU/CPU)

The pipeline:

Chrome extension captures conversations
Backend chunks them (300-word windows, 80-word overlap)
Ollama generates embeddings locally (768 dimensions)
Embeddings are stored in ChromaDB (vector DB)
On new prompts: embed the prompt → semantic search → inject top-3 chunks
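
The chunking step above can be sketched in a few lines of Python. This is a minimal illustration assuming plain whitespace tokenization; the function name `chunk_words` is hypothetical, and the real backend is Node.js, so treat it as a description of the windowing logic rather than the actual code.

```python
def chunk_words(text, window=300, overlap=80):
    """Split text into windows of `window` words that share `overlap` words."""
    if not 0 <= overlap < window:
        raise ValueError("overlap must be non-negative and smaller than window")
    words = text.split()
    step = window - overlap  # each new window starts 220 words after the last
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + window]))
        if start + window >= len(words):
            break  # this window already reached the end of the text
    return chunks


# A 700-word document yields windows starting at words 0, 220 and 440.
doc = " ".join(f"w{i}" for i in range(700))
chunks = chunk_words(doc)
print(len(chunks), [len(c.split()) for c in chunks])
```

The overlap means a sentence that straddles a window boundary still appears whole in at least one chunk, at the cost of embedding roughly a quarter of the text twice.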

The result:

When you ask ChatGPT a question about your project, it automatically gets context from your entire conversation history. No re-explaining. No manual effort.

Tech:

Chrome extension (MV3)
Node.js backend
Ollama for embeddings
ChromaDB for vector storage
Neo4j for knowledge graphs (optional but powerful)

GitHub

Works offline. MIT licensed. Self-hosted.

Would love feedback from anyone using Ollama!

Glia – Local-first shared memory layer (SQLite-vec + FTS5 + Offline Knowledge Graph)