
I built a web app that connects to local LLMs via XTunnel + RAG from uploaded files
Hi everyone!
I just finished a pet project I’d like to share. It’s a lightweight web app that acts as a unified frontend for local models (Ollama, LM Studio, llama.cpp, etc.) with a few features I personally missed:
🔌 HTTP tunnel via XTunnel – I use XTunnel to expose my local LLM to the internet. No port forwarding, no cloud proxy that stores your data. Works great for testing from mobile or sharing with a small team.
📁 RAG on uploaded files – drop PDFs, markdown, or txt files, the app chunks + embeds them locally (sentence-transformers, Chroma). Then you can ask questions about the content.
🧠 Custom system prompts – saved per session or as reusable templates. Perfect for roleplay, coding assistants, or structured output formats.
Why XTunnel?
I tried ngrok and localtunnel but wanted something I could self-host or run with minimal setup. XTunnel gave me a good balance of performance and simplicity. The app doesn’t force you into a specific tunnel — you can swap it — but I include ready-to-use examples for XTunnel.
Stack:
· Backend: FastAPI + SQLite
· Embeddings: sentence-transformers (all-MiniLM-L6-v2)
· Vector store: Chroma
· Tunnel: examples for XTunnel (others possible)
· Frontend: plain HTML + HTMX
Limitations:
· Only text files for RAG (no OCR)
· No streaming UI yet
· Tunnel security is your responsibility (basic auth recommended)
Next:
· Conversation memory with RAG
· Docker + Tailscale or Cloudflare Tunnel as an alternative
· Maybe OpenAI‑compatible endpoint
P.S: its full vibecode project, i build it for 5 hour maybe.