u/BeginningPush9896

https://preview.redd.it/795uu5sib23h1.png?width=1920&format=png&auto=webp&s=79afa013f029511b16ec0f62a53518c985b01bf2

Hi everyone!

I just finished a pet project I’d like to share. It’s a lightweight web app that acts as a unified frontend for local models (Ollama, LM Studio, llama.cpp, etc.) with a few features I personally missed:

🔌 HTTP tunnel via XTunnel – I use XTunnel to expose my local LLM to the internet. No port forwarding, no cloud proxy that stores your data. Works great for testing from mobile or sharing with a small team.

📁 RAG on uploaded files – drop PDFs, markdown, or txt files, the app chunks + embeds them locally (sentence-transformers, Chroma). Then you can ask questions about the content.

🧠 Custom system prompts – saved per session or as reusable templates. Perfect for roleplay, coding assistants, or structured output formats.

Why XTunnel?

I tried ngrok and localtunnel but wanted something I could self-host or run with minimal setup. XTunnel gave me a good balance of performance and simplicity. The app doesn’t force you into a specific tunnel — you can swap it — but I include ready-to-use examples for XTunnel.

Stack:

· Backend: FastAPI + SQLite

· Embeddings: sentence-transformers (all-MiniLM-L6-v2)

· Vector store: Chroma

· Tunnel: examples for XTunnel (others possible)

· Frontend: plain HTML + HTMX

Limitations:

· Only text files for RAG (no OCR)

· No streaming UI yet

· Tunnel security is your responsibility (basic auth recommended)

· Conversation memory with RAG

· Docker + Tailscale or Cloudflare Tunnel as an alternative