u/Glittering_Cup1104

▲ 9 r/Rag

Hey everyone,

I’m building a B2B AI customer support agent and trying to make RAG fast enough for a future voice agent.

Right now:

  • Everything is in AWS us-east-1
  • Vector search p99 is under 100 ms
  • Using OpenAI’s text-embedding-3-small for embeddings

The issue is that embedding the query takes around 600 ms to 1.1 s every time. That’s basically my bottleneck now.

I tried the obvious stuff, like keeping the infra close together, but saw no real improvement.
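For anyone wanting to compare numbers, here is a minimal sketch of how the per-query embedding latency could be measured in isolation. It is not tied to any provider: `embed_fn` is a hypothetical callable that wraps whatever embedding request is in use, and `fake_embed` is only a stand-in that sleeps instead of calling a network. The warm-up call is excluded on the assumption that first-request connection setup should not count toward steady-state latency.

```python
import math
import statistics
import time
from typing import Callable, Sequence


def measure_latency_ms(embed_fn: Callable[[str], Sequence[float]],
                       queries: Sequence[str],
                       warmup: int = 1) -> dict:
    """Time embed_fn once per query and summarize latencies in milliseconds."""
    if not queries:
        raise ValueError("need at least one query")
    # Warm-up calls are not timed, so one-off setup cost does not skew results.
    for q in queries[:warmup]:
        embed_fn(q)
    samples = []
    for q in queries:
        start = time.perf_counter()
        embed_fn(q)
        samples.append((time.perf_counter() - start) * 1000.0)
    samples.sort()
    # Nearest-rank p99: smallest sample with at least 99% of samples at or below it.
    rank = max(1, math.ceil(0.99 * len(samples)))
    return {
        "n": len(samples),
        "p50_ms": statistics.median(samples),
        "p99_ms": samples[rank - 1],
        "max_ms": samples[-1],
    }


def fake_embed(text: str) -> list:
    """Hypothetical stand-in for a real embedding call (no network involved)."""
    time.sleep(0.002)
    return [0.0] * 8


if __name__ == "__main__":
    print(measure_latency_ms(fake_embed, ["where is my order?"] * 20))
```

Swapping `fake_embed` for a function that sends the real request, and running it from the same host as the agent, should show whether the 600 ms to 1.1 s is steady-state or dominated by a few slow calls.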

A couple of questions:

  • Are there faster embedding models that don’t kill quality?
  • Does reducing the embedding dimensions actually help latency, or not really?
  • Is it worth self-hosting a smaller embedding model for this?
  • What kind of latency are people getting in real voice RAG setups?

Would really appreciate any practical tips here.

u/Glittering_Cup1104 — 19 days ago

I’ve been noticing that whenever I open Zed, it spins up like 5–6 Node.js processes. As soon as I quit Zed, all of them disappear, and when I open it again they come back. So it’s clearly tied to Zed.

At first I thought it was for AI features, but this happens even when I’m not using anything AI related.

So I’m wondering what these are actually for. Maybe language servers like TypeScript? Or internal stuff like file watching, plugins, etc.?

I’m also a bit concerned because I’m on an M1 Mac (8 GB RAM) and don’t want random background processes eating resources for no reason.

Is this normal behavior, or is something off? Does anyone know what Zed is doing under the hood?
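One way to check what those processes are is to look at their full command lines and resident memory, since the script path passed to Node usually hints at what is running. The sketch below is a rough illustration, not anything Zed-specific: the helper names are made up for this example, the match on `node` is a heuristic regex, and it assumes a `ps` that accepts repeated `-o name=` options and reports `rss` in kilobytes.

```python
import re
import subprocess
from typing import List, NamedTuple


class Proc(NamedTuple):
    pid: int
    ppid: int
    rss_kb: int
    command: str


def run_ps() -> str:
    """Return raw ps output: pid, parent pid, resident size, full command line."""
    # Separate -o options with a trailing '=' suppress the header row.
    args = ["ps", "-ax", "-o", "pid=", "-o", "ppid=", "-o", "rss=", "-o", "command="]
    return subprocess.run(args, capture_output=True, text=True, check=True).stdout


def parse_ps(output: str) -> List[Proc]:
    """Parse lines of 'pid ppid rss command...' into Proc records."""
    procs = []
    for line in output.splitlines():
        parts = line.split(None, 3)  # the command itself may contain spaces
        if len(parts) < 4:
            continue
        pid, ppid, rss, command = parts
        procs.append(Proc(int(pid), int(ppid), int(rss), command.strip()))
    return procs


def node_processes(procs: List[Proc]) -> List[Proc]:
    """Heuristic: keep commands containing a path component that is exactly 'node'."""
    pattern = re.compile(r"(^|/)node(\s|$)")
    return [p for p in procs if pattern.search(p.command)]


def describe(procs: List[Proc]) -> List[str]:
    """Format each process as 'pid  parent  MB  command' for quick reading."""
    return [
        f"{p.pid}  parent={p.ppid}  {p.rss_kb / 1024:.1f} MB  {p.command}"
        for p in procs
    ]
```

Running `print("\n".join(describe(node_processes(parse_ps(run_ps())))))` while Zed is open should list each Node process with its memory use and the script it was started with.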

u/Glittering_Cup1104 — 21 days ago