Building an FAQ/knowledge base from support tickets: clustering vs RAG vs human-reviewed drafts?
Hi everyone,
I have a large support-ticket archive and want to turn it into a maintainable FAQ / knowledge base.
RAG is already working, with combined search over the docs and a vectorized ticket database. Now I need to extract FAQ candidates from the tickets stored in Qdrant.
I tried two-stage (“double”) clustering: build large clusters first, then group the closest questions inside each cluster by cosine similarity. It didn’t work well. I also tried HDBSCAN and BERTopic.
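For reference, here is a minimal sketch of the two-stage idea, not the exact pipeline. It assumes the embeddings have already been pulled out of Qdrant as plain lists of floats, and it uses a simple greedy threshold grouping as a stand-in for the coarse clustering step. The function names and the thresholds (`coarse_t`, `fine_t`, `min_size`) are made up for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def greedy_groups(indices, vectors, threshold):
    """Put each item into the first group whose leader is similar enough,
    otherwise start a new group with that item as its leader."""
    groups = []  # each group is a list of indices; group[0] is the leader
    for i in indices:
        for g in groups:
            if cosine(vectors[i], vectors[g[0]]) >= threshold:
                g.append(i)
                break
        else:
            groups.append([i])
    return groups

def two_stage(vectors, coarse_t=0.5, fine_t=0.9, min_size=2):
    """Stage 1: loose groups (topics). Stage 2: tight groups inside each topic.
    Tight groups with at least min_size tickets are returned as FAQ candidates."""
    candidates = []
    for topic in greedy_groups(range(len(vectors)), vectors, coarse_t):
        for tight in greedy_groups(topic, vectors, fine_t):
            if len(tight) >= min_size:
                candidates.append(tight)
    return candidates

# Toy 2-D "embeddings" standing in for real ticket vectors (hypothetical data)
vecs = [[1.0, 0.0], [0.99, 0.05], [0.7, 0.7], [0.0, 1.0], [0.02, 1.0]]
print(two_stage(vecs))
```

On the toy data this prints `[[0, 1], [3, 4]]`: ticket 2 lands in the same coarse topic as tickets 0 and 1 but is dropped in the second stage because it is not close enough to form a repeated question. Greedy grouping depends on input order and on both thresholds, so real ticket data would need those tuned.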
Has anyone solved a similar problem? How did you approach it?
u/Lanky-Ad5880 — 2 days ago