u/Lanky-Ad5880

Building an FAQ/knowledge base from support tickets: clustering vs RAG vs human-reviewed drafts?

Hi everyone,

I have a large support-ticket archive and want to turn it into a maintainable FAQ / knowledge base.

RAG is already working: it runs a combined search over the docs and a vectorized ticket database. Now I need to extract FAQ candidates from the tickets stored in Qdrant.

I tried two-stage (“double”) clustering: form large clusters first, then group the closest questions inside each cluster by cosine similarity. It didn’t work well. I also tried HDBSCAN and BERTopic.
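
For concreteness, here is a minimal sketch of the two-stage idea in Python. It is an illustration only, not the exact pipeline from this post. It assumes the ticket embeddings are already loaded into a NumPy array, and it uses greedy "leader" clustering for both stages as a stand-in for whatever clustering algorithm is actually used. The function names, both thresholds, and `min_size` are hypothetical choices.

```python
import numpy as np


def normalize(vectors):
    """Scale each row to unit length so dot products equal cosine similarity."""
    vectors = np.asarray(vectors, dtype=float)
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    return vectors / np.clip(norms, 1e-12, None)


def leader_cluster(unit_vectors, indices, threshold):
    """Greedy one-pass clustering (assumed stand-in algorithm).

    Each item joins the most similar existing leader if the cosine
    similarity is >= threshold; otherwise it starts a new cluster.
    """
    leaders = []   # index of the first member of each cluster
    clusters = []  # member indices per cluster, aligned with `leaders`
    for i in indices:
        if leaders:
            sims = unit_vectors[leaders] @ unit_vectors[i]
            best = int(np.argmax(sims))
            if sims[best] >= threshold:
                clusters[best].append(i)
                continue
        leaders.append(i)
        clusters.append([i])
    return clusters


def two_stage_clusters(embeddings, coarse_threshold=0.5,
                       fine_threshold=0.9, min_size=2):
    """Stage 1: loose, large clusters. Stage 2: tight groups inside each.

    Returns lists of row indices; groups smaller than `min_size` are
    dropped, since a question asked once is a weak FAQ candidate.
    """
    unit = normalize(embeddings)
    candidates = []
    for coarse in leader_cluster(unit, range(len(unit)), coarse_threshold):
        for fine in leader_cluster(unit, coarse, fine_threshold):
            if len(fine) >= min_size:
                candidates.append(fine)
    return candidates


# Toy 3-dimensional "embeddings" standing in for real ticket vectors.
toy = [
    [1.0, 0.0, 0.0],
    [0.99, 0.1, 0.0],
    [0.7, 0.7, 0.0],
    [0.0, 0.0, 1.0],
    [0.0, 0.05, 1.0],
]
groups = two_stage_clusters(toy)
print(groups)
```

Leader clustering depends on input order and on both thresholds, so this version is only meant to make the approach concrete enough to discuss.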

Has anyone solved a similar problem? How did you approach it?

u/Lanky-Ad5880 — 2 days ago