How do you get an LLM to find specific patterns and not just generic categories?
Trying to figure this out and could use some pointers.
I'm feeding sales call transcripts into Gemini and asking it to pull out patterns that correlate with whether the rep booked a meeting. What I get back is stuff like "asks follow-up questions" or "uses social proof". Technically correct but useless, because every rep does these to some degree, so they don't separate the calls that booked from the ones that didn't.
What I actually want is patterns like "asks about urgency right after a price objection" or "names a competitor only after the lead mentions budget". Specific moves in specific spots. The LLM seems to default to category labels even when I ask for verbatim quotes and context.
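To make the target concrete, here's roughly the shape I have in mind, as a minimal Python sketch. The event tags (`price_objection`, `urgency_question`, and so on) and the `context_move_counts` helper are made up for illustration, and it assumes each call has already been reduced to an ordered list of tagged events plus a booked flag. The unit being counted is a (what just happened, what came next) pair, not a standalone category.

```python
from collections import Counter

# Hypothetical toy data: each call is (ordered event tags, booked?).
# The tag names are invented for illustration, not output from a real pipeline.
calls = [
    (["price_objection", "urgency_question", "budget_mention", "competitor_named"], True),
    (["urgency_question", "price_objection", "social_proof"], False),
    (["price_objection", "urgency_question"], True),
]

def context_move_counts(calls):
    """Count (previous event, next event) pairs separately for booked and unbooked calls."""
    booked, not_booked = Counter(), Counter()
    for events, was_booked in calls:
        target = booked if was_booked else not_booked
        # zip(events, events[1:]) yields each adjacent pair in call order
        for prev, move in zip(events, events[1:]):
            target[(prev, move)] += 1
    return booked, not_booked

booked, not_booked = context_move_counts(calls)
print(booked[("price_objection", "urgency_question")])      # 2
print(not_booked[("price_objection", "urgency_question")])  # 0
```

In this toy data the bare tag `urgency_question` shows up in all three calls, which is exactly the "every rep does it" problem. Only the pair separates the booked calls from the unbooked one.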
Two things I think are going on:
1. The model groups things during extraction. Even when I tell it to keep the exact phrasing it still slaps a generic label on top, and when I aggregate across calls the specifics get lost behind the label.
2. I don't think my prompting is forcing the specificity hard enough. Saying "be specific" doesn't really work. I've tried giving examples of good vs bad outputs and it helps a little but not enough.
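On the second point, one thing I could do instead of pleading for specificity in the prompt is reject outputs mechanically. A rough sketch, assuming I ask the model for records with `quote` and `preceding` fields (names I made up) and check that both are real substrings of the transcript, in the right order:

```python
def is_grounded(record, transcript):
    """Accept a record only if quote and preceding context both appear verbatim, context first."""
    quote = record.get("quote", "").strip()
    preceding = record.get("preceding", "").strip()
    if not quote or not preceding:
        return False  # a bare label with no quote or no context fails here
    ctx_at = transcript.find(preceding)
    if ctx_at == -1:
        return False
    # the quote has to occur after the end of the preceding context
    return transcript.find(quote, ctx_at + len(preceding)) != -1

# Hypothetical two-turn transcript, invented for this example
transcript = (
    "Lead: Honestly that's more than we wanted to spend. "
    "Rep: Fair. Is there a date you need this live by?"
)
good = {"quote": "Is there a date you need this live by?",
        "preceding": "that's more than we wanted to spend"}
bad = {"quote": "asks follow-up questions", "preceding": ""}
print(is_grounded(good, transcript))  # True
print(is_grounded(bad, transcript))   # False
```

This wouldn't make the model any more specific by itself. It would only stop paraphrased or label-only records from reaching the aggregation step.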
Things I'm thinking about trying:
1. Skip the LLM label entirely. Just keep the verbatim quote plus some context (what phase of the call, what came right before). Then embed all the quotes and cluster them, and let the clusters be the patterns instead of the LLM-assigned labels.
2. Two-pass extraction. First pass pulls candidate quotes. Second pass takes a batch of similar quotes and writes a tight description of what they have in common.
3. Use a stronger model just for the labeling step and see if the specificity changes.
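For the first two ideas, here's a toy sketch of the pipeline shape. Big assumptions: `embed` is a bag-of-words stand-in for a real embedding model, `greedy_cluster` is a simple threshold clustering I picked for brevity (not a claim that it beats k-means or anything else), and `second_pass_prompt` only builds the prompt text without calling any model. All names and the sample quotes are hypothetical.

```python
import math
import re
from collections import Counter

def embed(text):
    """Stand-in embedding: lowercase word counts. Swap in a real embedding model."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def greedy_cluster(items, threshold=0.5):
    """Put each item in the first cluster whose seed is similar enough, else start a new one."""
    clusters = []  # each cluster is a list of (item, vector); element 0 is the seed
    for item in items:
        # embed context and quote together so the same words in a different spot land apart
        vec = embed(item["preceding"] + " " + item["quote"])
        for cluster in clusters:
            if cosine(vec, cluster[0][1]) >= threshold:
                cluster.append((item, vec))
                break
        else:
            clusters.append([(item, vec)])
    return [[item for item, _ in cluster] for cluster in clusters]

def second_pass_prompt(cluster):
    """Build the second-pass prompt: describe what the quotes share, with context attached."""
    lines = [f'- after "{it["preceding"]}": "{it["quote"]}"' for it in cluster]
    return ("These rep lines were grouped as similar. In one sentence, describe the move "
            "and the moment it happens. Do not use a generic category name.\n" + "\n".join(lines))

quotes = [
    {"preceding": "that is more than we wanted to spend", "quote": "is there a date you need this live by"},
    {"preceding": "that is more than we planned to spend", "quote": "is there a date you need this by"},
    {"preceding": "we are also talking to another vendor", "quote": "which vendor are you comparing us to"},
]
clusters = greedy_cluster(quotes)
print(len(clusters))  # 2
print(second_pass_prompt(clusters[0]))
```

The point of the shape is that no label exists until after grouping, and the second pass sees the preceding context next to every quote, so it has less room to fall back on a category name.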
Has anyone done something like this? I'm particularly interested if you've found a prompt pattern that reliably gets phrase-level output rather than category-level. Also curious if there's a name for this problem in the literature. It feels like it should have been studied, but I haven't found the right keywords.