u/Competitive-Fun-7148

Built an AI writing tool that scores its own output and revises before you see it


ContentAgent launched today on Product Hunt. It's an AI content creation tool built around one idea: the output should pass a quality gate before the user sees it.

How it works:

The user answers three conversational questions about their business. Voice calibration then extracts their sentence rhythm, vocabulary range, the structures they use, and the things they'd never say. That profile becomes a generation constraint, not just a system prompt.
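One way to read "generation constraint" is as a check the draft has to pass, rather than an instruction the model is free to ignore. The sketch below is minimal and assumes a hypothetical `VoiceProfile` with two fields: a preferred sentence-length range and a never-say list. The post doesn't describe ContentAgent's actual profile schema or how the constraint is enforced.

```typescript
// Hypothetical voice profile; the post doesn't publish ContentAgent's schema.
interface VoiceProfile {
  sentenceWords: { min: number; max: number }; // preferred average sentence length, in words
  neverSay: string[]; // phrases the user said they'd never use
}

interface VoiceViolation {
  kind: "never-say" | "rhythm";
  detail: string;
}

// Naive splitter: a sentence is a run of text ending in . ! or ? (or end of input).
function splitSentences(text: string): string[] {
  const parts = text.match(/[^.!?]+[.!?]+|[^.!?]+$/g) ?? [];
  return parts.map((s) => s.trim()).filter((s) => s.length > 0);
}

function wordCount(sentence: string): number {
  return sentence.split(/\s+/).filter((w) => w.length > 0).length;
}

// Treats the profile as a hard check on the draft rather than a prompt hint.
function checkVoice(draft: string, profile: VoiceProfile): VoiceViolation[] {
  const violations: VoiceViolation[] = [];
  const lower = draft.toLowerCase();
  for (const phrase of profile.neverSay) {
    if (lower.includes(phrase.toLowerCase())) {
      violations.push({ kind: "never-say", detail: `uses "${phrase}"` });
    }
  }
  const sentences = splitSentences(draft);
  if (sentences.length > 0) {
    const avg = sentences.reduce((sum, s) => sum + wordCount(s), 0) / sentences.length;
    const { min, max } = profile.sentenceWords;
    if (avg < min || avg > max) {
      violations.push({
        kind: "rhythm",
        detail: `average sentence length ${avg.toFixed(1)} words, expected ${min}-${max}`,
      });
    }
  }
  return violations;
}
```

With a profile of 6-18 words and a never-say list containing "game-changer", the draft "This is a game-changer for teams. We ship weekly." fails on both counts.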

Then there's the gate. Eight deterministic detection categories run against every generated draft: AI vocabulary, sycophantic openers, generic conclusions, mechanical parallelism, metronome rhythm, hedge stacking, em dash density, format compliance. Score below 70 and the model revises automatically with specific feedback. The user only sees output that passed.

A specificity radar runs alongside it and flags claims without evidence. "Significant improvements" gets flagged; "Reduced support tickets by 34%" doesn't.

14 templates across LinkedIn, blog, Twitter, email, and Instagram, plus strategy templates. Platform constraints are enforced per template (LinkedIn 3,000 characters, Twitter 280, email subject line 60).
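A per-template length check might look like the following. The three limits are the ones listed in the post. `PLATFORM_LIMITS` and `checkLength` are hypothetical names. Counting Unicode code points is a simplification, because X/Twitter's own counter weights some characters and counts every URL as 23.

```typescript
// Limits quoted in the post; the key names and the check itself are hypothetical.
const PLATFORM_LIMITS = {
  linkedin: 3000,
  twitter: 280,
  emailSubject: 60,
} as const;

type Platform = keyof typeof PLATFORM_LIMITS;

// Counts Unicode code points, so a single-code-point emoji counts once
// instead of as two UTF-16 units. Weighted X/Twitter counting is not modeled.
function charCount(text: string): number {
  return Array.from(text).length;
}

// Returns an error message when the text is over the limit, otherwise null.
function checkLength(platform: Platform, text: string): string | null {
  const limit = PLATFORM_LIMITS[platform];
  const count = charCount(text);
  return count > limit ? `${platform}: ${count} chars exceeds limit of ${limit}` : null;
}
```

A returned message could be fed straight into revision feedback alongside the other gate findings.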

Stack: Next.js 16, Vercel AI SDK v6, Clerk, Polar.sh, Drizzle on Neon, OpenRouter. ~8,800 lines of TypeScript, 26 API routes. Built solo over 11 sprints.

Pricing: Free tier is 10 generations/month with all features. Pro is $19/month for 50 generations, model picker, and a separate LLM voice review pass.

Live at contentagent.kern.web.za.

Happy to answer any questions about the architecture or the quality gate implementation.

u/Competitive-Fun-7148 — 4 days ago
r/webdev

How I built a deterministic AI pattern detector for content quality (regex + LLM, 8 detection categories)

I've been working on a content tool for a few months, and the most interesting part to build was the quality gate: a deterministic detector that runs inside the generation loop, fast enough to fit between streaming steps.

The problem: every AI writing tool I've used produces content with the same tells. Em dash density. Sentence length uniformity. Vocabulary leaning on a specific set of words (leverage, delve, navigate, tapestry). Structural tells like mechanical parallelism.

I wanted something deterministic that runs before the user sees the draft, with a low enough false-positive rate to actually be useful.

Architecture:

generate(brief) ->
  streamText({ model, prompt, system, tools, stopWhen: stepCountIs(5) }) ->
    onStepFinish: runDeterministicGate(text) -> score
      if score < threshold: tool_call('revise', { feedback })
    onFinish: runLLMReview(text) -> nuanced_feedback
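The control flow of that loop can be sketched without the SDK so that it runs standalone. This is an approximation, not the real pipeline:

- `Model`, `Gate`, and `runGatedGeneration` are hypothetical names.
- The model is a synchronous stub in place of `streamText`.
- The 5-step cap mirrors `stepCountIs(5)`.
- The final LLM review pass is left out.

The post doesn't say what happens when no draft passes within the step budget. Returning the best-scoring draft, flagged as failing, is an assumption.

```typescript
// Hypothetical stand-ins for the real pipeline pieces.
type Model = (brief: string, feedback?: string) => string; // brief + revise feedback -> draft
type Gate = (draft: string) => { score: number; feedback: string };

interface GatedResult {
  draft: string;
  score: number;
  passed: boolean;
  step: number; // which step produced this draft
}

const THRESHOLD = 70; // pass mark from the post
const MAX_STEPS = 5; // mirrors stopWhen: stepCountIs(5)

function runGatedGeneration(brief: string, model: Model, gate: Gate): GatedResult {
  let best: GatedResult | null = null;
  let feedback: string | undefined;
  for (let step = 1; step <= MAX_STEPS; step++) {
    const draft = model(brief, feedback);
    // The deterministic gate is cheap, so it can run after every step.
    const result = gate(draft);
    const current: GatedResult = { draft, score: result.score, passed: result.score >= THRESHOLD, step };
    if (current.passed) return current;
    if (best === null || current.score > best.score) best = current;
    // Plays the role of tool_call('revise', { feedback }).
    feedback = result.feedback;
  }
  // Assumption: with no passing draft, return the best one, flagged as failing.
  return best as GatedResult;
}
```

Swapping the stub for a real model call would make the loop async, but the control flow would stay the same.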

The deterministic gate is plain regex + heuristics across eight categories:

  1. AI vocabulary: wordlist match (leverage, delve, navigate, tapestry, etc.). Each match costs points.

  2. Sycophantic openers: phrases like "Great question!" or "What a fantastic insight" at sentence position 0.

  3. Generic conclusions: closer patterns like "the future belongs to those who..." or "In conclusion, ..."

  4. Mechanical parallelism: three-element lists with parallel structure ("X, Y, and Z" where all three are abstract nouns).

  5. Metronome rhythm: sentence length variance below a threshold. Human writing is bursty (sentence lengths swing between short and long); AI output tends to be uniform.

  6. Hedge stacking: multiple qualifiers stacked in one phrase ("could potentially possibly").

  7. Em dash density: more than N em dashes per 100 words. This is the one that catches me every time.

  8. Format compliance: exceeded character limits, wrong hashtag count, platform conventions not followed.
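Several of these categories reduce to a handful of regexes. The rough sketch below covers AI vocabulary, sycophantic openers, generic conclusions, hedge stacking, and em dash density. The word and phrase lists are only the examples named in the post, plus a few illustrative hedge words. The post doesn't give a value for N, so the em dash limit is a required parameter (`maxEmDashPer100`) rather than a default. `detect` and `countMatches` are hypothetical names.

```typescript
// Illustrative patterns built from the post's examples; the real lists aren't published.
const AI_VOCAB =
  /\b(?:leverag(?:e|es|ed|ing)|delv(?:e|es|ed|ing)|navigat(?:e|es|ed|ing)|tapestr(?:y|ies))\b/gi;
// Checked only at the very start of the draft (sentence position 0).
const SYCOPHANTIC_OPENER = /^\s*(?:great question|what a (?:fantastic|great) (?:insight|question))\b/i;
const GENERIC_CONCLUSION = /\b(?:in conclusion,|the future belongs to those who)/gi;
// Two or more hedge words in a row, e.g. "could potentially possibly". The word list is a guess.
const HEDGE = "(?:could|might|may|potentially|possibly|perhaps|arguably)";
const HEDGE_STACK = new RegExp(`\\b${HEDGE}\\b(?:\\s+${HEDGE}\\b)+`, "gi");

interface Finding {
  category: string;
  count: number;
}

// Only use with global (g) regexes: match() then returns every match.
function countMatches(text: string, re: RegExp): number {
  return (text.match(re) ?? []).length;
}

function detect(text: string, maxEmDashPer100: number): Finding[] {
  const words = text.split(/\s+/).filter((w) => w.length > 0).length;
  const emDashes = countMatches(text, /\u2014/g);
  const per100 = words === 0 ? 0 : (emDashes / words) * 100;
  const findings: Finding[] = [
    { category: "ai-vocabulary", count: countMatches(text, AI_VOCAB) },
    { category: "sycophantic-opener", count: SYCOPHANTIC_OPENER.test(text) ? 1 : 0 },
    { category: "generic-conclusion", count: countMatches(text, GENERIC_CONCLUSION) },
    { category: "hedge-stacking", count: countMatches(text, HEDGE_STACK) },
    { category: "em-dash-density", count: per100 > maxEmDashPer100 ? emDashes : 0 },
  ];
  return findings.filter((f) => f.count > 0);
}
```

Mechanical parallelism and format compliance need more than a flat pattern (part-of-speech checks, per-template rules), so they're left out here.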

Each category contributes to a 0-100 score. Below 70 triggers a tool call to revise with specific feedback: "4 instances of 'leverage', 3 consecutive sentences over 25 words, em dash density 2.3 per 100 words."
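The sketch below turns flags into a 0-100 score plus a feedback string in that format. It also includes a metronome-rhythm check based on the standard deviation of sentence lengths. Only two figures come from the post: the 70 threshold and the 25-word long-sentence cutoff. Everything else is an assumption:

- the penalty weights
- the rule that a run of 3+ long sentences costs points
- the `minStdDev` cutoff
- `scoreDraft` itself

```typescript
// Hypothetical weights: the post gives the 0-100 scale and the 70 pass mark, not the weights.
const THRESHOLD = 70;
const PENALTY = { perFlag: 5, longRun: 10, metronome: 15 };
const LONG_SENTENCE_WORDS = 25;

interface GateReport {
  score: number;
  passed: boolean;
  feedback: string;
}

// Word count of each naively split sentence (split on . ! ?).
function sentenceLengths(text: string): number[] {
  const parts = text.match(/[^.!?]+[.!?]+|[^.!?]+$/g) ?? [];
  return parts
    .map((s) => s.trim().split(/\s+/).filter((w) => w.length > 0).length)
    .filter((n) => n > 0);
}

function stdDev(xs: number[]): number {
  const mean = xs.reduce((a, b) => a + b, 0) / xs.length;
  return Math.sqrt(xs.reduce((a, x) => a + (x - mean) ** 2, 0) / xs.length);
}

// Longest run of consecutive sentences longer than `limit` words.
function longestRunOver(lengths: number[], limit: number): number {
  let best = 0;
  let current = 0;
  for (const n of lengths) {
    current = n > limit ? current + 1 : 0;
    best = Math.max(best, current);
  }
  return best;
}

// `flags` maps a flagged word or pattern to its count, e.g. { leverage: 4 }.
function scoreDraft(text: string, flags: Record<string, number>, minStdDev: number): GateReport {
  const notes: string[] = [];
  let score = 100;
  for (const label of Object.keys(flags)) {
    const count = flags[label];
    if (count > 0) {
      score -= count * PENALTY.perFlag;
      notes.push(`${count} instances of '${label}'`);
    }
  }
  const lengths = sentenceLengths(text);
  const run = longestRunOver(lengths, LONG_SENTENCE_WORDS);
  if (run >= 3) {
    score -= PENALTY.longRun;
    notes.push(`${run} consecutive sentences over ${LONG_SENTENCE_WORDS} words`);
  }
  // Metronome rhythm: sentence lengths too uniform (needs a few sentences to judge).
  if (lengths.length >= 3 && stdDev(lengths) < minStdDev) {
    score -= PENALTY.metronome;
    notes.push(`sentence length std dev ${stdDev(lengths).toFixed(1)} words (metronome rhythm)`);
  }
  score = Math.max(0, Math.min(100, score));
  return { score, passed: score >= THRESHOLD, feedback: notes.join(", ") };
}
```

The joined `feedback` string is what would be handed to the revise step, so the model gets concrete counts rather than "make it sound more human".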

Why deterministic instead of LLM-only:

Speed: <50ms vs ~3s for an LLM pass. Cost: zero. Predictability: same input, same flags. Adding a new pattern is a regex, not a prompt tune.

Where deterministic falls short:

Can't catch nuanced voice mismatches. Can't grade argument strength. High false-positive rate on legitimate uses of flagged words. Static wordlists decay over time (Liang et al. tracked "delve" in arXiv abstracts: #1 AI word in 2024, frequency collapsed after public exposure).

That's why there's a second LLM review pass on the final output. Deterministic does the cheap fast filtering. LLM catches what regex can't.

Specificity radar: same architecture, different signal. It flags sentences that make claims without proof. Digits, proper nouns, and quoted phrases count as positive signals; sentences with none of those plus an abstract claim verb ("delivers", "improves", "transforms") get flagged.
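Here is a rough sketch of that heuristic. The three claim verbs come from the post, with inflected forms added. Two parts are naive assumptions: the proper-noun test (a capitalized word anywhere except the first position) and the sentence splitter. Treat this as an approximation, not the actual radar.

```typescript
// Claim verbs from the post, with inflections added as an assumption.
const CLAIM_VERB = /\b(?:deliver(?:s|ed|ing)?|improv(?:e|es|ed|ing)|transform(?:s|ed|ing)?)\b/i;
const DIGIT = /\d/;
const QUOTED = /"[^"]+"|\u201C[^\u201D]+\u201D/; // straight or curly double quotes

function splitSentences(text: string): string[] {
  const parts = text.match(/[^.!?]+[.!?]+|[^.!?]+$/g) ?? [];
  return parts.map((s) => s.trim()).filter((s) => s.length > 0);
}

// Naive proper-noun test: a capitalized word anywhere except the first position.
function hasProperNoun(sentence: string): boolean {
  return sentence
    .split(/\s+/)
    .slice(1)
    .some((w) => /^[A-Z][a-zA-Z]+/.test(w));
}

// Returns sentences that make a claim with no concrete evidence signal.
function specificityFlags(text: string): string[] {
  return splitSentences(text).filter((s) => {
    const hasEvidence = DIGIT.test(s) || QUOTED.test(s) || hasProperNoun(s);
    return CLAIM_VERB.test(s) && !hasEvidence;
  });
}
```

Using the launch-post examples, "Our platform delivers significant improvements." is flagged and "Reduced support tickets by 34%." is not. A sentence-initial brand name ("Acme improves...") slips past the proper-noun test, which is one obvious false-positive source.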

False positives after 3 weeks of dogfooding:

The em dash check fires on legitimate use about 15% of the time, so it will probably move to a context-aware check. The rule-of-three (mechanical parallelism) check has the lowest false-positive rate (<5%) because the parallel-structure heuristic is tight. The AI vocabulary check is the most controversial, since sometimes "leverage" is genuinely the right word. I made it advisory: it highlights matches but doesn't drop the score below the threshold.
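One way to implement "advisory" has two parts. First, highlight every match for the UI. Second, clamp the advisory penalty so it can never pull a passing draft under the threshold. `findHighlights`, `finalScore`, and the 5-point penalty are hypothetical.

```typescript
interface Highlight {
  word: string;
  start: number;
  end: number;
}

const THRESHOLD = 70;
const ADVISORY_PENALTY = 5; // hypothetical cost per advisory match

// Finds every advisory word so the UI can highlight it. Assumes plain words
// with no regex metacharacters.
function findHighlights(text: string, words: string[]): Highlight[] {
  if (words.length === 0) return []; // an empty pattern would yield zero-length matches and loop forever
  const re = new RegExp(`\\b(?:${words.join("|")})\\b`, "gi");
  const out: Highlight[] = [];
  let m: RegExpExecArray | null;
  while ((m = re.exec(text)) !== null) {
    out.push({ word: m[0], start: m.index, end: m.index + m[0].length });
  }
  return out;
}

// Advisory matches may lower a passing score but never below the pass mark,
// so they can't fail a draft on their own. A draft already failing on blocking
// checks keeps the lower score (an assumption; the post doesn't say).
function finalScore(blockingScore: number, advisoryMatches: number): number {
  const lowered = Math.max(0, blockingScore - advisoryMatches * ADVISORY_PENALTY);
  return blockingScore >= THRESHOLD ? Math.max(lowered, THRESHOLD) : lowered;
}
```

Because the user still sees the highlights, they can decide whether "leverage" stays, while the gate's pass/fail decision depends only on the blocking categories.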

Stack: Next.js 16 App Router, Vercel AI SDK v6, Drizzle on Neon, Clerk auth.
