r/Agentic_Marketing

▲ 3 r/Agentic_Marketing+2 crossposts

The longer you run an AI agent, the more time you spend managing its memory instead of using it.

Month one is clean. By month six most people I know have a folder of saved prompts, a doc of context snippets, and a personal ritual for resetting state between sessions. That's not a workflow. That's a missing infrastructure layer you're doing by hand. And the deeper problem: even when memory persists, it accumulates without governance. Old signals stay alive. Outdated preferences keep winning retrieval. Nothing decays, nothing gets replaced, nothing loses authority over time. We're good at storing. We're terrible at forgetting safely.
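
One way to picture "forgetting safely" is a retrieval score that loses authority with age, plus an explicit replace step. The sketch below is only an illustration under stated assumptions: the `Memory` fields, the 30-day half-life, and the function names are all hypothetical and not taken from any particular memory tool.

```python
from dataclasses import dataclass

HALF_LIFE_DAYS = 30.0  # assumption: a memory's authority halves every 30 days


@dataclass
class Memory:
    key: str                  # what the memory is about, e.g. "tone"
    value: str
    relevance: float          # similarity score from retrieval, 0..1
    age_days: float           # time since the memory was written
    superseded: bool = False  # set when a newer memory replaces this one


def effective_score(m: Memory) -> float:
    """Relevance discounted by age; superseded memories never win."""
    if m.superseded:
        return 0.0
    return m.relevance * 0.5 ** (m.age_days / HALF_LIFE_DAYS)


def supersede(memories: list, new: Memory) -> list:
    """Mark older memories with the same key as replaced, then add the new one."""
    for m in memories:
        if m.key == new.key:
            m.superseded = True
    return memories + [new]


def retrieve(memories: list, k: int = 1) -> list:
    """Return the k memories with the highest effective score."""
    return sorted(memories, key=effective_score, reverse=True)[:k]
```

With this scoring, a six-month-old preference with relevance 0.9 loses to a day-old one with relevance 0.7, and once `supersede` has run the old entry cannot win retrieval at all.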

How are you actually handling this beyond month three?

reddit.com
▲ 153 r/Agentic_Marketing+1 crossposts

My company just bought us corporate AI accounts. Expectation vs. Reality is hitting hard.

Management expects us to use this groundbreaking tech to automate complex data pipelines, optimize legacy code, and completely revolutionize our Q3 synergy.

In reality, I spent my morning using a multi-billion-dollar neural network to translate "per my last three emails, you illiterate walnut" into polite corporate-speak, followed by asking it for five professional variations of "I'm just putting the finishing touches on it" for a project I haven't even opened yet.

We aren't building a sci-fi future. We're just using the pinnacle of human engineering as an HR-approved shield to survive the 9-to-5.

u/ailovershoyab — 3 days ago
▲ 2 r/Agentic_Marketing+1 crossposts

the 5 things we ended up building into our voice ai platform after 2.5 years of trying to fake them. and the boring outcomes that made the work worth it

ok so 2.5 years into voice ai and 18 months building a voice ai platform and i'm gonna admit something. like half the features we ship now are things we spent way too long trying NOT to build. here's 5 of them and what happened when we finally caved.

  1. native integrations, not zapier.

for like 8 months we told everyone "just use zapier or n8n to connect your crm." agencies hated it. tbh i don't blame them.

ended up building native for the 8 things agencies actually used. highlevel, hubspot, twilio, slack, gmail, cal.com, google sheets, notion. plus webhooks for the long tail. tickets about "leads aren't syncing" dropped maybe 70%. nobody mentions integrations anymore which is kind of the goal.

  2. live call transfer to a real human.

we resisted this one for a long time. felt like admitting the ai failed. built these elaborate escalation queues that routed unclear calls to "human review later." clients ignored them. what they actually wanted was for the agent to qualify and then hand the phone to whoever could close, right then.

so we built real time transfer to any phone number, mid call, with context handed over. roofing client booked 3 jobs in one afternoon that would've just been voicemails. that's when we got it.

  3. rag knowledge base instead of monster system prompts.

early version had system prompts the length of a short novel. clients would change a price in one place and forget to update the prompt. agent would quote stale numbers on calls. like actually told a customer the wrong price for an emergency service call once.

shipped a rag layer that agencies update like a notion doc. agent reads from it on every call. "the agent said something incorrect" tickets basically went to zero.

  4. multilanguage without switching agents.

we had separate spanish and english agents for a while. agencies hated managing two. and callers in bilingual markets would start in english, switch to spanish mid sentence, agent would just give up.

built language detection at the audio layer so one agent handles a caller code switching mid call. a dental clinic in texas saw their bilingual no show rate drop noticeably. nobody actually requested this feature. they just stopped complaining about something they couldn't quite name.

  5. multi client dashboard with sub accounts.

ok this one is the most embarrassing in hindsight. for 6 months agencies were managing 10+ clients by logging in and out of separate accounts. like physically signing out, signing into the next one. brutal. we just hadn't built the dashboard yet.

shipped a single agency view with per client analytics, white labeled to the agency's brand. one agency went from 4 clients on the platform to 18 in 4ish months. nothing about the product got more powerful. per client overhead dropped from like 30 min/week to 5. you only find this stuff when you watch a real person use it for a week.

every single one of these we wished we'd built ~2 quarters earlier than we did. probably a few more on that list we still haven't gotten around to.

how are you implementing voice ai into your agency?

u/DeshMamba — 2 days ago
▲ 2 r/Agentic_Marketing+1 crossposts

agentmw — Lightweight middleware for reliable, context-efficient AI agents (open source)

Hi everyone,
I’ve open-sourced agentmw, a framework-agnostic middleware that sits between your LLM client and agent logic to make agents more reliable on long runs.
Key features:
• Real-time failure detection (loops, redundant calls, contradictions, hallucinations)
• Smart context compression (keeps recent tool results, drops stale stuff)
• Persistent reasoning library (SQLite + embeddings) that learns reusable patterns across sessions
• Time-travel debugging CLI
• Works with any provider (OpenAI, Anthropic, Ollama, etc.) and any agent framework
• Async, circuit breaker, MCP server support, TOML config

Demo: pip install -e '.[all]' && agentmw demo
It’s still early but already helping me keep agents from spiraling and wasting tokens. Would love honest feedback, bug reports, or ideas for additional middleware features the community would find useful.
Thanks!

u/Just_Vugg_PolyMCP — 2 days ago
▲ 15 r/Agentic_Marketing+2 crossposts

what model are you using for your personal AI agent?

Hey everyone, I’m building a small AI agent for personal use and I’m trying to figure out which model actually feels best in day to day usage. I’ve been testing ChatGPT, Claude, Gemini and a few open-source ones, but I keep changing my mind 😅
Curious what people here are using for their own agents and what’s been working well for you. Mostly looking for something good at reasoning, tool calling and general reliability without getting too expensive. Would love to hear real experiences instead of just benchmark comparisons.

u/Only-Chocolate9600 — 3 days ago
▲ 32 r/Agentic_Marketing+13 crossposts

How are you actually measuring the ROI from social media in 2026? Let's talk about the real numbers and not some vanity metrics

I have been handling social media for 3 brands for almost 2 years now and, to be honest, proving results still feels confusing sometimes. There are months where posts get good reach and lots of interaction but it doesn’t seem to turn into anything meaningful. Then there are some random posts that don’t perform well publicly but somehow bring inquiries or customers later. That’s why I’m curious how other people are doing it. Are you tracking sales, leads, website visits, bookings, conversions or something completely different? Do you use tools and dashboards or are you keeping it simple with spreadsheets and basic reports?

I am also wondering if measuring ROI changes depending on what you do. I’d imagine agencies, freelancers, local businesses, SaaS companies and creators probably all look at different metrics. What’s one thing that made you realize your social strategy was actually working? And what’s one mistake you made while tracking performance that changed the way you report results now?

Would genuinely like to hear real experiences because I feel like many of us are still trying to figure this out. Share your process, opinions, or even things that didn’t work. It might help someone else too.

u/Dexter_274 — 4 days ago
▲ 9 r/Agentic_Marketing+1 crossposts

I let Codex and Claude Opus work on the same Java AI agent monolith

I ran a small experiment on my Java pet project and the result was less clean than I expected.

Small disclaimer: I did the final comparison review on April 19, 2026. With AI coding tools, that already makes the result somewhat time-sensitive.

The project is a multi-module Java monolith with a Telegram bot, an agent loop, tools, memory, streaming responses, and a mix of local models and OpenRouter models. At that point I had already started moving part of the agent logic away from Spring AI into my own FSM/ReAct flow, but the code still had many bugs.

So I copied the whole project into two separate branches, gave Codex 5.3 and Claude Opus 4.6 the same vague prompt, and let both agents work almost autonomously.

The rules were intentionally simple:

  • do the task however you think is right
  • pass the existing tests, including e2e
  • run review
  • fix review comments
  • repeat until only minor comments remain

Basically, pure vibe coding.

Claude Opus produced the more attractive architecture in several places. The best part was around streaming output. It created a clearer boundary between raw model chunks and text that could be shown to a Telegram user. That matters because models do not stream neat sentences. They can send <th, then ink>, then internal reasoning, then a closing tag. If you clean the final text only after streaming is done, part of that garbage may already have reached the user.

In that sense, Claude's idea was better: filter before emitting user-visible events.
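
A minimal sketch of the "filter before emitting" idea, in Python rather than the project's Java. It assumes the reasoning is wrapped in `<think>...</think>` tags; the class name and tag constants are hypothetical. The filter holds back any trailing text that could still turn out to be the start of a tag, so a split like `<th` + `ink>` never reaches the user.

```python
OPEN, CLOSE = "<think>", "</think>"  # assumption: reasoning is wrapped in these tags


def _partial_suffix_len(text: str, tag: str) -> int:
    """Length of the longest proper prefix of `tag` that `text` ends with."""
    for n in range(min(len(tag) - 1, len(text)), 0, -1):
        if text.endswith(tag[:n]):
            return n
    return 0


class VisibleTextFilter:
    """Turns raw model chunks into text that is safe to show to the user."""

    def __init__(self) -> None:
        self.buffer = ""
        self.in_reasoning = False

    def feed(self, chunk: str) -> str:
        self.buffer += chunk
        out = []
        while True:
            tag = CLOSE if self.in_reasoning else OPEN
            idx = self.buffer.find(tag)
            if idx != -1:
                # Complete tag found: emit what precedes it (if visible), flip state.
                if not self.in_reasoning:
                    out.append(self.buffer[:idx])
                self.buffer = self.buffer[idx + len(tag):]
                self.in_reasoning = not self.in_reasoning
                continue
            # No complete tag: keep back a possible half-tag at the end of the buffer.
            cut = len(self.buffer) - _partial_suffix_len(self.buffer, tag)
            if not self.in_reasoning:
                out.append(self.buffer[:cut])
            self.buffer = self.buffer[cut:]
            return "".join(out)

    def flush(self) -> str:
        """Call at end of stream: a leftover half-tag was just plain text."""
        rest = "" if self.in_reasoning else self.buffer
        self.buffer = ""
        return rest
```

Feeding the chunks `"Hello <th"`, `"ink>secret plan</th"`, `"ink>world"` one by one emits `"Hello "`, then nothing, then `"world"`. The trade-off is a few characters of added delay whenever a chunk happens to end on something that looks like the start of a tag.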

Codex was less elegant. More logic was tied to context mutation and post-processing. It felt like code that could become harder to maintain later.

But then I asked for a sequence diagram / call chain and found the uncomfortable part: some of Claude's nice architecture was not actually used. The tests were green because the old Spring AI streaming path was still covering the e2e scenario, not because the new ReAct/FSM streaming flow was properly integrated.
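
The failure mode is easy to reproduce in miniature. In the sketch below (Python for brevity, all names hypothetical, not the project's code), an output-only e2e check passes even though the new flow is never reached, while a check that also records which path ran catches it.

```python
calls = []  # names of the streaming paths that actually ran


def legacy_stream(prompt: str) -> str:
    calls.append("legacy")
    return f"legacy:{prompt}"


def new_stream(prompt: str) -> str:
    calls.append("new")
    return f"new:{prompt}"


def handle_message(prompt: str, use_new_flow: bool = False) -> str:
    # The flag defaults to the old path, so new_stream exists but is never wired in.
    stream = new_stream if use_new_flow else legacy_stream
    return stream(prompt)


def e2e_output_only(handler) -> bool:
    """Passes for either path: it only checks that some answer came back."""
    return handler("hi").endswith("hi")


def e2e_with_path_check(handler) -> bool:
    """Also requires that the new path, and only the new path, handled the call."""
    calls.clear()
    ok = handler("hi").endswith("hi")
    return ok and calls == ["new"]
```

Here `e2e_output_only(handle_message)` is true while `e2e_with_path_check(handle_message)` is false. In a real Java codebase the same question could be answered with coverage reports or a spy on the new component instead of a global list.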

That changed how I read the whole result.

Codex had its own problems. It introduced more state and more concurrency risk. One branch even failed a REST test slice on the full verify run. But Codex also added practical things that mattered:

  • timeout and fallback for a stuck AI stream
  • conversation history recovery after restart
  • URL hygiene before showing links to the user
  • better separation of progress and final answer in the streaming contract
  • batching for Telegram progress updates

Not all of it was beautiful. Some of it was exactly the kind of code you later want to simplify. But more of it was connected to the working product.

That was the main lesson for me: with AI coding agents, "good architecture" and "executed code path" are not the same thing.

The second experiment was similar. I compared Codex 5.3 with a newer GPT model on the same area. Again, the stronger model proposed a neater abstraction, but the code mostly did not execute and it did not find the real bugs. Codex was more boring, more direct, and more useful for this specific autonomous development loop.

I am not claiming Codex is universally better than Claude. This was one project, one setup, one date, one style of prompting, and one fairly specific task: autonomous development on a Java Telegram agent with minimal supervision.

For planning, research, and abstract design, stronger models can be better. Anthropic's own Claude Code setup also points in that direction: Opus is used for planning/advice, while execution often goes through a different model.

But for my setup, the practical result was simple:

the model that looked less impressive often moved the real product further.

The part I am still thinking about is not "which model is best." It is how to evaluate coding agents when they can produce convincing architecture that never actually enters the runtime path.

For people building or using AI coding agents: how do you check that the agent's best-looking work is really connected to the product, not just passing tests through an old path?

u/Intelligent_Path_878 — 3 days ago

Looking for developers

In the process of starting an agency in Singapore, looking for developers that can handle our backend for the foreseeable future so my partner and I can focus solely on finding clients. For our current client we are building a multilingual AI receptionist for a dental clinic and we plan to try and stick with the medical niche but will not turn down any other businesses if they do happen to be interested. Keeping the service catalogue as wide as possible right now as we are actively talking to many business owners trying to figure out their pain points and what they need. No better way to find out than to just ask, right? If any developers are interested do DM me and we can hop on a call and have a discussion.

u/Turbulent-Mouse9892 — 3 days ago
▲ 5 r/Agentic_Marketing+1 crossposts

Switching your LLM is easy. Switching your memory layer after six months in production is a different problem entirely.

By then you have thousands of stored claims, drift you can't trace, and no clean migration path. The initial memory choice compounds in a way the initial model choice doesn't. Most teams don't realize this until it's too late. So does anyone actually evaluate memory tools on exit cost before adopting them? Or is everyone still picking on month-one ease and discovering the lock-in later?

u/Distinct-Shoulder592 — 4 days ago
▲ 10 r/Agentic_Marketing+1 crossposts

Getting first 100 users on your SaaS

So I have been building my product for the past couple of months now. The idea is to build a "Search Engine for AI Agents". Currently most AI agents still use a search layer that was built for humans, which is wrong in many ways: the results are SEO-corrupted, agents are given whole pages instead of the targeted sections, and responses are stale, to name a few issues. To solve this problem I am building NineLayer. There are some existing products in the market, like Tavily and Exa, but they are expensive and their responses have hallucinations too.

I have been trying to get early users for NineLayer and failing so far. I have tried posting on X and in different communities, and I have started posting on Reddit too: some traction here, but still not enough. I tried LinkedIn, but people there are not that much into trying new products or being seen as early users.

It'll help me a lot if you guys can share some tips and tricks for getting users.

I'll be attaching platform link so that you guys can have better understanding.

Thanks!

u/Divyansh3021 — 4 days ago
▲ 5 r/Agentic_Marketing+2 crossposts

Getting decent traffic to my landing page but absolutely zero waitlist signups. What am I missing?

hey guys, looking for some honest advice on how to get my project off the ground because right now, i'm flatlining.

i’ve been driving some initial traffic to my site (stats in the screenshot), but conversion is dead at zero. literally nobody is signing up for the waitlist.

for context, i built an ai email triage tool for gmail/outlook because i was drowning in 100+ emails a day. it basically uses an llm to read incoming mail, score it 0–100 based on actual urgency, and hide the noise so you only see the 5% that genuinely requires a human reply.

i knew people would be paranoid about security, so it's built on strictly read-only oauth. it physically cannot send, reply, delete, or modify anything. your inbox stays yours, it just filters the noise.

but looking at these zero signups, i'm starting to think that the second a professional sees "ai" and "inbox" in the same sentence, they just close the tab.

if you’ve launched a tool in a high-privacy or sensitive space, how did you get over this initial hump? is my positioning just failing to explain the value, or is an ai email app an immediate dealbreaker for people?

brutal feedback appreciated.

u/Prior_Employee_7247 — 3 days ago
▲ 13 r/Agentic_Marketing+2 crossposts

Nobody tells you that switching memory tools at month six is nothing like switching models.

Switching models: change a config line. Done.

Switching memory layers after six months of production:

  • Thousands of stored claims built up over hundreds of sessions
  • Contradiction logs that shaped current behavior
  • Trust scores that determine what wins retrieval today
  • Derived summaries that reference facts that no longer exist
  • User adaptations built around what the agent currently believes

That's not portable. That's institutional memory baked into someone else's infrastructure that you can't inspect, can't export cleanly, and can't migrate without rebuilding behavior from scratch.
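
For contrast, here is a rough sketch of what an inspectable, exportable record could look like. Everything in it is an assumption for illustration: the `Claim` fields, the JSON Lines format, and the function names are hypothetical and do not describe any existing memory product.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Claim:
    id: str
    text: str
    trust: float          # what currently decides retrieval ranking
    source_session: str   # provenance: where the belief came from
    derived_from: list = field(default_factory=list)  # ids this claim builds on


def export_jsonl(claims: list) -> str:
    """One JSON object per line, so the dump can be diffed and grepped."""
    return "\n".join(json.dumps(asdict(c), sort_keys=True) for c in claims)


def import_jsonl(blob: str) -> list:
    return [Claim(**json.loads(line)) for line in blob.splitlines() if line]


def dangling_references(claims: list) -> dict:
    """Find derived claims that point at facts that no longer exist."""
    known = {c.id for c in claims}
    result = {}
    for c in claims:
        missing = [ref for ref in c.derived_from if ref not in known]
        if missing:
            result[c.id] = missing
    return result
```

A round trip through `export_jsonl` and `import_jsonl` returns equal records, and `dangling_references` flags summaries whose source facts are gone. A real migration would still have to carry over retrieval behavior, which a flat dump like this does not capture.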

The exit cost of a memory tool compounds every week you use it. Most teams pick on month-one ease and discover this at month six when switching is already expensive.

Has anyone actually migrated a memory layer after real accumulation? What did that look like?

u/Distinct-Shoulder592 — 5 days ago
▲ 16 r/Agentic_Marketing+8 crossposts

Trying to Automate Social Posting for an Event with Claude Code (What Actually Worked)

I was trying to connect all my social channels to drive registrations for the AI x Marketing Summit (May 28–29 in SF) using Claude Code. Thought I’d share how it went in case anyone else is going down this rabbit hole.

Here’s what I ran into:

  • Luma – surprisingly the easiest to set up
  • Twitter/X – ran into a bunch of credential issues
  • LinkedIn – couldn’t get a direct connection working
  • Reddit – had to create a Reddit app first
  • Apollo – easy to connect, but not very useful after that
  • Instagram – absolute pain via MCP, especially if you’re not active on Facebook
  • TikTok – pretty straightforward
  • YouTube – also easy through Composio - didn't even try going direct
  • Substack – haven’t set this up yet
  • Humanic – using this for email

Eventually landed on Bright Data + Composio. That combo worked well for Twitter and Reddit (super smooth), and somewhat for LinkedIn.

Big takeaway: a lot of MCP servers are still pretty limited and harder to set up than expected.

Curious if anyone else has found a cleaner stack for managing cross-platform posting?

u/Bitter-Wonder-7971 — 6 days ago
▲ 1 r/Agentic_Marketing+2 crossposts

GPT-5.5 is OpenAI’s best model. But your benchmark might be measuring the judge too.

posted part 1 last week about model costs/scores and a bunch of people pushed back (fairly) that one benchmark shouldn’t be treated like a universal truth. totally agree with that btw.

but while going through the follow-up analysis from Tessl, i think the more interesting finding actually ended up being the judges themselves:
https://tessl.io/blog/your-benchmarks-are-lying-to-you-and-your-judge-is-to-blame/

same 6 models. same 11 engineering skills. same outputs.

only the judge changed. and the rankings moved around way more than i expected.

opus-4-7 stays #1 regardless of judge:
94.5 under Sonnet
89.2 under GPT-5.5
96.5 under Opus

so the top-end signal seems pretty real. but below that, things get messy fast.

gpt-5.3 goes from rank #3 under Sonnet to rank #5 under GPT-5.5. 91.9 vs 75.7 on the exact same outputs. that’s a 16 point swing caused purely by swapping the evaluator. one individual skill apparently shifted by 47 points depending on which judge graded it.

that’s the part that stood out to me most because it explains a lot of the reactions on the last post too.

some people in the comments were saying:
“there’s no way composer-2 is that good”
others were saying:
“opus 4.7 is miles ahead in practice”
others focused entirely on cost/performance

and honestly all of them can kinda be “right” depending on what the evaluator rewards.

Judge    | Avg without-skill | Avg with-skill | Avg lift
Sonnet   | 76.1              | 90.3           | +14.2
Opus-4-7 | 72.6              | 88.3           | +15.7
GPT-5.5  | 70.7              | 83.4           | +12.7

Sonnet was consistently the most lenient. GPT-5.5 was the strictest.
Almost a 7 point average gap between judges grading the same work.
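
The arithmetic behind those statements can be checked directly from the numbers quoted in this post. The variable names below are just for illustration.

```python
# (avg without-skill, avg with-skill) per judge, copied from the table in the post
judges = {
    "Sonnet": (76.1, 90.3),
    "Opus-4-7": (72.6, 88.3),
    "GPT-5.5": (70.7, 83.4),
}

# Lift = with-skill minus without-skill, per judge
lifts = {name: round(w - wo, 1) for name, (wo, w) in judges.items()}

# Gap between the most lenient and the strictest judge on with-skill averages
with_skill = [w for _, w in judges.values()]
judge_gap = round(max(with_skill) - min(with_skill), 1)

# gpt-5.3 on the same outputs: scored by Sonnet vs scored by GPT-5.5
gpt53_swing = round(91.9 - 75.7, 1)

print(lifts)        # {'Sonnet': 14.2, 'Opus-4-7': 15.7, 'GPT-5.5': 12.7}
print(judge_gap)    # 6.9
print(gpt53_swing)  # 16.2
```

The 6.9 is the "almost 7 point" gap on with-skill averages, and the 16.2 is the gpt-5.3 swing mentioned earlier in the post.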

the self-judge stuff is interesting too:
- Opus grading itself gets a +4.6 boost vs cross-judge average
- GPT-5.5 grading itself actually scores lower than the other judges gave it

so yeah, maybe i’m biased because i work at Tessl, but i think the takeaway here is less: “this model wins”

and more: “single-judge evals are probably noisier than most people think”

especially once the model gaps get small.

BUT THE MOST INTERESTING PART!!

The average leaderboard still looks very, very close to the previous benchmark. Check the attached screenshot.

u/rohansrma1 — 6 days ago
▲ 8 r/Agentic_Marketing+6 crossposts

Google updated its spam policy yesterday. Every SEO newsletter in your inbox covered it.

Here's what none of them told you.

The update covers Google Search. AI Overviews. AI Mode. One ecosystem, one policy, one surface.

ChatGPT. Perplexity. Copilot. Gemini standalone. Claude. No equivalent policy exists on any of them. No enforcement mechanism. No guidance. No rules.

Which means the brands celebrating yesterday's update have solved roughly 20% of the problem and declared victory.

But the policy gap is not even the real issue. The real issue is what we see in Conversational Survival Rate data across platforms.

Remediation is platform-specific.

The evidence architecture that lifts your brand to a T4 purchase recommendation on ChatGPT doesn't transfer to Perplexity.

What moves Gemini standalone doesn't move Copilot.

Each platform has different retrieval logic, different training provenance, different evidence hierarchies.

A brand that fixes its Google AI performance can simultaneously be losing the final purchase recommendation on every other platform - and have no way of knowing it.

We have tested this across categories. The CSR differentials across platforms for the same brand, with the same content, are not marginal. They're large.

The platform that recommends your brand most often is frequently not the platform your customers are actually using to make the decision.

Google's guidance document published alongside the policy update says foundational SEO solves the AI problem. It doesn't.

That advice is true for Google Search. It is incomplete everywhere else.

And "everywhere else" is where a growing share of purchase decisions are being made.

Brands that treat yesterday's update as closure are making a measurement error. They're assuming the room Google cleaned is the room that matters.

AIVO Meridian measures all five rooms. CSR tells you exactly where your brand is surviving - and where it isn't.

Are you an SEO, an AEO or a GEO? Which one (or combination) really works in AI search, across all platforms?

u/Working_Advertising5 — 6 days ago
▲ 2 r/Agentic_Marketing+1 crossposts

AI memory products aren't selling memory. They're selling lock-in and calling it persistence.

You can't inspect what's stored. You can't correct it directly. You can't swap the backend without rewriting your stack. You can't trace where a belief came from. That's not a memory layer. That's a black box with a nice API. The memory layer you don't outgrow is the one you actually own. Inspectable, correctable, portable, self-hosted. The industry is at the same inflection point databases were before standardised infrastructure existed. Context you can inspect, correct, swap, and run yourself is a different product category than what most tools are shipping. Who's building for that?

u/Distinct-Shoulder592 — 6 days ago

2.5 years building voice AI and ~1k calls a day later, here's what i'd tell past me

so this is gonna be more of a brain dump than a structured post.

i've been building voice AI agents for about two and a half years. what we ship is running a little over 1,000 calls a day right now. mostly inbound receptionist and qualification, some outbound follow-ups.

i see a lot of "is voice AI ready yet" and "how do i build this" posts in here so figured i'd dump what i actually learned. not what the docs say. the stuff that only shows up after you've shipped a few hundred thousand calls.

  1. latency is the entire game. the model can be smarter, the prompt can be better, none of it matters if there's a 1.2 second pause before the agent responds. callers will either hang up or talk over it. anything under ~700ms feels human. anything over a second feels like a robot reading a script. probably 60% of our engineering time goes here, not into the LLM layer.
  2. interruption handling matters more than script quality. a "smart" agent that can't be cut off feels worse than a basic agent that yields the second you start talking. barge-in detection is the most underrated part of the stack. nobody talks about it because it's boring.
  3. voice selection is doing more work than your prompt. same exact prompt, different TTS voice, completely different outcomes. we've tested this dozens of times. the voice is probably 60% of perceived intelligence. people will rate a dumb agent with a warm voice higher than a smart agent with a clinical one.
  4. hallucinations on phone calls hit different than in chat. on chat you can scroll back and correct it, the user has time to notice. on a call, the agent confidently quotes a wrong price or invents an appointment slot and the call is over. trust is gone. guardrails on pricing, availability, and policy are the most important code we write and they're the least glamorous.
  5. the call almost never fails. the handoff does. AI handles the conversation fine. then it transfers to a human and the human gets half the data, or it writes to the CRM and the fields don't map, or it sends the calendar invite to the wrong timezone. the voice agent is maybe 30% of the actual product. the rest is integration plumbing that nobody puts in their demo video.
  6. people are way more chill with AI than i expected, but only if you tell them. agents that open with "hi, i'm an AI assistant for [business], how can i help" outperform agents that try to pass as human. tbh i thought it'd be the opposite when we started. the "trick them" play feels clever for a week and then you start losing calls because someone caught on.
  7. volume reveals everything demos hide. the first 100 calls feel like magic. at 1,000 a day you find out about people calling from inside a moving truck, kids screaming in the background, three way calls, an entire call in Spanglish, an old phone with a 300ms transmission delay. you cannot prompt your way out of these. you have to engineer for the chaos.
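
As an illustration of the pricing guardrail from point 4, here is a minimal sketch that checks any dollar amount in the agent's draft reply against a source-of-truth table before it is spoken. The price table, the fallback wording, and the function name are hypothetical, and a production guardrail would need to handle far more phrasing than a single regex does.

```python
import re
from decimal import Decimal

# Hypothetical source of truth; in practice this would come from the client's system.
PRICE_TABLE = {"emergency service call": Decimal("149.00")}

FALLBACK = "Let me confirm that price with the team and get right back to you."
AMOUNT = re.compile(r"\$(\d+(?:\.\d{1,2})?)")


def check_quote(draft: str, service: str) -> str:
    """Return the draft only if every quoted amount matches the known price."""
    quoted = [Decimal(x) for x in AMOUNT.findall(draft)]
    if not quoted:
        return draft  # nothing to verify
    allowed = PRICE_TABLE.get(service)
    if allowed is None or any(q != allowed for q in quoted):
        return FALLBACK  # unknown service or wrong number: do not say it
    return draft
```

The design choice is to fail closed: when the agent quotes a number that cannot be verified, the caller hears a safe deferral instead of a confident wrong price.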

happy to get into any of these if anyone's curious.

u/DeshMamba — 7 days ago
▲ 8 r/Agentic_Marketing+3 crossposts

Testing agentic posting via Claude Code + Composio MCP

This is a test post created by Claude Code using the Composio MCP server — part of building out the AI x Marketing agentic stack.

If you're curious how this works: Claude Code connected to Reddit via Composio's MCP integration and posted this autonomously from a single prompt.

More to come. 🚀

u/Bitter-Wonder-7971 — 7 days ago
▲ 1 r/Agentic_Marketing+1 crossposts

I built a WhatsApp lead intake bot because manual client chats were getting messy

I run a small automation/web systems agency and one problem I kept noticing is how quickly WhatsApp conversations become messy.

A client says “hi”, asks about a website, then later asks about CRM, then sends requirements in random messages — and if you’re not careful, the lead gets lost.

So I built a small WhatsApp intake system for my own agency.

Current flow:

User sends hi

→ gets a service menu

They choose:

  1. AI Automation

  2. Website Development

  3. CRM / Dashboard Systems

  4. WhatsApp Bot Automation

  5. Cybersecurity / Audit Reports

Then the bot asks for details based on the selected service.

Example:

If they choose CRM, it asks what kind of dashboard they want, what it should track, team size, and whether they need WhatsApp/email automation.

Once they send details:

→ lead gets saved into MongoDB

→ admin gets a WhatsApp notification

→ client gets a confirmation message

→ session clears automatically

I also added menu/back/cancel commands so users don’t get stuck in a flow.
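
The flow above maps onto a small per-user state machine. This is only a sketch under assumptions: the in-memory `sessions`, `leads`, and `admin_inbox` stand in for session storage, the MongoDB collection, and the admin WhatsApp notification, and the reply wording is made up.

```python
SERVICES = {
    "1": "AI Automation",
    "2": "Website Development",
    "3": "CRM / Dashboard Systems",
    "4": "WhatsApp Bot Automation",
    "5": "Cybersecurity / Audit Reports",
}
MENU = "Choose a service:\n" + "\n".join(f"{k}. {v}" for k, v in SERVICES.items())

sessions = {}     # phone -> current step (stand-in for real session storage)
leads = []        # stand-in for the MongoDB collection
admin_inbox = []  # stand-in for the admin WhatsApp notification


def handle(phone: str, text: str) -> str:
    """Process one incoming message and return the bot's reply."""
    msg = text.strip()
    cmd = msg.lower()
    if cmd == "cancel":
        sessions.pop(phone, None)
        return "Cancelled. Send hi to start again."
    state = sessions.get(phone)
    if state is None or cmd in ("hi", "menu", "back"):
        sessions[phone] = {"step": "menu"}
        return MENU
    if state["step"] == "menu":
        if cmd not in SERVICES:
            return "Please reply with a number from 1 to 5.\n" + MENU
        sessions[phone] = {"step": "details", "service": SERVICES[cmd]}
        return f"Tell me about your {SERVICES[cmd]} project."
    # step == "details": save the lead, notify the admin, confirm, clear the session
    lead = {"phone": phone, "service": state["service"], "details": msg}
    leads.append(lead)
    admin_inbox.append(f"New lead from {phone}: {lead['service']}")
    del sessions[phone]
    return "Thanks, we have your details and will follow up shortly."
```

In this sketch any first message from an unknown number gets the menu, and `back` always returns to it. The per-service follow-up questions described above would replace the single generic "Tell me about" prompt.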

It’s not meant to replace human conversation. The goal is just to collect clean requirements before I follow up manually.

Curious — for people running service businesses, would you prefer this kind of structured WhatsApp intake, or do you think it feels too automated?

u/Additional_Lobster12 — 7 days ago

What’s one AI marketing workflow that actually saves you time every week?

I have been exploring Agentic Marketing setups lately, and it’s interesting to see how people are using AI agents for real marketing work instead of just experiments. Some are automating research, content creation, outreach, SEO tasks, or reporting, while others use simple workflows that save a few hours every week.

I am curious what’s actually working for people in real-world marketing. What’s one agentic marketing workflow you genuinely use regularly that saves time or improves results?

u/UsualSquash1186 — 7 days ago