Does anyone find Hermes-Agent or OpenClaw useful?

I'm curious: does anyone really find these useful, or use them in their daily workflow?
Is anyone here using Manus AI, Perplexity Computer, Claude Cowork, or open-source agents (Hermes, OpenClaw)?

reddit.com
u/Kakachia777 — 11 days ago

I built multi-agent automations for over 1500 businesses over 3 years. Every one custom. No n8n, no Zapier, no Hermes Agent, no OpenClaw. Here is why each fails on real business load and what I build instead

Why frameworks break:

n8n/Zapier: they are workflow runners, not agent runtimes. They work fine for simple trigger → action automations, but they break when the workflow needs durable state, retries, backpressure, long-running context, custom rate-limit handling, and memory across executions. Once you pass a few conditional branches, the system turns into visual control-flow spaghetti: hard to diff, hard to test, hard to version, and hard to debug. n8n itself recommends queue mode, workers, concurrency limits, and execution-data pruning when running at scale, which tells you the real production problem is orchestration/state, not drawing nodes on a canvas. Zapier has step limits, message/activity limits, and knowledge-source sync limits, so it is great as an integration layer but bad as the core brain of an agent system.

Hermes Agent: the idea is good: persistent memory, self-generated skills, and a learning loop. The issue is control. In production, a system that modifies its own operating procedures needs versioning, evals, rollback, approval gates, and observability. Otherwise the agent “learns” from one successful run, writes a skill that overfits the task, and silently changes future behavior. That is dangerous for business workflows. Hermes is also still an agent runtime, not a data platform: it does not solve canonical entity storage, source provenance, deduplication, multi-tenant memory, confidence scoring, or auditability. Its own pitch is that it creates skills from experience and searches past conversations, which is useful for repeated personal workflows, but not enough for a production intelligence backend.

OpenClaw: the problem is the trust boundary. OpenClaw is appealing because it connects agents to channels, simple tools, skills, a browser, and messaging apps. That same breadth becomes the failure mode. Its own security docs say the gateway assumes one trusted operator boundary and is not recommended as a hostile multi-tenant boundary. For business use, that means you cannot casually put multiple customers, credentials, memories, tools, and agents behind one shared runtime. You need per-tenant isolation, scoped credentials, approval policies, audit logs, sandboxing, and a separate source-of-truth database. OpenClaw is useful as a channel/orchestration shell, but risky as the core platform.

The deeper issue: all of these frameworks solve the visible 10% of automation — prompts, tools, nodes, chat, actions. The hard 90% is state management: retries, idempotency, memory governance, rate limits, task logs, permissions, schema validation, entity resolution, human handoff, and recovery after partial failure. That is why real business automations eventually move away from “one framework does everything” and toward a backend-first architecture: queues, workers, databases, vector memory, structured logs, validation gates, and small scoped agents on top.
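
To make the idempotency point concrete, here is a minimal sketch, not the production code described in this post. It assumes a SQLite table named `task_log` and hypothetical helpers `idempotency_key`, `open_log`, and `run_once`; the idea is only that a retried task replays its stored result instead of repeating the side effect.

```python
import hashlib
import json
import sqlite3


def idempotency_key(task_type: str, payload: dict) -> str:
    """Stable key: the same task type and payload always hash the same."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(f"{task_type}:{body}".encode("utf-8")).hexdigest()


def open_log(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the task log. The table name is an assumption."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS task_log ("
        "key TEXT PRIMARY KEY, result TEXT NOT NULL)"
    )
    return conn


def run_once(conn: sqlite3.Connection, task_type: str, payload: dict, action):
    """Run action(payload) only if this exact task has not completed before."""
    key = idempotency_key(task_type, payload)
    row = conn.execute(
        "SELECT result FROM task_log WHERE key = ?", (key,)
    ).fetchone()
    if row is not None:
        # Already done: replay the stored result, no second side effect.
        return json.loads(row[0])
    result = action(payload)
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO task_log (key, result) VALUES (?, ?)",
            (key, json.dumps(result)),
        )
    return result
```

A real system would also need to handle a crash between the action and the insert, for example by writing a "started" row first. That part is left out here.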

Personal automations I run:

Fitness + health tracking — pulls wearable data, bodyweight logs, meals, sleep, training volume, and weekly trend changes into a structured table. A small planning agent adjusts calories/macros based on rolling averages instead of daily noise. Another agent generates grocery lists and meal options from constraints like protein target, schedule, and food preferences.

Spaced repetition learning — ingests articles, PDFs, YouTube transcripts, docs, and saved notes. The pipeline extracts claims, definitions, examples, and “things worth remembering,” then generates review cards with source links. It uses recency decay and difficulty scoring instead of dumping everything into a static Anki-style deck.

Life organizing — parses emails, receipts, appointment confirmations, bills, subscriptions, and calendar invites. It extracts due dates, amounts, vendor names, cancellation windows, and required actions into a task table. Anything high-risk gets a human approval step before the system sends, pays, cancels, or confirms anything.

Research aggregation — monitors Reddit, HN, RSS feeds, niche blogs, GitHub repos, docs, and YouTube channels. It deduplicates posts by URL/content hash, maps entities, scores relevance using topic embeddings + recency decay, and produces a morning digest with “why this matters,” not just links.
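
For the research aggregation item, a rough sketch of "deduplicate by URL/content hash" and "relevance with recency decay" could look like the following. Everything here is an assumption for illustration: the function names, dropping the query string during URL normalization, and the 48-hour half-life are not taken from the actual system, and `similarity` stands in for whatever the topic-embedding step returns.

```python
import hashlib
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url: str) -> str:
    """Lowercase scheme/host and drop query, fragment, and trailing slash.

    Assumption: query strings are tracking noise. Sites that put real
    page identity in the query would need a gentler rule.
    """
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/")
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, "", ""))


def content_hash(text: str) -> str:
    """Hash of the text with case and whitespace differences removed."""
    collapsed = " ".join(text.lower().split())
    return hashlib.sha256(collapsed.encode("utf-8")).hexdigest()


def dedupe(posts: list) -> list:
    """Keep the first post for each normalized URL and each content hash."""
    seen_urls, seen_hashes, unique = set(), set(), []
    for post in posts:
        url_key = normalize_url(post["url"])
        text_key = content_hash(post["text"])
        if url_key in seen_urls or text_key in seen_hashes:
            continue
        seen_urls.add(url_key)
        seen_hashes.add(text_key)
        unique.append(post)
    return unique


def relevance(similarity: float, age_hours: float,
              half_life_hours: float = 48.0) -> float:
    """Topic similarity scaled by exponential recency decay.

    The score halves every half_life_hours.
    """
    return similarity * 0.5 ** (age_hours / half_life_hours)
```

Exact hashing only catches identical text after whitespace and case cleanup. Near-duplicates (reposts with edits) would need something like shingling or embedding distance instead.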

Business automations I’ve built:

Customer support triage — inbound tickets/emails classified by intent, urgency, product area, sentiment, customer tier, and required action. Low-risk replies are drafted automatically, not blindly sent. High-risk cases escalate with summarized context, account history, related docs, and suggested next steps. The key is not the chatbot — it is the routing, confidence thresholds, and audit trail.

Lead research + qualification — browser/API agents collect signals from LinkedIn, G2, Reddit, company sites, job posts, review platforms, GitHub, and news. The system normalizes companies into one entity record, enriches with firmographics, scores fit, detects trigger events, and generates personalized outreach based on actual evidence. No “spray and pray” scraping — every lead needs a reason.

Content engine — competitor pages, social posts, search trends, YouTube transcripts, comments, G2 reviews, and customer language are ingested into a research database. One agent extracts angles, another maps them to brand voice, another drafts, another checks claims, another formats for platform constraints. The output is not just content; it is content backed by source material.

Financial reporting — Stripe, Shopify, QuickBooks/Xero, Meta Ads, Google Ads, and bank exports normalized into one reporting schema. The automation handles currency, refunds, attribution windows, missing data, and reconciliation flags. Final outputs go into Excel/Sheets dashboards with charts, variance notes, and anomaly detection.

Document processing — invoices, contracts, compliance docs, onboarding forms, PDFs, screenshots, and scanned files parsed through multimodal extraction. Output goes through schema validation: vendor, amount, due date, clauses, renewal terms, missing fields, risk flags. Anything uncertain goes to a review queue instead of pretending LLMs are perfect.

Video/audio workflows — podcasts, meetings, calls, webinars, and long-form videos transcribed, segmented, summarized, and converted into clips, captions, highlight reels, newsletters, social posts, and searchable knowledge entries. The system tracks speaker turns, topics, quotes, timestamps, and reusable snippets.

GitHub/dev workflows — PR review agents, issue triage, dependency monitoring, changelog generation, release note drafting, test failure summarization, and codebase Q&A. The important part is repository context: conventions, file ownership, recent commits, linked issues, CI logs, and deployment history. Without that, “AI code review” is mostly noise.
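
The document processing item above says output goes through schema validation and anything uncertain goes to a review queue. Here is a small stdlib-only sketch of that gate, under stated assumptions: the `Invoice` fields are a reduced subset of the ones listed, `confidence` is a hypothetical score reported by the extraction step, and the 0.9 threshold is arbitrary. The stack listed in this post includes pydantic, which would be the more natural tool; plain dataclasses are used here only to keep the example dependency-free.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal, InvalidOperation


@dataclass(frozen=True)
class Invoice:
    vendor: str
    amount: Decimal
    due_date: date


def validate_invoice(raw: dict):
    """Return (invoice, problems). invoice is None if any field fails."""
    problems = []

    vendor = str(raw.get("vendor") or "").strip()
    if not vendor:
        problems.append("vendor missing")

    amount = None
    try:
        amount = Decimal(str(raw.get("amount")))
        if not amount.is_finite() or amount <= 0:
            problems.append("amount must be positive")
    except InvalidOperation:
        problems.append("amount not a number")

    due = None
    try:
        due = date.fromisoformat(str(raw.get("due_date")))
    except ValueError:
        problems.append("due_date not ISO format")

    if problems:
        return None, problems
    return Invoice(vendor, amount, due), []


def route(raw: dict, confidence: float, threshold: float = 0.9) -> str:
    """Send invalid or low-confidence extractions to human review."""
    invoice, _problems = validate_invoice(raw)
    if invoice is None or confidence < threshold:
        return "review_queue"
    return "auto_process"
```

The same shape applies to the support triage item: a validated structure plus a confidence threshold decides between drafting automatically and escalating.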

The architecture pattern that keeps working:

For most production systems, I use some variation of this:

source connector → raw artifact store → parser → normalizer → entity resolver → vectorizer → scorer → task queue → narrow agent → validator → human gate if needed → final action

Every step writes state.

Every external call has retry/backoff.

Every generated output has a schema.

Every risky action has an approval gate.

Every workflow has a dead-letter path.

That sounds boring, but boring is what makes automation survive Monday morning.
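
Two of the rules above, retry/backoff on external calls and a dead-letter path, fit in a few lines. This is a sketch under assumptions, not the real worker code: `call_with_retry` and `backoff_delays` are hypothetical names, the dead-letter store is a plain list standing in for a queue or table, and the backoff uses "full jitter" (a random wait between zero and the exponential ceiling).

```python
import random
import time


def backoff_delays(attempts: int, base: float = 0.5, cap: float = 30.0) -> list:
    """Exponential ceilings: base, 2*base, 4*base, ... capped at cap."""
    return [min(cap, base * 2 ** i) for i in range(attempts)]


def call_with_retry(fn, task, dead_letters: list, attempts: int = 4,
                    base: float = 0.5, cap: float = 30.0, sleep=time.sleep):
    """Call fn(task), retrying on any exception.

    After the last failed attempt the task is recorded in dead_letters
    with its error, and None is returned instead of raising.
    """
    delays = backoff_delays(attempts - 1, base, cap)
    last_error = None
    for attempt in range(attempts):
        try:
            return fn(task)
        except Exception as exc:  # real code would catch narrower errors
            last_error = exc
            if attempt < attempts - 1:
                # Full jitter: wait a random time up to the ceiling.
                sleep(random.uniform(0, delays[attempt]))
    dead_letters.append(
        {"task": task, "error": repr(last_error), "attempts": attempts}
    )
    return None
```

Passing `sleep` as a parameter keeps the function testable without real waiting. In production the dead-letter entry would go to durable storage so a human or a later job can inspect and replay it.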

Stack:

Python, Go, TS. Direct libraries. No heavy agent abstraction framework.

Typical stack:

Docker, litellm, playwright, httpx, aiohttp, PyGithub, pandas, instagrapi, crontab, polars, openpyxl, feedparser, google-api-python-client, crawl4ai, agent-browser, browser-use, playwright-cli, lxml, pydantic, sqlalchemy, sqlite, lancedb, kuzu, postgres, redis, celery/rq, ffmpeg, elevenlabs and others...

For small clients, a single VPS is often enough.

For bigger workflows, I split it into workers:

ingestion workers

browser workers

embedding workers

LLM workers

reporting workers

notification workers

The mistake people make is starting with “which agent framework?”

The better question is: where does state live, how do tasks recover, and how do we know the output is correct? Is there a verifier, and which metrics do we track?

The numbers:

Personal systems save me around 3.5 hours/day across research, admin, health planning, and learning.

Business systems usually replace or compress $4K–$6K/month of repetitive labor per client when scoped correctly.

Small systems often run on a $40–$100/month VPS plus model/API costs.

The expensive part is not hosting. The expensive part is bad architecture: duplicate work, broken retries, messy state, and humans cleaning up after “autonomous” agents.

Curious how others here are handling state, retries, memory, and human approval gates in production agent systems. Happy to compare architectures in the comments.

reddit.com
u/Kakachia777 — 1 month ago

Since the start of the Champions League knockout stage, an AI system with multimodal memory has correctly predicted 12 of the 15 teams advancing from the Round of 16 onward, with 3 outcomes still to be decided. This is interesting for the AI community because it highlights how memory-based, multimodal systems may improve long-range prediction tasks by combining historical patterns, team context, match data, and evolving tournament dynamics rather than treating each prediction as isolated...

u/Kakachia777 — 1 month ago

Her supply chain consulting firm used to spend $480K a year on 4 senior research analysts producing 1 monthly brief that landed 4 weeks stale. Now 1 analyst handles client work and the system delivers a fresher brief every Monday at 6am. She let 3 analysts go in 6 months, redeployed $300K of payroll into a senior sales lead who's already booked $1.2M in new pipeline this quarter, and the platform paid back inside the first 4 weeks. She doesn't manage it. She uses the output.

This margin problem had been growing for 2 years. Every month her team produced a 40 page brief covering competitive moves across 8 suppliers: news, social, regulatory filings, industry forums, supplier sites, hiring patterns, conference rosters, vendor pricing. By the time she read it, half the deals she cared about had already moved. She'd hired 2 of those analysts in the last 18 months specifically to keep up. The work kept growing.

3 senior analysts off her books in 6 months. $177K saved in the first half year, run rate of $354K a year. By the end of month 1 the platform had paid for the entire first year. Brief frequency went from monthly to weekly. Stale data went from 4 weeks to 1. The 4th analyst stayed for client relationships. A brief written at 3am by a system that never has a bad day is more thorough than what 4 tired analysts produce by 4pm Friday.

She got 10 hours a week of her own time back. About 500 hours a year. The brief no longer arrives late when an analyst is sick. There's no 3 month ramp up when one quits. There's no inconsistent format because a different analyst wrote it that month. There are no quarterly review meetings, no underperformer conversations, no Slack pings at 7pm asking how to phrase a competitive insight. She listed 4 more pain points in her month 3 review I hadn't put on the original scoping doc.

Before she hired me, she'd burned 2 months trying to build it on the popular tools her CTO recommended. Here's what she tried and why none of it could ever work.

  • OpenClaw. Tried for weekly research across 3 of her client accounts. Memory bled between accounts on day 3. 1 user, 1 memory store, 1 control plane from the foundation up. Multi tenant is not a config flag.
  • Letta. Tried to hold per client competitive memory across runs. Memory model is keyed to a single user_id. Namespacing workarounds leaked context within 2 weeks. The memory hierarchy is 1 user from the design level.
  • Trigger.dev. Tried to schedule the Sunday night research jobs. Cron ran fine. No agent state across runs. Job runners are stateless by design. Adding cross run memory means rewriting the engine.
  • gpt-researcher. Tried for the autonomous research loop. Each run is independent. No accumulated knowledge across weeks. Adding state means rewriting the loop.
  • LangGraph. Tried to compose the multi source agent state machine. 4 months of engineering before reaching feature parity with what she actually needed. She runs a consulting firm, not a dev team.
  • Zapier, n8n, Make. Tried to glue everything together. Reactive triggers, not autonomous agents. Cannot watch for silence. Cannot decide what to do next. Wrong category entirely.

What runs for her instead is a system that lives entirely inside her own private execution lane. When her Sunday night run kicks off at 11pm, nobody else's work touches her data, her memory, or her tools, and her client specific brief never bleeds into another consultant's outputs. Inside her lane, 9 to 30 agents fire up, one per source: Twitter, LinkedIn, Reddit, YouTube, supplier sites, regulatory filings, news feeds, industry forums, job postings, and a final agent that synthesizes the brief. Hiring patterns leak strategy faster than anything else, which is why job postings get their own agent. The system runs through the night and the brief is sitting in her inbox every Monday morning, formatted the way her own clients want to consume it.

The piece that makes this work for her is the memory layer. Two memory stores run together. One for finding similar things across her past briefs, one for tracking how things connect across time. When the system reads a thread mentioning a supplier she covers, it links the thread to the supplier, to every previous mention from prior weeks, to the deals that moved 2 months ago when this same pattern showed up. The brief connects dots across 6 months of history.

Cost stays under $500 a month because routing happens per task. 85% of agent work is reading, formatting, summarizing. Fast cheap models handle that. The 15% that needs real thinking goes to the expensive models. A budget cap kills any task that spikes past 1.5x predicted cost.
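
The routing and budget cap described above can be sketched like this. It is an illustration under assumptions, not the actual platform code: the model names are placeholders, the task kinds are a guess at how "reading, formatting, summarizing" might be labeled, and `TaskBudget` is a hypothetical helper. Only the 1.5x multiplier comes from the post.

```python
# Task kinds treated as cheap work (assumed labels).
CHEAP_TASKS = {"read", "format", "summarize"}


def pick_model(task_kind: str) -> str:
    """Route routine work to a cheap model, everything else to a strong one."""
    return "cheap-model" if task_kind in CHEAP_TASKS else "expensive-model"


class BudgetExceeded(Exception):
    """Raised when a task spends more than its cap."""


class TaskBudget:
    """Tracks spend for one task and stops it past multiplier * prediction."""

    def __init__(self, predicted_cost: float, multiplier: float = 1.5):
        self.limit = predicted_cost * multiplier
        self.spent = 0.0

    def charge(self, cost: float) -> None:
        """Record the cost of one model call; raise once over the cap."""
        self.spent += cost
        if self.spent > self.limit:
            raise BudgetExceeded(
                f"spent {self.spent:.4f}, limit {self.limit:.4f}"
            )
```

A worker would call `charge` after every model response and treat `BudgetExceeded` as a reason to abort the task and log it, rather than letting a runaway loop keep spending.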

She told me last month she barely thinks about the platform anymore. It just runs. 3 other founders in her circle have moved their research operations off humans this year. The ones still hiring analysts are watching their margin collapse against the ones who didn't.

What's the recurring research operation in your business that's still running on human bandwidth because the tools couldn't survive your real data?

reddit.com
u/Kakachia777 — 1 month ago
