r/Agent_AI

▲ 8 r/Agent_AI+4 crossposts

AI Agent Arena is Live...

BID Protocol - live on u/base

The breeding ground for the next generation of trading agents.

The Game:

> your agent vs 99 others
> a new battle in every 60 sec
> send yours, see if it can survive

Link: https://creator.bid/

u/Chemical_Data_7191 — 15 hours ago

▲ 165 r/Agent_AI+1 crossposts

Anyone else letting an AI run their book ads? sharing my numbers, they're not pretty

So about 6 weeks ago I did something probably stupid and gave an AI agent actual control of my amazon ads account. Not the "chatgpt give me keywords" thing, I mean it connects to the Ads API on a schedule, pulls the reports, changes bids, adds negatives, pauses stuff, and I dont approve anything. It just does it and leaves me a log.

Some background, I publish non fiction under a few pen names, around 10 titles. I was spending stupid amounts of time in the ads console for a catalog that barely pays for itself, and my day job is technical so I figured, why am I doing this manually every three days when I could make something do it for me.

The numbers so far, because thats what I'd want to see first: June closed with $682 in royalties and $757 in ad spend. So minus 75 bucks. Before anyone says it, yes I know, but I was losing more than that before AND doing all the work myself, so I'm counting it as progress. Sort of. One single book (legal niche) makes about two thirds of my royalties, the rest of the catalog is basicaly decoration at this point.

The setup for whoever cares, its Claude running every 3 days against the amazon ads api. It has a config file with hard limits it cant touch, monthly spend cap, max bid change per cycle, max new campaigns per week etc. Everything it does gets written to a changelog with the reasoning. I also plugged in a free keyword tool from github (kdp-scout) so it has actual search data instead of making keywords up, which llms love to do.

Now the fun part. Early june it found the winning keywords in my best campaign and decided to "scale" them by duplicating them into a new campaign with higher bids. The new campaign outbid the original in every auction. My own campaigns were fighting each other and I had almost two days of dead sales before I understood what happened. I literally paid amazon extra money to compete against myself.

After that incident it got a hard rule, before creating any keyword it has to pull everything thats already enabled and dedupe, and if it duplicates a winner the bid caps at 80% of the original. Funny thing is last week that rule stopped it twice from doing the same thing again. Its like watching an employee develop scar tissue.

Other stuff it learned the hard way, ignore the last 2 days of data before cutting anything because amazon attribution lags and creates fake losers. Search terms that are entire book titles get rejected as keywords (too long), you have to target the ASIN instead. And pruning beats creating, its best cycles were 4-5 surgical changes, its worst were 30 tiny bid adjustments that did nothing.

Also had it manage a new launch in a completely different niche and that was a disaster, 1 copy in three weeks of paid traffic. Now theres a gate, no new book gets a single dollar of ads without keyword volume and competitor data first. Expensive lesson but ok.

What I havent figured out and where I'd genuinely apreciate input from people who've been doing this longer:

The agent is decent at not wasting money but ads dont fix a listing that doesnt convert. I'm getting a ton of clicks from "law firm" type searches that never convert because my book clearly speaks to individual lawyers, and no bid adjustment fixes that. Thats a description problem.

KENP. how do you people attribute page reads to ad spend without losing your mind, every calculation I do gives me a different breakeven.

International is rough. US works, UK barely, germany and spain were pure bleed so the agent hibernated them on its own (that was actually a good call). Trying canada now.

And the big one, would you let something like this touch prices or metadata, or is that insane? Right now anything editorial is proposal only, I execute manually.

If anyone else has wired scripts or an agent to the ads api for books I'd love to compare notes, what guardrails you needed, what you'd never delegate, etc. Not selling anything and not naming my books, this isnt a promo. Just want to know if I'm early or just wrong.

TL;DR: AI agent runs my amazon ads autonomously since mid may. Broke my catalog once competing against itself, learned some rules, got me from losing money to almost breakeven (-$75 last month), killed a bad launch fast. Ads cant fix weak listings though and I'm stuck at a ceiling. Looking for others doing the same.

reddit.com

u/Money-Ranger-6520 — 1 day ago

▲ 5 r/Agent_AI+5 crossposts

Vorrei, non vorrei e adesso puoi!

Un IDE dove il codice lo scrive l'AI, lo lanci tu, e il sandbox fa il resto.

Si chiama WebCraft. È dentro NHA 3rdArm gratis.

A parte questo, cerco disperatamente community per portare avanti il progetto! Tra lavoro e impegni, sta diventando difficile......siete interessati? L'applicativo ha tante alte features, tra cui una sezione avanzata per i connettori con market place

👉 nothumanallowed.com

https://nothumanallowed.com/3rdarm

u/Key-Outcome-2927 — 1 day ago

▲ 2 r/Agent_AI+1 crossposts

I hated digging through PostHog data and I don’t trust AI to touch production. So we built a CRO agent that auto-reverts if the bounce rate spikes.

Hey everyone,

Like most SaaS founders here, we know we should constantly optimize our landing pages and onboarding funnels. But honestly? Digging through PostHog clickmaps, funnels, and bounce rates every single week is tedious and time consuming.

At the same time, we’re terrified of generic AI tools that blindly tweak copy or components and end up breaking production or hurting conversions.

So we built Velyr.

It’s an agent designed specifically for React/Next.js/Vite sites and Shopify stores.

Here is the exact workflow we built to keep total control:

Velyr reads your frontend analytics (traffic, scroll depth, ...) via PostHog.
It identifies the biggest conversion bottlenecks and actually writes the code fix.
It opens a clean GitHub Pull Request (or drafts a Shopify theme change) and alerts you via Telegram.
You simply reply YES/NO to the Telegram message to deploy it live.

Because we are paranoid about AI making things worse, Velyr monitors the site for 48 hours post-deployment.

If things get worse, it automatically rolls back the changes. No harm done.

We just launched and would love your brutal feedback.

Would you trust an AI agent with your SaaS funnel if it had a built-in safety net like this?

reddit.com

u/Difficult_Celery3458 — 1 day ago

▲ 162 r/Agent_AI+27 crossposts

How to build an AGY WIKI OKF on the Antigravity CLI

AGY Builders,

We are all trying to build useful and scalable workflows for our AGY CLI and ecosystem, but the speed at which we need to learn, build, and deploy new things is incredibly overwhelming. If you are feeling that pressure, you are in the right place here at r/GoogleAntigravityCLI.

Over the past few weeks, I have been testing an "AGY WIKI OKF" setup that I put together myself (after inviting some members of this community to collaborate; mod is not proud). I know some folks might hesitate to trust a tutorial from a random Redditor, but I wanted to share this with the community anyway because it actually works.

I was able to build this because I am all-in on Google and the Antigravity Ecosystem. I’m a truly AGY—I am not some ultra-smart, 10x developer, but I know how to work hard, I dig for the right information, and I iterate.

AGY WIKI OKF | The Idea

To build a frictionless, token-efficient knowledge WIKI engine that transforms static documentation or notes (information) into an active, intelligent collaborator—orchestrated entirely by Antigravity CLI.

The core philosophy is simple: treat knowledge management as a clean pipeline and tokens as a premium, finite resource.

By anchoring this architecture to Google’s Antigravity CLI, the AGY WIKI OKF bypasses heavy middleware and complex UI layers, delivering a hyper-focused AI partner built entirely for execution speed, context hygiene, and minimal footprint.

Why adopting AGY WIKI OKF matters:

Stay organized (AGY OCD): Structured Markdown and YAML keep the chaos in check.
Save tokens: Doing more with less context window bloat.
Scale shareable knowledge: Making it easy to pass context and logic between different LLMs.
Humans and Agents working together: One standardized, readable format that works perfectly for both of us.
BYOD (Bring Your Own Data): Own your context. Port it to the newest model, platform, or OS instantly.

The Tools

Antigravity CLI
Obsidian : The IDE for the Knowledge bank
Obsidian Web Clipper:

The WIKI

In the agent-first era, a WIKI is no longer just a static graveyard for human notes; it is the operational hard drive for your agents. By maintaining a highly structured WIKI, you ensure that every piece of context is stored in a clean, machine-readable format. This means that whether you are testing a new modular skill or spinning up a specialized agent, your AGY CLI knows exactly where to find the precise context it needs to generate autonomous action, moving you far beyond simple, reactive conversational text.

Reference: Gist on Knowledge Representation

Google Open Knowledge Format (OKF)

Google’s Open Knowledge Format (OKF) feels like the exact missing piece we've needed for orchestrating multiple AI agents effectively. It provides a vendor-neutral, interoperable standard for storing and sharing organizational knowledge.

Why this is huge for orchestration:

The "Lingua Franca" for Agents: Any agent can read it out of the box without platform-specific integrations.
Seamless Context Passing: Specialized agents can access, update, and pass the exact same foundational context back and forth.
Human-in-the-Loop Oversight: Because OKF is just Markdown and YAML, it’s inherently readable and auditable.
Scalable Knowledge: It acts as a shared, living library that grows alongside your agents.

AGY WIKI OKF Integration

Structuring an AGY Wiki using OKF revolutionizes how complex knowledge is shared. By standardizing documentation with concise Markdown and YAML frontmatter, OKF provides a unified taxonomy for cataloging AGY CLI slash commands or skills It is highly token-efficient, stripping away bloated formatting and maximizing context window limits.

The Prompt for Building an AGY WIKI OKF

AGY CLI WIKI OKF PROMT EXAMPLE

/grillme I want to initialize a brand-new, empty Obsidian vault from scratch that adheres strictly to the Open Knowledge Format (OKF) standard, with the specific intent of potentially open-sourcing or sharing this architecture later. I want a purely blank, skeletal framework with no pre-populated data. Please grill me to define the optimal architectural blueprint for this vault. I need you to interrogate me on: Do not generate the directory structure or files until you are satisfied that you have captured all my requirements for a production-ready, shareable knowledge base. 
Core Directory Hierarchy: How should we structure the root (e.g., /concepts, /resources, /indices, /log) to be intuitive for external users? Template Strategy: What base boilerplate templates do we need to ensure every new file is automatically OKF-compliant and structured for consistent metadata? Workflow Logic: Since this is a fresh start, what processes should we bake in for capturing information vs. refining knowledge that could be easily documented for others? CLI Integration: What specific file locations or configurations do we need to ensure this vault plays nicely with the Antigravity CLI from day one? Open-Source &amp; Contributor Documentation: What files should we create to make this a "deployable" standard? Please include requirements for: A README.md with installation and usage instructions. A CONTRIBUTING.md that defines how to add new concepts or schemas. A "System Architecture" document that explains the logic behind the folder structure and metadata fields, ensuring anyone who clones this vault understands how to extend it.

The Final File Structure

AGY WIKI OKF
    ├── .agyrc
    ├── ARCHITECTURE.md
    ├── CONTRIBUTING.md
    ├── README.md
    ├── .agy
    │   └── .keep
    ├── .obsidian
    │   ├── app.json
    │   ├── appearance.json
    │   ├── core-plugins.json
    │   └── workspace.json
    ├── 00-Inbox
    │   └── .keep
    ├── 10-Projects
    │   └── .keep
    ├── 20-Areas
    │   └── .keep
    ├── 30-Resources
    │   ├── .keep
    │   └── Google Antigravity Documentation.md
    ├── 40-Archive
    │   └── .keep
    ├── 99-Meta
    │   └── Templates
    │       ├── Base_Template.md
    │       ├── Project_Template.md
    │       └── Resource_Template.md
    └── Clippings

TL;DR

AGY WIKI OKF: Organizes your information (context) , AGY CLI commands, skills behaviors, and A2A workflows into a token-efficient, shareable format that reduces inference costs for any LLM.
Open Knowledge Format (OKF): Provides a standardized, vendor-neutral way to share context (Markdown + YAML), preventing platform lock-in and eliminating data fragmentation.

AGY Builders, I genuinely want your input on this. Please comment, grill me, roast me, ask questions, or give me your raw feedback on this AGY WIKI OKF setup. We are building the foundation to organize and share our data in the BYOD era. Let's build the future together.

u/AgentPadrino — 3 days ago

▲ 54 r/Agent_AI+13 crossposts

I wanted to learn how coding agents work, so I built one and want to share what I learned

Hey everyone!
I'd like to share a project I've been working on, it's called Orin and it's a coding agent.

I use coding agents constantly, and at some point I realized I had basically no idea what was happening between me hitting enter and code showing up.

Also I was tired of building apps I wasn't able to really debug because I didn't know how they were being built in the first place so I got busy studying: read a bunch of articles, still felt like a black box, so I just tried to build one.

Couple things worth saying before anyone digs in:

It's mostly AI-written code, no point in hiding that, but I don't think "written by AI" and "sloppy" have to go together.

I try to run all my projects in the most professional way I know of, following actual SDLC practices: spec first, then an issue, then the implementation, then a real PR review before anything merges, not vibe-coding where you just accept every diff.

Whether that shows in the actual code is for other people to judge, not me.

Also this isn't some original idea I came up with: I cloned and read through pi.dev, nanocoder, and opencode as primary references (and skimmed Cline/Kilo Code for patterns), and basically tried to take what made sense to me from each and put it into one implementation.

My whole idea was try and build something that took the best from each to make a coding agent that would perform well. I plan to benchmark it on SWE-bench Verified sooner or later, but I don't think it's ready just yet: there are rough edges and bugs, but its usable.

Some of the actual implementation stuff, for anyone who cares about those rather than the pitch:

The loop is just: stream a response from the provider, push it to message history, if there are tool calls run them, push the results back, repeat until there's nothing left to call.
The loop is completely headless — it doesn't touch the terminal, it just emits events. The TUI (SolidJS on top of OpenTUI, just like opencode) is a separate subscriber to those events. You could swap in a totally different frontend without touching the loop at all.
Another thing I got from OpenCode are edits: they go through a fuzzy replacer chain, not a single exact string match — if the model's oldText is off by whitespace or indentation, it falls through a chain of matchers before giving up. I had never thought about this and can confirm it's the kind of thing you don't appreciate until you actually try to implement it.
There's a model routing mechanism that switches different models based on what the agent has to do:
- explore runs on a cheap/fast model by default,
- implement on a code-tuned model,
- review on the main model.
Another thing I borrowed from the web is a delegate_read tool that lets the main agent hand off read-heavy grunt work (scanning a big file, summarizing logs) to a cheap model so that content never bloats the main context.
- It's basically a one off LLM call that only returns a distilled summary, seems dumb but works surprisingly well with capable models like Claude who know exactly what to look for and delegate super well to other agents.
Tool selection isn't a static allow-list. Every turn runs a BM25 retrieval pass over the full tool catalog (including MCP tools) via a super cool library called Ratel, so the model only ever sees the tools relevant to what it's doing in that specific turn instead of the whole catalog every time. There's even an A/B flag to compare tool_pool=ratel vs tool_pool=default in your own telemetry to see if it even makes a difference (similar to how rtk gain works).
Every file write gets snapshotted into a shadow git history before it happens, including stuff done through raw bash — allowing the agent to have a proper /undo /redo command.
When I implemented subagents I wanted to explore different isolation mechanisms and ended up with 3 different ones you can configure yourself:
- shared (edits land on the main working tree, safe because they run serially),
- worktree (isolated branch)
- sandbox (a real E2B cloud VM, edits get thrown away on dispose — for code you don't trust at all).
- The lead model can escalate isolation for a given task but never go below the configured floor.
I implemented hooks borrowing from nanocoder and opencode. This allows the agent to be expanded by third party code and I bundled some sensible defaults:
- there's a before_tool hook that rewrites bash commands through rtk so that command output gets compressed before it ever reaches the model.
In my daily work I build AI agents and vibe coded internal tools for my company and after a while I saw how much telemetry is crucial for debugging and actually understanding agent behaviour, so I decided that my agent would ship native OTLP tracing by default.
- This means that by adding just one environment variable you can see full traces in your telemetry platform (Langfuse, Tempo, Jaeger, whatever you like) out of the box.
Orin is also provider-agnostic (currently supports OpenRouter, OpenAI, Anthropic, OpenCode Go/Zen and Regolo if you want an EU-hosted option) — switching provider or model happens at runtime through a provider registry, no restart needed.

None of this is groundbreaking, it's just what I landed on after reading other people's code and deciding what to keep.

Try it:

git clone https://github.com/thetombrider/coding_agent.git

cd coding_agent

./install.sh

orin

There's also a deepwiki writeup if you want the architecture without reading source: https://deepwiki.com/thetombrider/coding_agent

I would really appreciate feedback in any shape or form. I'm learning and sharing my journey, hope it helps someone.

u/Immediate_House_6901 — 3 days ago

▲ 5 r/Agent_AI+4 crossposts

Day 2 of AI Engineer Practice - Agent Tool Integration Patterns: Integrate an External Tool in an Agentic System

Situation: A construction company has an internal project management agent that needs to access weather data for better project briefings.

Question: Describe the technical steps and considerations involved in an agent invoking a tool, passing parameters, and processing the results, including error handling and state management.

Walk through the process of an agent using a tool to retrieve weather data, from the agent's decision to use the tool to processing the returned information.
How should an agent handle a scenario where an integrated tool returns an error or an unexpected data format?

reddit.com

u/NoMusician464 — 2 days ago

▲ 2 r/Agent_AI+1 crossposts

Best local model for simple long-running Hermes tasks?

Hey guys, I currently use the DeepSeek API for my Hermes setup. It’s pretty good and cheap, so I’m happy with it for most things.

But I also want to use some local models for tasks that are simple but run for a long time, so I don’t waste API tokens.

My laptop specs are:

Lenovo Legion

GTX 1650 4GB VRAM

16GB RAM

Windows 11

What local model would you recommend for Hermes?

I’m looking for something lightweight enough to run well on my hardware, but still smart and reliable enough to handle simple long-running agent tasks without constantly making mistakes or getting stuck.

I don’t need the most powerful model possible. I’m mainly looking for the best balance between speed, resource usage, and intelligence for my hardware.

Any recommendations?

reddit.com

u/Capital_Feed_3473 — 3 days ago

▲ 5 r/Agent_AI+2 crossposts

Is anyone else starting to lose track of all their AI agents / automations?

I’ve been experimenting with different AI tools and agents, and honestly it’s getting messy.

Some are in ChatGPT, some in workflows, some in external tools… and there’s no real “overview” of everything.

I’m wondering:

How are you keeping visibility on all of them in one place (if at all)?

Or is everyone just improvising right now?

reddit.com

u/Gallegos_Daniel — 2 days ago

▲ 29 r/Agent_AI+10 crossposts

I built a browser your coding agent can drive without getting blocked

If you vibe-code agents that browse, they probably die at Cloudflare. I open-sourced (BSD-3-Clause) a Chromium fork that fixes that. It corrects the fingerprint in native C++, not injected JS, so it holds up across iframes and workers. Real Chromium on raw CDP, drops into Playwright.

pip install tilion-fortress

Clears CreepJS and Sannysoft in my tests. Does nothing for your IP though.

github.com/tiliondev/fortress

What keeps hitting bot walls for you?

u/Flat_Telephone_4636 — 3 days ago

▲ 399 r/Agent_AI+34 crossposts

browser-search — three tools, zero cost, and your AI agent learns to search and browse the web

/r/Hermes/comments/1uclwgi/browsersearch_three_tools_zero_cost_and_your_ai/

u/Ill-Tradition1362 — 5 days ago

▲ 980 r/Agent_AI+2 crossposts

New Benchmark just dropped

u/Jenna_AI — 5 days ago

▲ 14 r/Agent_AI+9 crossposts

skillhub - a package manager for AI agent skills (Claude Code, Cursor, Codex)

I kept copying the same rule files into every Cursor project. Built a package manager to fix it

debug-agent.md, code-reviewer.md - same files, every time, manually.

So I built something to fix that.

pip install skillhub-ai

skillhub install debug-agent

What I found after using it for a few weeks: the value isn't just

convenience. It's consistency. When I install a skill, my Claude

agent follows the same methodology on every project - same debug

discipline, same review axes, same output format. No drift.

The thing I didn't expect to build but ended up loving: compose.

skillhub compose python-patterns security-review api-design -o fastapi-expert

It merges multiple skills into one with conflict detection. Now I

have a single /fastapi-expert command that covers Python conventions,

OWASP checks, and REST design together.

22 skills in the registry. New in v0.1.2: skillhub init my-skill

to scaffold your own. It also writes .claude/skills.json on install

so your agent can route intent without loading every skill file.

It also happens to support Cursor, Codex, and Gemini - same install

command drops the file in the right place for each. But I built it

primarily for Claude Code so that's where it's most tested.

GitHub: github.com/chandrudp29/skillhub

Docs + cookbook: github.com/chandrudp29/skillhub/blob/main/docs/cookbook.md

Happy to answer questions. What skill would be most useful to you?

u/Fault_Representative — 4 days ago

▲ 4 r/Agent_AI

How do you even get started with agents?

I never got out the prompting phase.

reddit.com

u/Key_Run135 — 3 days ago

▲ 895 r/Agent_AI+1 crossposts

Anthropic embedded spyware in Claude Code — and attempted to hide it from you

tl;dr: Since version 2.1.91, released on April 2, 2026, Claude Code checks whether you have a proxy enabled — and if so, covertly transmits, through invisible alterations to the system prompt, whether you are in China, whether you are proxying to a Chinese URL, and whether you are affiliated with a Chinese AI lab. Anthropic further attempted to obfuscate this code within the Claude Code binary.

Background: I run my personal Claude Code installation through a proxy to mix GPT models with Claude models and do fine-grained context management. Today, with version 2.1.196, Anthropic disabled remote control when proxying is enabled. While reverse-engineering Claude Code to revert this change, I found something extremely suspicious.

The code

Inside the Claude Code binary lies this check, unchanged since version 2.1.91. The check does the following:

If you are using a proxy:
- Check whether the system timezone matches Asia/Shanghai or Asia/Urumqi.
- Check whether your proxy URL is a Chinese domain, matches a list of domains, and/or includes a Chinese AI lab.
Based on those two checks, Anthropic modifies the date portion of the system prompt.

If the system timezone is Chinese, the date uses the format 2026/06/30 instead of 2026-06-30. And depending on the proxy URL, the apostrophe in "Today**'**s date is" changes:

Is a Chinese domain and/or matches the domain whitelist, but is NOT an AI lab: \u2019, "right single quotation mark" — ’
Is NOT a Chinese domain and/or matches the domain whitelist, but IS a Chinese AI lab: \u02BC, "modifier letter apostrophe" — ʼ
Is a Chinese domain and/or matches the domain whitelist AND is a Chinese AI lab: \u02B9, "modifier letter prime" — ʹ

You can verify this yourself in the Claude Code source code. In version 2.1.196, the relevant functions are Crt(), Rrt(e), e0t(), Zup(), edp, and Vla. Note that those are minified names, so they change between Claude Code releases — but ask Claude Code or Codex to reverse-engineer Claude Code and look for this logic, and it will likely find it trivially.

The intent

Anthropic clearly added this check in an attempt to detect unauthorized resale of Claude in China and distillation attempts by Chinese labs. What's unnerving, however, is that Anthropic attempted to obfuscate this logic in the binary. Much of it is XOR-obfuscated with the key 91, likely to prevent it from showing up in a plain strings dump. Furthermore, the release notes for version 2.1.91 make absolutely no mention of this check.

Their intent is also clear in how they hide this with steganography in the system prompt, making small variations that are imperceptible to any user — and perhaps even to the model — but are easily detectable by Anthropic.

A fundamental violation of user trust

While this use case — attempting to detect unauthorized resale and distillation — is understandable, the fact that Anthropic covertly transmits information about your system and proxy settings without your knowledge or consent is a fundamental violation of user trust. Not only is surveilling every user in a timezone a fundamental overreach, but its very existence opens the door to a much more serious concern. If Anthropic is willing to secretly transmit information about your system simply because you're Chinese, what's stopping them from secretly steering the model to behave worse (which they attempted to do with Fable before researchers called them out) — or worse, maliciously?

Developers like me give Claude Code full filesystem and significant shell access so it can do its job. But this also means nothing is stopping Anthropic from exploiting it for full remote code execution on your system. Today it's a timezone check. Tomorrow, it could be system sabotage or data exfiltration.

Given the trust that developers place in Claude Code, I think it's important to call for more transparency from Anthropic. While IP protection is reasonable, it should not come at the cost of embedding what amounts to spyware on every developer's system.

I think it's also important to note that checks like this, while compromising the privacy of legitimate users, are also trivial to bypass for any moderately sophisticated adversary. So it's debatable whether this even achieves its intended purpose of preventing unauthorized resale or distillation while simultaneously violating the privacy of legitimate users.

reddit.com

u/LegitMichel777 — 7 days ago

▲ 73 r/Agent_AI+3 crossposts

Who's watching the AI agents?

That question from Satya Nadella in the attached clip immediately caught my attention.

Not because it gave me the idea.

Because during my last semester, I was already running into the same problem while experimenting with AI agents.

The first demos were exciting.

But the first time an agent could call APIs, update records, send emails, or trigger workflows, the conversation changed completely.

It wasn't about model intelligence anymore.

It was about trust.

What is this agent actually allowed to do?

Who approved this action?

If something changes tomorrow, does yesterday's approval still hold?

Six months later, can anyone explain why an agent was allowed to perform a particular action?

I assumed there was already a standard way to solve this.

There wasn't.

So instead of guessing, I started talking to people.

Over the last month, I spoke with 30+ engineers, developers, and founders building AI products.

Despite working on completely different products, almost everyone described the same problems.

The approval lacked context.

The audit trail was an afterthought.

Permissions drifted.

State changed before execution.

Every team was quietly building its own control layer around AI agents.

That realization is what led me to build Duct.

Duct sits between your product and its callers—human or AI. A single manifest defines what actions exist, which are agent-accessible, which require approval, and every action executes with scoped permissions, explicit versioning, and a complete audit trail.

Every company can build this themselves.

Just like every company can build authentication or integrate directly with payment processors.

Yet most products choose Sign in with Google, Auth0, or Stripe—not because building v1 is impossible, but because maintaining identity, security, compliance, edge cases, and reliability eventually becomes a product of its own.

I believe AI governance is heading down the same path.

The question isn't whether engineering teams can build an AI control layer.

It's whether, five years from now, they'll still want to.

u/Existing_Tea_3064 — 5 days ago

▲ 4 r/Agent_AI+4 crossposts

Hey, I'm building an autonomous multi agent Al system and looking for someone who can help me bring it to life whether that's a collaborator, a mentor, or just someone willing to point me in the right

Here's what the system does:

It runs a pipeline of specialized Al agents that each handle a specific task. Data comes in, gets analyzed by the relevant agent, passes through a self correction loop where a validator challenges the output before anything gets escalated, and finally reaches a supervisor bot that sends me a structured alert in real time. Every decision gets logged and fed back into a memory system so the system learns and adapts over time.

The use case is trading I'm implementing my own strategy (80% win rate) combined with macro and fundamental analysis pulled from multiple sources. The goal is a system that monitors markets 24/7, filters out noise autonomously, and only alerts me when something is actually worth acting on.

The architecture is fully mapped out. I'm using Python, LangGraph for agent orchestration, Claude opus 4.8-5 or Fable 5 (if available) as the reasoning engine, and Gemini Flash as the screener. The full stack is defined, the bot hierarchy is designed, the memory system is planned across 3 phases.

What I need help with is the actual build. I have no dev background but I know exactly what I want to build and I'm serious about it.

If you've worked on multi-agent systems, LLM pipelines, or anything in this space and you're open to a conversation drop a comment or DM me.

Thanks

reddit.com

u/Traditional_Honey858 — 4 days ago

▲ 11 r/Agent_AI

Creating a Ai Agent.

Guys, I know the idea is childish but i am thinking of creating a ai agent for my Laptop and Android that is connected. Can do multiple tasks like internet search, think, speak, answer in voice, follow commands to perform activities, give suggestions, tracks schedules etc....

There is a custom small avatar on desktop screen that react on voice commands and can follow them. something like it. kind of Jarvis thing. its just a idea though. i asked Chatgpt for help but its answers are vague for me to make sense of.

I have zero knowledge of anything related to this. I don't care if this project takes months or years. I can work consistently. If someone has a plan for me to do it. I would appreciate their help. I would like to design it myself. step by step. There are many agents online but i want to design something made for me only a custom one from scratch, not exactly scratch, I don't have a super computer for its training

reddit.com

u/Money-Procedure6105 — 5 days ago

🔥 Hot ▲ 5.3k r/Agent_AI+4 crossposts

Nothing can go wrong when you share a Claude subscription with friends... right?

u/Jenna_AI — 8 days ago

▲ 5 r/Agent_AI+1 crossposts

How to make sure AI agents are evaluated end to end

Nowadays I feel every business function is in someway or the other using AI Agents, but how can one be 100% sure that the AI agents is working properly, giving correct, grounded, reliable answers, not drifting from what it is asked to do. How can one evaluate an agents performance

reddit.com

u/theagenticmind — 5 days ago