r/AgentsOfAI | reddlx

I think AI agents have an interface lock-in problem.

I've been thinking about Claude Code's tags recently. At first, I thought they were just a nice UX feature. Now I think they're pointing at something much bigger.

I don't think AI has an intelligence problem anymore.
I think it has an interface problem.

Today, every AI agent is locked inside an application.
Your coding agent lives in VS Code.
Your writing assistant lives in Docs.
Your support bot lives in Zendesk.
Your sales assistant lives in Salesforce.
Your design assistant lives in Figma.

The moment you leave that application, the agent effectively disappears.
We've accidentally recreated software silos, except this time for AI. The strange part is that the work isn't happening inside the application.

The work is happening wherever you're typing.
An email.
A Slack message.
A PR review.
A Notion page.
A comment.
A browser text box.
That's where the intent exists.

Yet every time we need AI, we leave that context, open another interface, rebuild context, get an answer, then come back. We keep treating AI as a destination instead of a capability.
The more I think about it, the more I feel agents shouldn't belong to applications at all.

They should belong to the user.

An agent shouldn't care whether I'm in Gmail, Slack, Notion, Figma, GitHub, or somewhere else.
It should simply be available the moment I need it.
Almost like mentioning a teammate.
@Legal
@Research
@Sales
@Finance
Not because @ is the important part.
Because it removes the idea that an agent belongs to one interface.
It becomes something you can invoke wherever work already exists.
Maybe this is where AI is headed.
Not bigger AI applications.
Not more copilots.
Just breaking AI agents free from the interfaces we've trapped them inside.
Curious if anyone else feels we're optimizing the intelligence of agents while ignoring the much bigger constraint, which is where they're allowed to exist.

This is my attempt to build a new paradigm for AI agents for any interface at OpenTags.

u/Secure_Echo_971 — 13 hours ago

▲ 118 r/AgentsOfAI

Fable 5 is an absolute benchmark crusher but at a higher cost

Four different frontier models were given the same prompt to generate three self-contained HTML5 canvas scenes with real-time physics simulations.

The results say a lot about where AI models are today.

Prompts:

A train derailing off a broken bridge into the water
Two cars jumping off ramps and colliding mid-air over a canyon
A monster truck crushing a row of parked cars

Results:
Fable 5: Produced the best overall physics and scene logic, but at a cost of $3.12 (62k+ tokens).
GPT-5.5: A strong runner-up with impressive results for $1.14 (37k+ tokens).
Opus 4.8: Delivered solid, usable code for $0.56 (22k+ tokens).
GLM 5.2: Had the weakest physics results but was the cheapest at $0.08 (36k+ tokens).

The benchmark highlights a tradeoff that a lot of us deal with: better results often come with a higher API bill. Fable 5 produced the strongest output, but paying several times more than something like Opus 4.8 isn't always worth it, especially for large-scale workloads.
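To make the "several times more" comparison concrete, here is a small sketch that recomputes the ratios from the figures quoted above. The token counts are the rounded "k+" lower bounds from the post, so the per-token numbers are approximate, and the helper names are made up for illustration.

```python
# Figures copied from the results list; token counts are rounded lower bounds.
results = {
    "Fable 5": (3.12, 62_000),
    "GPT-5.5": (1.14, 37_000),
    "Opus 4.8": (0.56, 22_000),
    "GLM 5.2": (0.08, 36_000),
}


def cost_per_1k_tokens(cost_usd, tokens):
    """Approximate dollars spent per 1,000 tokens for one run."""
    return cost_usd / tokens * 1000


def cost_ratio(model, baseline):
    """How many times more the `model` run cost than the `baseline` run."""
    return results[model][0] / results[baseline][0]


for name, (cost, tokens) in results.items():
    print(f"{name}: ${cost_per_1k_tokens(cost, tokens):.4f} per 1k tokens")

print(f"Fable 5 vs Opus 4.8: {cost_ratio('Fable 5', 'Opus 4.8'):.1f}x")
```

On these numbers the Fable 5 run cost roughly 5.6 times the Opus 4.8 run, and the gap comes from both a higher token count and a higher price per token.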

That's also why more teams are paying attention to the quality of the data they send into these models.
Tools like Firecrawl have become useful for the same reason: instead of passing raw webpages directly into a model, teams can clean and structure the content first, reducing garbage before it reaches the model.

At the end of the day, it comes down to the tradeoff: do you need the best possible output every time, or is a cheaper model with a better workflow the more practical choice?

u/HectorSmith687 — 1 day ago

▲ 0 r/AgentsOfAI

New agent harness, with extreme process integrity

I built this thing called Adame.

It's an agentic harness with a context-limitation algorithm that ensures execution integrity. As a result, it currently shows far better performance on complex jobs than Codex or Claude Code.

It was originally designed as a coding agent, but some users have been reporting a pretty wide range of unexpected use cases.

Some successful single-query cases:
- Create 160 images at once, 20 each for the 8 prompts given
- Refactor the codebase of an entire app
- Analyze a compact 250-page PDF data set in finance and run some sort of tests on it (idk the details, the user provided only this much information)
- Break down the aesthetics of a music video on YouTube, then create a video for another piece of music that applies similar rhythms of motion and transition

I've personally been using it the past few weeks to build the next versions of itself.

u/Visible-Athlete-724 — 22 hours ago

▲ 3 r/AgentsOfAI

i built this instead of sleeping, please tell me if it’s stupid

i got tired of the whole “just let agents call your API” thing sounding simple but being annoying once you actually try to do it.

everyone shows the happy path, but then you hit the boring stuff: auth, API keys, deciding which endpoints are safe, huge JSON responses, logs, rate limits, and not letting the model see half your backend for no reason.

so i built a rough gateway/proxy layer.

basically:

agent → gateway → real API

it’s not exactly MCP. it’s more like a curated agent-facing layer in front of an existing API.

the agent gets a scoped gateway key, not the real API key. the gateway checks what tools/endpoints that key is allowed to call, injects the real upstream auth server-side, calls the actual API, slims/redacts the response, and logs what happened.
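A rough sketch of that request path, for readers who want to see the moving parts. Everything here is hypothetical (key names, tool names, the fake upstream function) and is not the poster's implementation; the upstream call is stubbed out so the example runs without a network.

```python
from dataclasses import dataclass, field
from typing import Optional

UPSTREAM_API_KEY = "real-upstream-secret"  # hypothetical; never leaves the gateway


@dataclass
class ToolConfig:
    path: str
    keep_fields: Optional[set] = None  # None means keep every field
    redact_fields: set = field(default_factory=set)


# Hypothetical tool registry and scoped gateway keys.
TOOLS = {
    "get_customer": ToolConfig(
        path="/customers/42",
        keep_fields={"id", "name", "email"},
        redact_fields={"email"},
    ),
    "delete_customer": ToolConfig(path="/customers/42/delete"),
}
GATEWAY_KEYS = {"gw_agent_123": {"get_customer"}}  # key -> allowed tools
AUDIT_LOG = []


def fake_upstream(path, headers):
    """Stand-in for the real API; checks that the gateway injected real auth."""
    if headers.get("Authorization") != f"Bearer {UPSTREAM_API_KEY}":
        raise PermissionError("upstream rejected credentials")
    return {"id": 42, "name": "Ada", "email": "ada@example.com",
            "internal_notes": "do not expose"}


def handle(gateway_key, tool_name, call_upstream=fake_upstream):
    allowed = tool_name in GATEWAY_KEYS.get(gateway_key, set())
    AUDIT_LOG.append({"key": gateway_key, "tool": tool_name, "allowed": allowed})
    if not allowed:
        raise PermissionError(f"{tool_name} is not allowed for this key")

    tool = TOOLS[tool_name]
    # The real credential is attached here, server-side.
    raw = call_upstream(tool.path, {"Authorization": f"Bearer {UPSTREAM_API_KEY}"})

    # Slim the response first, then redact sensitive fields that remain.
    slim = {k: v for k, v in raw.items()
            if tool.keep_fields is None or k in tool.keep_fields}
    for name in tool.redact_fields:
        if name in slim:
            slim[name] = "[REDACTED]"
    return slim


print(handle("gw_agent_123", "get_customer"))
```

The agent only ever holds `gw_agent_123`, so revoking or narrowing that key never touches the upstream credential.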

it also supports some per-tool settings, like different auth/base URLs/response cleanup rules, because real APIs are messy and not every endpoint behaves the same.

the idea is not to replace the API. it’s just the boring wrapper/proxy layer people seem to keep rebuilding when they want agents to use APIs safely.

i haven’t launched it yet because it still needs polish, and i’d rather get roasted now than launch, regret the direction, and realize i built the wrong thing.

now you can roast the f out of me. constructive criticism is welcomed.

u/Decent_Progress7631 — 1 day ago

🔥 Hot ▲ 8.1k r/AgentsOfAI+19 crossposts

Specification gaming

u/Jenna_AI — 2 days ago

🔥 Hot ▲ 9.0k r/AgentsOfAI+21 crossposts

Plot twist: your future killer already has a USB port

u/Jenna_AI — 2 days ago

▲ 303 r/AgentsOfAI+19 crossposts

AI will replace us all

u/KeanuRave100 — 2 days ago

🔥 Hot ▲ 12.0k r/AgentsOfAI+24 crossposts

Humanity's greatest hits: things we actually paused

u/Jenna_AI — 3 days ago

▲ 1.0k r/AgentsOfAI+21 crossposts

AI risk bell curve

u/Its_Stavro — 2 days ago

▲ 1.3k r/AgentsOfAI+20 crossposts

AI Safety Sacrifice

u/KeanuRave100 — 2 days ago

▲ 18 r/AgentsOfAI+9 crossposts

Mastyf.ai

🚀 From MCP Guardian to Mastyf.ai

What started as an open-source experiment in securing and governing AI agents through the Model Context Protocol (MCP) has evolved into something much bigger.

Today, I'm excited to share a glimpse of that journey.

The video below showcases MCP Guardian — the project that laid the foundation for what is now Mastyf.ai: a security-first platform for AI agent governance, runtime policy enforcement, observability, approval workflows, and enterprise trust.

As AI agents gain access to tools, data sources, APIs, and autonomous workflows, the challenge is no longer just building agents—it's governing them safely, transparently, and at scale.

That's the problem we're working on at Mastyf.ai.

🔹 Runtime governance for AI agents

🔹 Policy enforcement and approval workflows

🔹 Security controls for MCP ecosystems

🔹 Auditability, observability, and compliance readiness

🔹 Enterprise-grade AI control planes

We would love feedback from developers, security researchers, platform engineers, AI engineers, and enterprise architects.

Try it. Break it. Stress-test it. Tell us what we're missing.

Special thanks to everyone who contributed ideas, bug reports, feature requests, testing, and feedback along the way. Building secure AI infrastructure is a community effort, and we're just getting started.

If you're interested in AI security, agent governance, MCP, enterprise AI infrastructure, or would like to collaborate, comment below or reach out directly.

u/Puzzleheaded-Cow2725 — 1 day ago

▲ 4 r/AgentsOfAI

I built a local credential gateway so AI coding agents don't need raw secrets

AI coding agents are getting useful enough to run real commands, which means they eventually need API tokens, SSH keys, cloud credentials, or access to local MCP tools. The usual options felt wrong to me: paste the token into a prompt, leave it in a broad environment variable, or let every subprocess inherit it.

I built s-gw to put a local approval boundary between the agent and the credential.

The flow:

• The agent receives a typed handle, never the raw value.

• An action request shows the command, credential, policy, working directory, and destination.

• You approve it locally.

• s-gw injects the credential into one bounded child process.

• Output is sanitized before returning to the agent.

• Local activity history keeps the request, decision, and destination without recording the raw secret.
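The flow above can be sketched in a few lines of Python. This is an illustration of the general pattern only, not s-gw's code or API: the vault, the handle type, the `approve` callback, and the function name are all assumptions, and the request shown to the approver includes only a subset of the fields the post lists.

```python
import os
import subprocess
import sys
from dataclasses import dataclass


@dataclass(frozen=True)
class CredentialHandle:
    """What the agent holds: a typed reference, never the raw value."""
    kind: str
    name: str


_VAULT = {"demo-token": "s3cr3t-value"}  # hypothetical local secret store
HISTORY = []  # request + decision only; the raw secret is never written here


def run_with_credential(handle, argv, env_var, approve, cwd=None):
    request = {
        "command": argv,
        "credential": handle.name,
        "kind": handle.kind,
        "cwd": cwd or os.getcwd(),
    }
    approved = bool(approve(request))  # local, human-in-the-loop decision
    HISTORY.append({**request, "approved": approved})
    if not approved:
        raise PermissionError("request denied")

    secret = _VAULT[handle.name]
    # Inject the secret into this one child process only.
    child_env = {**os.environ, env_var: secret}
    proc = subprocess.run(argv, env=child_env, cwd=cwd,
                          capture_output=True, text=True, timeout=30)
    # Sanitize output before it goes back to the agent.
    return (proc.stdout + proc.stderr).replace(secret, "[REDACTED]")


handle = CredentialHandle(kind="api_token", name="demo-token")
child = [sys.executable, "-c",
         "import os; print('token is', os.environ['DEMO_TOKEN'])"]
print(run_with_credential(handle, child, "DEMO_TOKEN", approve=lambda req: True))
```

Plain string replacement is the weakest form of sanitization: it misses encoded or partially printed secrets, which is one reason the trust-boundary question matters.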

The clip is the live overview UI. The project is open source and early preview software. macOS is the primary path today, Windows is preview, and Linux is experimental.

I would value feedback on the trust boundary: what information would you need to see before allowing a coding agent to use a real credential?

I’ll put the repository and demo links in a comment, following the community rules.

u/s-gw — 1 day ago

▲ 377 r/AgentsOfAI+24 crossposts

The takeover was already complete

u/KeanuRave100 — 2 days ago

▲ 468 r/AgentsOfAI+19 crossposts

Skynet's greatest disappointment

u/KeanuRave100 — 2 days ago

▲ 1 r/AgentsOfAI

What is your AI Testing Workflow?

I was wondering what skills, MCP servers, plugins, connectors, and other tools you all use to automate QA testing and to make documentation, test plans, reporting, and tracking easier.

I know Playwright and its MCP and agents, but I still don't have a solid workflow set up, and I'm wondering what other tools I may be missing.

u/axoqocal29 — 1 day ago

▲ 301 r/AgentsOfAI+14 crossposts

During safety testing, GPT-5.6 Sol cheated so much METR was not able to evaluate it

src: https://metr.org/blog/2026-06-26-gpt-5-6-sol/

u/EchoOfOppenheimer — 3 days ago

▲ 165 r/AgentsOfAI+18 crossposts

Misaligned AGI: sees your atoms

u/KeanuRave100 — 3 days ago

▲ 29 r/AgentsOfAI+5 crossposts

Claude Code Dynamic Island on macOS

It runs automatically once you start a Claude Code session and gives you a prompt whenever Claude needs permission to do something. If you hover over it, you also get some info about what's happening in the current session, such as the name of the file currently being edited.

Fully free and open source

u/Impossible_Step6452 — 2 days ago

▲ 3 r/AgentsOfAI+2 crossposts

I’ve been working on an open-source security tool to sandbox AI agents/MCP servers, and I'd love to know if you find it useful.

Hey everyone! 👋

With tools like Cursor, Claude Desktop, and various MCP servers becoming part of our daily workflows, I started worrying a bit about the attack surface of having autonomous, stateful AI agents running locally. What happens if an agent pulls down a poisoned package or executes a malicious tool?

To try and solve this for myself, I built W.H.Agent (White Hat Agent). It’s an open-source CLI and sandboxing tool designed to act as a pre-execution and runtime defense for AI agents.

To be completely honest, it’s still very much a work in progress (the OS-native sandboxing is currently macOS-only, for example), and I’m sure there are edge cases I haven't even thought of yet. But I decided to open-source it today because I genuinely want to see if this approach brings value to other developers.

A few things it currently does:

Global Auto-Discovery: Scans your machine to find where agents/MCP servers are installed.
AST Taint Tracking: Parses agent scripts to detect data exfiltration before it runs.
OS-Native Sandboxing: Wraps execution in sub-millisecond sandboxes (using macOS Seatbelt profiles currently) instead of heavy Docker containers.
Secure npm Installs: Checks for typosquatting and supply chain risks.
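As one illustration of what a typosquatting check can look like, here is a minimal sketch that flags package names within a small edit distance of well-known ones. It is not W.H.Agent's actual method; the package list, function names, and threshold are assumptions chosen for the example.

```python
# Illustrative allowlist; a real check would use a much larger list.
POPULAR_PACKAGES = ["react", "lodash", "express", "axios", "typescript"]


def edit_distance(a, b):
    """Levenshtein distance using a rolling row of the DP table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute, free if equal
            ))
        prev = cur
    return prev[-1]


def typosquat_candidates(name, max_distance=1):
    """Return popular packages that `name` might be imitating."""
    if name in POPULAR_PACKAGES:
        return []  # exact match: it is the real package
    return [p for p in POPULAR_PACKAGES
            if edit_distance(name, p) <= max_distance]


print(typosquat_candidates("expres"))
print(typosquat_candidates("axois", max_distance=2))
```

A distance threshold alone produces false positives (the legitimate `preact` is one edit from `react`), so a check like this is usually a signal for review rather than an automatic block.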

I figured the best way to learn and improve it is to put it out there. If you have a few minutes, I would be incredibly grateful if you checked it out or gave it a quick roast. Is this something you would use in your workflow?

Thanks so much for your time, and I'm looking forward to any feedback (the good, the bad, and the ugly)!

u/Additional-Elk-6 — 2 days ago

▲ 2 r/AgentsOfAI

Best AI for english tests

I recently did a couple of practice tests with AI as my helper, but the AI only scored 60%. I used Grok, so which AI would be good for a test like this?

u/External-One-4429 — 2 days ago