u/Ok_Top_5458

After working with AI agents for a while, I kept running into the same issue:

Eventually the agent ignores boundaries, reads .env files, touches production resources, or uses secrets it was never supposed to access.

Even with MCP read-only setups and carefully written prompts, the shell itself is still trusted too much.

So I started building a shell-level control layer for AI agents:

block or sanitize dangerous commands
expose virtual/fake secrets instead of real ones
separate DEV / PROD access policies
restrict network/domain access
enforce runtime policies instead of relying only on prompts
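To make the "block dangerous commands" idea concrete, here is a minimal Python sketch of a pre-execution check. It is not the code from the project; the names `BLOCKED_PROGRAMS`, `SENSITIVE_SUFFIXES`, and `check_command` are hypothetical and the rule tables are placeholders. It also only looks at one simple command line, so pipes, subshells, and things like `sh -c "..."` would need real shell parsing on top of this.

```python
import shlex
from typing import Tuple

# Hypothetical policy tables, for illustration only.
BLOCKED_PROGRAMS = {"rm", "dd", "mkfs", "shutdown"}
SENSITIVE_SUFFIXES = (".env", ".pem", "id_rsa")


def check_command(command: str) -> Tuple[bool, str]:
    """Return (allowed, reason) for a single simple command line."""
    try:
        tokens = shlex.split(command)
    except ValueError:
        # Unbalanced quotes and similar: refuse rather than guess.
        return False, "unparseable command"
    if not tokens:
        return False, "empty command"

    # Compare on the program name so /bin/rm is treated like rm.
    program = tokens[0].rsplit("/", 1)[-1]
    if program in BLOCKED_PROGRAMS:
        return False, f"program '{program}' is blocked"

    # Reject arguments that look like secret files.
    for arg in tokens[1:]:
        if arg.endswith(SENSITIVE_SUFFIXES):
            return False, f"argument '{arg}' looks like a secret file"

    return True, "ok"
```

The point of the sketch is that the decision happens in code before anything runs, so it does not depend on the agent following its prompt.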

The goal is to make agents safer and more deterministic inside real developer environments.

I’m now open-sourcing it and looking for people who use Claude Code, Codex, Cursor, etc. to try breaking it on real workflows.

Feedback, criticism, and attack ideas are very welcome.

Link to PyPI in the comments.

I think there’s a real problem with AI coding agents getting too much trust once they run from the terminal.

Even if you give them clear instructions, MCP tools, or read-only access, they can still sometimes reach things you didn’t really mean to expose, like .env files, production keys, internal URLs, or commands that are technically available but shouldn’t be used.

My current thinking is that the solution shouldn’t only be “better prompting”.
There needs to be some hard boundary at the shell/environment level:

hide or replace sensitive env values
separate dev keys from production keys
block risky commands before they run
control which domains/tools the agent can access
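A rough Python sketch of two of these boundaries, under stated assumptions: sensitive variables are detected by name only, and the allowlist is a fixed set. `SENSITIVE_MARKERS`, `ALLOWED_DOMAINS`, `sanitize_env`, and `is_domain_allowed` are hypothetical names for illustration, not part of any existing tool.

```python
from urllib.parse import urlparse

# Hypothetical rules, for illustration only.
SENSITIVE_MARKERS = ("KEY", "SECRET", "TOKEN", "PASSWORD")
ALLOWED_DOMAINS = {"pypi.org", "github.com"}


def sanitize_env(env):
    """Return a copy of env with sensitive-looking values replaced by placeholders."""
    clean = {}
    for name, value in env.items():
        if any(marker in name.upper() for marker in SENSITIVE_MARKERS):
            # The agent sees a fake value, never the real one.
            clean[name] = f"FAKE_{name}"
        else:
            clean[name] = value
    return clean


def is_domain_allowed(url):
    """Allow only URLs whose host is an allowlisted domain or a subdomain of one."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in ALLOWED_DOMAINS)
```

The sanitized dict could then be passed as the `env` argument of `subprocess.run` when launching the agent's shell, so the real values never enter its process environment. Name-based matching will miss secrets stored under innocuous names, so treat it as a starting point.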

Curious if other people here have run into this problem too.

Open-sourcing a shell-level security layer for AI agents

How do you stop terminal AI agents from reading .env or touching prod?