r/AiAgentts

I got tired of not knowing what my AI coding agent was actually doing — so I built a runtime transparency layer into it
▲ 3 r/AiAgentts+3 crossposts

Today it instruments CyxCode’s process and filesystem wrappers, records shell/file/network-style events with session and prompt context, scores risky activity, writes local JSONL audit logs, and exposes dashboard/report APIs.

Destructive shell commands are blocked before spawn, and sensitive writes/risky actions are classified for policy decisions.
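
For anyone wondering what "score, log to JSONL, block before spawn" could look like in practice, here is a minimal sketch. It is not CyxCode's actual code: the patterns, the scores, the event fields and the function names are all assumptions made up for illustration.

```python
import json
import re
import time

# Hypothetical patterns; a real policy would need to be far more thorough.
DESTRUCTIVE_PATTERNS = [
    re.compile(r"\brm\s+-\w*(rf|fr)\w*"),   # rm -rf / rm -fr
    re.compile(r"\bmkfs\b"),                # formatting a filesystem
    re.compile(r"\bdd\b.*\bof=/dev/"),      # raw writes to a device
]


def score_command(command: str) -> int:
    """Return 100 for a destructive match, 50 for sudo, otherwise 0."""
    if any(p.search(command) for p in DESTRUCTIVE_PATTERNS):
        return 100
    if re.search(r"\bsudo\b", command):
        return 50
    return 0


def audit_and_decide(command: str, session_id: str, prompt: str, log_path: str) -> bool:
    """Append one JSONL audit event and return True only if the command may spawn."""
    score = score_command(command)
    decision = "block" if score >= 100 else "allow"
    event = {
        "ts": time.time(),
        "session": session_id,
        "prompt": prompt,
        "kind": "shell",
        "command": command,
        "risk": score,
        "decision": decision,
    }
    # One JSON object per line keeps the log appendable and easy to grep.
    with open(log_path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(event) + "\n")
    return decision == "allow"
```

The caller would only hand the command to a process spawner when `audit_and_decide` returns `True`, so the block happens before anything runs and the attempt is still recorded.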

None of the major coding CLIs have this. Not Claude Code, not Cursor, not Windsurf, not Aider. They're all trust-and-hope.

CyxCode is an open-source fork of opencode, and the repo is live. If this is a problem you've thought about too, I'd really appreciate a star. It helps signal that runtime AI transparency is worth building properly.

https://github.com/code3hr/cyxcode

u/YoungCJ12 — 20 hours ago

Nobody tells you how hard it is to keep a long-running AI agent actually alive

Two months of self-hosted hermes. The agent itself was genuinely good. That's not the part worth writing about.

A Docker container exits silently at 3am and you find out six hours later when you notice nothing ran. A dependency update breaks the environment config and your scheduled automations stop without any alert, just stop, and you find out two days later when you go to use them. An SSL cert expires, and the Telegram integration goes dark for no obvious reason. A cron job loops one night, and $55 is added to the API bill before morning.

None of those are hermes problems. They're just what running a long-running AI agent looks like when you don't have infrastructure monitoring. The guides don't mention that part.

After the second billing incident I moved to clawdi. Auto-restart on any container crash. API keys in Intel TDX hardware-encrypted storage that even the platform infrastructure can't access. Uptime dashboard visible without SSH. Haven't had an unplanned outage since.

A long-running AI agent needs infrastructure monitoring, not just infrastructure. If you're not going to build that layer yourself, use something that already includes it.
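
If you do build that layer yourself, the two checks that would have caught the incidents above (a silent exit and a looping job) are small. A rough sketch, assuming the agent records a heartbeat timestamp and a running daily spend somewhere you can read; the `AgentStatus` type, the thresholds and the alert names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AgentStatus:
    last_heartbeat: float   # Unix seconds of the last successful run
    spend_today_usd: float  # API spend accumulated so far today


def check_agent(status: AgentStatus, now: float,
                max_silence_s: float = 900.0,
                daily_budget_usd: float = 10.0) -> list:
    """Return a list of alert names; an empty list means healthy."""
    alerts = []
    # A container that exited silently stops sending heartbeats.
    if now - status.last_heartbeat > max_silence_s:
        alerts.append("stale_heartbeat")
    # A looping cron job shows up as spend racing past the daily cap.
    if status.spend_today_usd >= daily_budget_usd:
        alerts.append("budget_exceeded")
    return alerts
```

Run something like this from a scheduler that lives outside the agent's container, otherwise the watchdog dies along with the thing it is watching.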

u/ninjapapi — 1 day ago
▲ 53 r/AiAgentts+1 crossposts

DeepSeek R2 just went open-source and it's matching GPT-4o on 9 of 12 benchmarks — for literally $0 in API costs

The benchmark sheet dropped this morning and people are losing it in the ML community.

What DeepSeek R2 scores:
•MMLU: 90.8 (GPT-4o: 88.7)
•HumanEval coding: 93.2 — new open-source SOTA
•MATH reasoning: 88.9
•Runs on a single A100, fully local, zero API costs

Hugging Face hit 300k downloads in the first 6 hours. The open-source community is already fine-tuning it for medical, legal, and finance use cases.

The cost gap is now absurd: GPT-4o charges ~$0.015/1k tokens. DeepSeek run locally has no per-token API fee. For high-volume use cases, the API bill drops to zero overnight, and the remaining cost is the hardware you run it on.

The 'closed model moat' argument is officially dead. Every startup bleeding $40k/month on OpenAI has a real migration path now.
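
To put the numbers quoted in this post side by side, here is a back-of-the-envelope calculation. It only uses the ~$0.015 per 1k tokens and $40k/month figures from the post; the function names are made up, and it says nothing about what the local hardware costs.

```python
def api_cost_usd(tokens: float, price_per_1k_usd: float) -> float:
    """Per-token API cost for a given token volume."""
    return tokens / 1000 * price_per_1k_usd


def tokens_for_budget(budget_usd: float, price_per_1k_usd: float) -> float:
    """Token volume that a given monthly bill corresponds to."""
    return budget_usd / price_per_1k_usd * 1000


# A $40k/month bill at $0.015 per 1k tokens is roughly 2.67 billion tokens a month.
monthly_tokens = tokens_for_budget(40_000, 0.015)
print(f"{monthly_tokens / 1e9:.2f}B tokens per month")
```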

u/Ok-Drama-6800 — 6 days ago
▲ 7 r/AiAgentts+1 crossposts

The DeepSeek + Claude 4.7 combo is the most powerful $50/month AI stack I've ever built — full routing workflow inside

I've been testing every model combo for 3 months. This is the one that stuck.

The core insight: DeepSeek and Claude 4.7 are NOT competitors. They're complements.

DeepSeek dominates at:
→Code generation and debugging
→Math, logic, structured reasoning
→Data analysis and transformation
→Anything where raw accuracy beats tone

Claude 4.7 is unmatched at:
→Persuasive and creative writing
→Nuanced client-facing communication
→Long-form coherence and voice
→Anything where trust and tone matter

My LiteLLM router logic:
•Prompt contains 'code', 'debug', 'analyze', 'data' → DeepSeek
•Prompt contains 'write', 'email', 'copy', 'explain' → Claude 4.7
•Default fallback → Claude 4.7

Monthly cost: ~$47 Claude API + $0 DeepSeek (local via Ollama)
Equivalent GPT-4o stack: $380+/month

I used this exact setup to make $1,277 in my first week selling freelance AI services. Full story in my other post.

u/Ok-Drama-6800 — 6 days ago
▲ 12 r/AiAgentts+7 crossposts

Three bots in a trenchcoat is not omnichannel

Self-serve is exciting. Genuinely. But if I am honest, it is not the most interesting thing about 13 May.

The most interesting thing is that we have been quietly running architecture that the rest of the industry is only just figuring out exists.

A competitor recently launched real-time SMS ingestion. The coverage was breathless. Everyone lost it. So innovative. Revolutionary. Game-changing.

Me? I looked at our codebase and thought: "SMS ingestion. Wow. That is so 2025."

Here is what we actually built, and have been running in production for the better part of a year.

Mid-voice-call, Elba texts a short URL to the caller. The caller fills out a form on their phone. The structured data comes back into the live call via RPC. The workflow receives clean JSON. The voice call never paused. The agent never lost session state. The caller submitted a form while still talking and the agent acted on it in the same conversational turn.

That is not SMS ingestion. That is a bidirectional channel bridge inside a single active session. Sending an SMS during a call is not new. Getting structured data back into the active session in real time without dropping state on either side - that is the part nobody else has shipped.
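
The general shape of that pattern, a live session that keeps running while it waits for out-of-band structured data, can be sketched with a future keyed by a one-time token. This is an illustrative guess, not Elba's implementation: `CallSession`, `request_form`, `submit_form` and the form fields are hypothetical names, and the SMS and RPC transport is left out entirely.

```python
import asyncio
import secrets


class CallSession:
    """One live call; form results are delivered into it by token."""

    def __init__(self):
        self.pending = {}           # token -> Future awaiting form data
        self.state = {"turns": 0}   # session state that must survive the wait

    def request_form(self):
        """Create a one-time token (it would go into the texted URL)."""
        token = secrets.token_urlsafe(8)
        future = asyncio.get_running_loop().create_future()
        self.pending[token] = future
        return token, future

    def submit_form(self, token, data):
        """Called by the form backend; resolves the future inside the live session."""
        self.pending.pop(token).set_result(data)


async def demo():
    session = CallSession()
    token, form_future = session.request_form()

    async def caller_fills_form():
        await asyncio.sleep(0.01)   # the caller types on their phone
        session.submit_form(token, {"postcode": "AB1 2CD"})

    task = asyncio.create_task(caller_fills_form())
    # The conversation keeps taking turns while the form is outstanding.
    while not form_future.done():
        session.state["turns"] += 1
        await asyncio.sleep(0.005)
    await task
    return session.state["turns"], form_future.result()
```

The point of the sketch is that nothing pauses: the turn counter keeps advancing on the same session object that later receives the form data.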

And it sits on top of something even more fundamental.

Most "omnichannel AI" products are three bots in a trench coat. A voice agent, a WhatsApp bot, a webchat widget, all pointing at the same CRM row and calling it unified. Each with its own prompt, its own config, its own version history, its own failure modes.

Elba is one agent. One workflow. One memory layer. Voice, WhatsApp, SMS, email and webchat all running through the same execution engine. Not copies. Not synced versions. The same agent, same logic, same memory, regardless of which channel the conversation arrived on. Deployments are atomic - every channel switches to the new workflow version in the same transaction. No drift. No "did the WhatsApp bot get the update" incident. One audit trail.
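
One way to get "every channel switches in the same transaction" is for all channels to read a single active-version pointer instead of holding their own copies. A minimal in-process sketch of that idea follows; it is an assumption about the pattern, not Elba's architecture, and `WorkflowRegistry` and `handle_message` are hypothetical names.

```python
import threading


class WorkflowRegistry:
    """Holds the single active (version, workflow) pair shared by all channels."""

    def __init__(self, version, workflow):
        self._lock = threading.Lock()
        self._active = (version, workflow)

    def deploy(self, version, workflow):
        # Swapping one reference under a lock: no channel can see a mix of versions.
        with self._lock:
            self._active = (version, workflow)

    def active(self):
        with self._lock:
            return self._active


def handle_message(registry, channel, text):
    """Every channel resolves the workflow at call time from the same registry."""
    version, workflow = registry.active()
    return {"channel": channel, "version": version, "reply": workflow(text)}
```

Because no channel caches its own workflow, there is nothing to drift: a deploy changes what voice, WhatsApp and webchat all see on their next message.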

When a regulated enterprise customer asks what exactly their AI told a customer across every channel and every session for the past six months, we have a single clean answer.

The competition is announcing SMS ingestion and calling it a breakthrough.

We are launching self-serve on 13 May and already cooking the next thing. We may have put it on hold until after the launch. Our tech never sleeps though.

If you want an agent that actually knows who it is talking to across every channel and every session: self-serve opens 13 May at www.kolsetu.com.

Full technical writeup: https://www.kolsetu.com/blog/the-architecture-nobody-else-built

u/EdikTheFurry — 9 days ago