u/Arindam_200 — reddlx

Every MCP server you add makes your agent slightly dumber. Here is what actually fixes it.

One thing I’ve started noticing with MCP-based agents is that performance degrades much earlier than most people expect, especially once the number of integrations becomes large.

Small setups work surprisingly well. A few integrations, a handful of tools, manageable schemas, and the agent behaves predictably. The problems usually begin once teams start connecting the systems they actually use in production. Slack, Gmail, GitHub, Linear, Notion, databases, deployment tooling, internal APIs, monitoring systems. The integration surface grows very quickly.

At that point, the issue stops being “model intelligence” and starts becoming a context management problem.

Most MCP servers expose many tools, and each tool brings descriptions, parameter schemas, examples, and edge cases into the prompt space. Individually this feels harmless, but collectively it creates a very noisy environment for the model to reason inside. The agent spends more effort understanding the tool ecosystem than solving the task itself.

You can partially reduce the problem with lazy loading or dynamic tool visibility, but those approaches still inherit the same scaling issue underneath. The total surface area keeps growing.

I recently came across an open-source project, Corsair, that takes a different approach, and I thought the design was genuinely interesting.

Instead of exposing hundreds of tools directly, it exposes four generic primitives:

setup and authentication
operation discovery
schema inspection
execution

The important detail is that schemas are fetched only when the agent decides it needs them. The model first discovers available operations, then inspects a specific schema on demand, and finally executes the workflow.

That keeps the tool surface effectively constant regardless of how many integrations exist underneath.
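To make the discover, inspect, execute flow concrete, here is a minimal sketch in Python. It is not Corsair's actual API: the `Gateway` and `Operation` names, the schema shape, and the two sample operations are all made up for illustration, and the setup/authentication primitive is left out entirely.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Operation:
    """One integration capability (hypothetical shape, not a real library type)."""
    name: str
    summary: str
    schema: dict[str, Any]
    handler: Callable[..., Any]


class Gateway:
    """Exposes a fixed set of generic tools no matter how many operations exist."""

    def __init__(self) -> None:
        self._ops: dict[str, Operation] = {}

    def register(self, op: Operation) -> None:
        self._ops[op.name] = op

    def discover(self, query: str = "") -> list[dict[str, str]]:
        # Return only names and one-line summaries, never full schemas.
        q = query.lower()
        return [
            {"name": op.name, "summary": op.summary}
            for op in self._ops.values()
            if q in op.name.lower() or q in op.summary.lower()
        ]

    def inspect(self, name: str) -> dict[str, Any]:
        # The schema enters the context only when the agent asks for it.
        return self._ops[name].schema

    def execute(self, name: str, args: dict[str, Any]) -> Any:
        op = self._ops[name]
        missing = [k for k in op.schema.get("required", []) if k not in args]
        if missing:
            raise ValueError(f"missing arguments for {name}: {missing}")
        return op.handler(**args)


gw = Gateway()
gw.register(Operation(
    name="slack.post_message",
    summary="Post a message to a Slack channel",
    schema={"required": ["channel", "text"]},
    handler=lambda channel, text: f"posted to {channel}: {text}",
))
gw.register(Operation(
    name="github.create_issue",
    summary="Open an issue in a GitHub repository",
    schema={"required": ["repo", "title"]},
    handler=lambda repo, title: f"issue in {repo}: {title}",
))

matches = gw.discover("slack")
schema = gw.inspect(matches[0]["name"])
result = gw.execute("slack.post_message", {"channel": "#ops", "text": "deploy done"})
```

Registering a hundred more operations would not change what the model sees up front: it still gets the same few generic tools, and each schema only costs tokens at the moment it is inspected.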

The design feels much closer to how humans interact with unfamiliar systems. You first discover what capabilities exist, then inspect the details you need, and only then perform the action. Most current MCP ecosystems invert this by front-loading the entire integration surface into context immediately.

I suspect a lot of current agent reliability issues are really interface design problems. As integration counts grow, the systems that scale will probably be the ones that minimize what the model has to hold in working memory at any given moment.

u/Arindam_200 — 10 days ago

▲ 4 r/LangChain+1 crossposts

We've been leaning into agents for a while now. Tasks like PR drafts and code suggestions are almost entirely delegated to them. TBH, I agree that velocity went up with this.

Then one day production breaks. We trace it back to a change that bumped retry count from 2 to 5. Clean diff, tests passed, sailed through review.

What the agent didn't know was that we'd hit an almost identical failure 8 months ago and had quietly learned to never touch retry logic in that service without extra eyes on it.

That lesson lived in people's heads. Not in any doc, not in the codebase. The agent had no shot at knowing it.

Weirdly, the cleaner the PR looks, the faster it gets merged. A messy diff makes reviewers slow down and ask questions. A well-structured agent PR does the opposite; it reads as "already figured out." The risk is still there, just invisible now.

We're not going back. But I don't think we fully appreciated how much work institutional memory was quietly doing in the background before we started moving this fast.

More of my thoughts here if curious.

u/Arindam_200 — 17 days ago

▲ 4 r/LLMDevs

I went down a bit of a rabbit hole on model security, and this article stuck with me.

The more I think about it, the more it feels like most of us are checking the wrong box and calling it done.

If a model is signed and has scan results attached, it feels solid. You can verify it hasn’t been tampered with. Everything looks clean in the registry. But that only tells you about the final artifact, not how it came to exist.

And that’s the part that’s weirdly invisible.

Take a simple case. You fine-tune a model using some base model and a dataset. The final model gets signed, passes checks, ships. At no point do you actually have a strong guarantee that the base model was what you thought it was, or that the dataset you used is the same one that got approved earlier. You’re trusting that nothing changed along the way.

There’s no real connection between the final model and its inputs. They just sort of… exist in the same place.

That’s what this article is calling out. The idea is pretty straightforward: treat the whole thing like a graph, not a single object. The model should carry proof of exactly what went into it, down to the digest level, and verification should walk that chain back through every input.

Not just “this model is signed,” but “this model was built from these exact things, and each of those passed the required checks.”
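Here is a rough sketch of what "walk the chain back" could look like, in Python. Everything in it is an assumption for illustration: the `Attestation` record, the `verify_chain` function, and the in-memory dict standing in for a registry are hypothetical, and real signature verification is omitted (a single `checks_passed` flag stands in for it).

```python
import hashlib
from dataclasses import dataclass, field
from typing import Optional


def digest(data: bytes) -> str:
    """Content digest used as the artifact's identity."""
    return "sha256:" + hashlib.sha256(data).hexdigest()


@dataclass
class Attestation:
    """Hypothetical record: what an artifact is and exactly what it was built from."""
    subject: str                                      # digest of this artifact
    inputs: list[str] = field(default_factory=list)   # digests of its inputs
    checks_passed: bool = False                       # stand-in for signature + scan results


def verify_chain(
    subject: str,
    attestations: dict[str, Attestation],
    seen: Optional[set[str]] = None,
) -> bool:
    """Accept an artifact only if it and every transitive input are attested and passed."""
    seen = set() if seen is None else seen
    if subject in seen:          # shared input already visited
        return True
    seen.add(subject)
    att = attestations.get(subject)
    if att is None or not att.checks_passed:
        return False
    return all(verify_chain(i, attestations, seen) for i in att.inputs)


# Placeholder bytes stand in for real artifacts.
base = digest(b"base model weights")
dataset = digest(b"approved training data")
model = digest(b"fine-tuned model weights")

attestations = {
    base: Attestation(base, checks_passed=True),
    dataset: Attestation(dataset, checks_passed=True),
    model: Attestation(model, inputs=[base, dataset], checks_passed=True),
}

ok = verify_chain(model, attestations)

# If the dataset was swapped, its digest changes and no attestation matches it.
swapped = {k: v for k, v in attestations.items() if k != dataset}
ok_after_swap = verify_chain(model, swapped)
```

The point of the sketch is the last two lines: the final model can be signed and clean, but verification still fails once any input in the graph lacks a matching, passing attestation.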

Which sounds obvious once you say it out loud, but I don’t think most pipelines actually do this today.

What surprised me is that we already have most of the building blocks. Attestations, SBOMs, registries, signatures. But they don’t really talk to each other in a way that enforces this end-to-end. So we end up with something that looks secure on the surface but doesn’t answer the deeper question.

It reminds me a bit of early container security, where people were scanning images but not really thinking about how those images were built.

u/Arindam_200 — 18 days ago

▲ 11 r/Claudeopus

While playing with Opus 4.7 over the last few days, I noticed that prompts were filling context much faster than I expected.

I also came across a few measurements from others testing it with real developer inputs like project instructions, git logs, stack traces, and long coding prompts.

https://preview.redd.it/mdwqp4ybhkwg1.png?width=1080&format=png&auto=webp&s=c69780f70c0d04de4972e871deac75a88c470e92

Anthropic mentions the updated tokenizer may produce around 1.0–1.35× more tokens compared to previous models.

But a lot of the real-world measurements seem closer to ~1.4–1.47× more tokens, which becomes noticeable pretty quickly if you're running larger contexts.

That means:

context budgets disappear faster
long-running sessions accumulate tokens much quicker
effective cost per workflow goes up
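A quick back-of-the-envelope calculation shows the effect of those multipliers. The 200,000-token window below is only an assumed example figure, and `effective_budget` is a hypothetical helper, not anything from an SDK.

```python
def effective_budget(context_window: int, multiplier: float) -> int:
    """How much input, measured in old-tokenizer tokens, fits in the window
    when the new tokenizer emits `multiplier` times as many tokens."""
    return int(context_window / multiplier)


WINDOW = 200_000  # assumed window size, purely for illustration

for m in (1.0, 1.35, 1.47):
    fits = effective_budget(WINDOW, m)
    lost = 1 - fits / WINDOW
    print(f"{m:.2f}x -> fits {fits:,} old-tokenizer tokens ({lost:.0%} less)")
```

Per-token pricing scales the same way: if the same text produces 1.47× the tokens, the same workflow costs roughly 1.47× as much at an unchanged per-token price.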

Not necessarily a bad thing, though.

I mean, tokenizer changes are usually made to improve how the model handles code, markdown, structured text, and other developer-heavy inputs. So there’s probably a capability tradeoff happening here.

I made a short video here walking through the measurements, the tokenizer changes, and what it means in practice, if you want to explore more.


u/Arindam_200 — 1 month ago