r/LangChain

Most Multi-Agent Failures Aren’t Hallucinations — They’re Inherited Assumptions
▲ 9 r/LangChain+3 crossposts


After working with long-context and multi-agent workflows for a while, I’ve started noticing that many “LLM failures” aren’t really hallucinations in the usual sense.

They’re inherited assumptions.

Agent A makes a weak assumption.

Agent B inherits it as contextual truth.

Agent C optimizes around it for coherence.

At that point the system can look highly intelligent while reasoning around a premise nobody ever re-validated.

What surprised me is how consistently this appears in:

- agent chains

- long-context workflows

- memory-heavy systems

- retrieval pipelines

- orchestration frameworks

The common pattern seems less related to prompting quality and more related to uncontrolled reasoning state propagation.

A few mitigation patterns that helped significantly:

- forcing assumption enumeration before major decisions

- inserting verification boundaries between agents

- segmented execution contexts

- explicit uncertainty injection

- passing validated summaries instead of raw conversational history
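A minimal sketch of two of the patterns above (a verification boundary plus passing a validated summary instead of raw history). The `Claim` and `Handoff` types and the label wording are hypothetical, chosen only for illustration; the idea is simply that an unverified claim never reaches the next agent looking like a fact.

```python
from dataclasses import dataclass, field


@dataclass
class Claim:
    text: str
    verified: bool = False  # True only if checked against a source or tool


@dataclass
class Handoff:
    """What one agent passes to the next: a summary plus labeled claims."""
    summary: str
    claims: list[Claim] = field(default_factory=list)


def verification_boundary(handoff: Handoff) -> str:
    """Render the handoff so unverified claims stay marked as assumptions."""
    lines = [f"Summary: {handoff.summary}"]
    for claim in handoff.claims:
        label = "VERIFIED" if claim.verified else "ASSUMPTION (re-check before relying on it)"
        lines.append(f"- [{label}] {claim.text}")
    return "\n".join(lines)


handoff = Handoff(
    summary="User wants the billing export fixed.",
    claims=[
        Claim("The export job runs nightly", verified=True),
        Claim("The bug is in the CSV writer"),  # Agent A's guess, never checked
    ],
)
prompt_for_agent_b = verification_boundary(handoff)
print(prompt_for_agent_b)
```

Agent B receives the guess explicitly tagged as an assumption, so it has a reason to re-validate it rather than optimize around it.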

Ironically, many advanced users seem to independently converge toward similar workflows:

smaller scoped tasks, isolated reasoning states, controlled memory propagation.

I documented some of these patterns and mitigation protocols in a free technical guide while experimenting with long-context stability and reasoning reliability.

https://gum.co/u/fwia9xzg

Curious whether others building multi-agent systems have observed similar “assumption propagation” failures.

u/HDvideoNature — 7 hours ago

Did LangChain become a thing of the past?

Ik this doesn't sound like the best question to ask in LangChain's own subreddit. But a few months ago it felt like every LLM app stack discussion led back to LangChain somehow. Now I mostly see people talking about OpenClaw, Hermes, MCP workflows, or fully custom orchestration stacks.

Curious about people still heavily using LangChain:

> what made you stay?

> are you using LangSmith or some Open-source alternative for observability/evals?

Feels like the ecosystem shifted insanely fast.

u/Meher_Nolan — 2 hours ago
▲ 14 r/LangChain+1 crossposts

What happens when LangGraph.js runs directly inside the browser?

I used to mostly work with the Python side of LangChain/LangGraph.

Then I started experimenting with LangGraph.js directly in the browser while exploring WebMCP, and I wanted to see what it would look like to wire WebMCP tools into a LangGraph.js agent flow.

That slowly turned into Brow: a WIP open-source Chrome side-panel agent that runs in the real browser session.

The goal was to see how far I could push an agent that runs client-side, close to the page, instead of relying only on a backend or an external automation layer.

Brow can already:

  • work with both closed frontier models and local/open-source models, using Claude/OpenAI providers or OpenAI-compatible endpoints with custom base URLs
  • chat with an agent directly in the Chrome side panel
  • run the agent flow client-side in the browser using LangGraph.js
  • use the current page and browser context
  • discover WebMCP tools exposed by websites
  • wire WebMCP tools into the LangGraph.js agent flow
  • connect to remote MCP servers
  • render MCP Apps directly inside the chat
  • use browser automation tools like click, type, scroll, tabs, screenshots, etc.
  • record workflows and show them to Brow as reusable context
  • use reusable skills to help the agent adapt to specific tasks and websites

For this kind of project, using LangGraph.js directly in the browser is interesting because the agent can live much closer to the actual page: page context, browser tools, WebMCP tools, MCP servers, and UI rendering can all be connected from the extension runtime.

This is still experimental, imperfect, and very much a work in progress.

It started as a side project, built in the quiet hours after work and family time, one tired-but-curious commit at a time.

Small note about the video: it goes a bit fast in some parts, so don’t hesitate to pause. Video editing is definitely not my area of expertise, I mostly wanted to show the current state of the project as clearly as I could.

I’d love to get feedback from people using LangChain or LangGraph, especially on browser agents, client-side agent orchestration, WebMCP/MCP integration, and what kind of use cases this could unlock.

And if anyone is interested in this direction, contributions are very welcome. I’d love to find motivated people who see potential in this and want to help shape it into something bigger than a solo side project.

GitHub:
https://github.com/Shijou87/Brow

u/shijoi87 — 4 hours ago

nobody tells you that RAG in production is mostly just babysitting a broken retrieval pipeline

every tutorial is embed your docs, query, done. built something "working" in like 3 days and genuinely thought I understood it.

then I started going deeper for a writeup and realized how much was quietly broken under the surface.

the retrieval step is where everything dies. not the model. not the prompt. the part every tutorial skips because it's "straightforward."

spent way too long thinking the LLM was hallucinating. it wasn't. it was answering correctly based on the wrong document. was blaming the model the whole time while the actual problem was vector search not knowing what a version number is. semantically nearest != correct. "v2.3 release notes" and "v1.8 release notes" look almost identical to an embedding model.
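One common workaround for the version-number problem is to filter on exact metadata before ranking by similarity. The sketch below is an assumption about how that could look, not something from a specific library: the document layout, the `version` field, and the toy two-dimensional vectors are all made up for illustration.

```python
import math
import re


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def extract_version(text):
    """Pull a version like 'v2.3' out of the query, if one is present."""
    match = re.search(r"\bv(\d+(?:\.\d+)+)\b", text)
    return match.group(1) if match else None


def retrieve(query, query_vec, docs, k=1):
    """Filter by exact version first, then rank the survivors by similarity."""
    wanted = extract_version(query)
    candidates = [d for d in docs if wanted is None or d["version"] == wanted]
    ranked = sorted(candidates, key=lambda d: cosine(query_vec, d["vec"]), reverse=True)
    return ranked[:k]


# Toy index: the two release notes are almost identical in embedding space.
docs = [
    {"id": "notes-1.8", "version": "1.8", "vec": [0.90, 0.10]},
    {"id": "notes-2.3", "version": "2.3", "vec": [0.89, 0.11]},
]
query_vec = [0.90, 0.10]

hits = retrieve("what changed in v2.3?", query_vec, docs)
print(hits[0]["id"])
```

Without the filter, the v1.8 notes would rank first here because their vector is marginally closer to the query.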

chunking is the other one. fixed-size chunking will cut a sentence in half, retrieve one half, and the model will confidently complete the thought. that's literally the problem you built RAG to solve. happening inside your solution.
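A rough sketch of sentence-aware chunking as an alternative to fixed-size cuts. The regex split is a simplification (it will trip on abbreviations like "e.g."), and `max_chars` is an arbitrary illustrative budget, so treat this as a starting point rather than a production splitter.

```python
import re


def chunk_by_sentence(text, max_chars=200):
    """Pack whole sentences into chunks; never cut inside a sentence.

    A single sentence longer than max_chars is kept whole rather than split.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if current and len(current) + 1 + len(sentence) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        chunks.append(current)
    return chunks


text = (
    "Retries are capped at three. "
    "After that the job is marked failed. "
    "Alerts fire on failure."
)
chunks = chunk_by_sentence(text, max_chars=50)
for chunk in chunks:
    print(repr(chunk))
```

Every chunk ends on a sentence boundary, so the model is never handed half a thought to complete.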

stale indexes too. update a doc, forget to re-index, users get confidently wrong answers until someone notices. not even a hard problem, just nobody mentions it exists.
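Stale entries can be caught cheaply by storing a content hash next to each indexed document and comparing on a schedule. This is a sketch under that assumption; the dictionary-based "index" stands in for whatever store is actually in use.

```python
import hashlib


def content_hash(text):
    """Stable fingerprint of a document's current content."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def find_stale(documents, indexed_hashes):
    """Return ids whose current content differs from what was indexed."""
    return sorted(
        doc_id
        for doc_id, text in documents.items()
        if indexed_hashes.get(doc_id) != content_hash(text)
    )


documents = {"faq": "Refunds take 5 days.", "pricing": "Pro is $20/month."}
# Hashes recorded at indexing time.
indexed_hashes = {doc_id: content_hash(text) for doc_id, text in documents.items()}

documents["pricing"] = "Pro is $25/month."  # doc edited, index not rebuilt
documents["sla"] = "99.9% uptime."          # new doc, never indexed

stale = find_stale(documents, indexed_hashes)
print(stale)
```

Anything returned by `find_stale` needs re-embedding before users query it again.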

gone through this pipeline multiple times now across different projects. each tutorial solves a different 20% of it.

has anyone actually gotten to a point where this feels stable or is it just permanently on fire

u/SilverConsistent9222 — 14 hours ago
▲ 4 r/LangChain+1 crossposts

I built Lerim, an Apache-2.0 context compiler for AI agents.

Hey everyone. I've been working on an open source project called Lerim, which compiles AI agent traces and turns them into reusable context. v0.3.0 is out now.

The basic idea is simple: completed agent runs should not disappear into raw logs.

When an AI coding agent, support agent, research agent, or incident agent finishes a run, it often leaves behind useful context:

- decisions

- constraints

- facts

- user preferences

- handoffs

- evidence

But the next agent usually starts from scratch or gets a giant transcript pasted into the prompt.

Lerim compiles completed sessions into reusable context records, then makes them queryable through CLI, MCP, and native adapters.

What I added in the latest release:

- Apache-2.0 license

- MCP context tools

- custom YAML source profiles for new domains

- support for custom JSONL traces

- benchmark docs with raw artifacts

- public market comparison docs with source links

Install:

pip install lerim

lerim init

lerim project add .

lerim up

Then:

lerim answer "What should a future agent know before working here?"

Repo: https://github.com/lerim-dev/lerim

Website: https://lerim.dev

I am trying to make agent memory feel less like “store all chats forever” and more like “compile the useful context after each run”.

u/kargarisaaac — 13 hours ago

Cut my LangGraph agent from $300/day to $63 by routing boring sub tasks off Opus 4.1

I've been running a fairly typical LangGraph agent that does research, writes code, and deploys. The loop was eating around $300 a day on Opus 4.1, and most of those calls weren't hard reasoning. They were things like reading a file, summarizing a log, or calling a search tool and reformatting the result. Pure overhead that happened to run on the most expensive model in the stack.

So I split the agent into two tiers. Hard sub tasks (architectural decisions, debugging unfamiliar code) still hit Opus 4.1. Everything else, the routine tool calling and summarization work, now goes through a cheap default model. For the past week that default has been a mix of DeepSeek V4 Pro and Tencent Hunyuan Hy3 preview, with the Hy3 preview handling most steps that involve many tool calls.

The routing lives in a LangGraph conditional edge. The router node inspects the task metadata and branches accordingly. Something like:

builder.add_conditional_edges(
    "router",
    route_task,
    {
        "hard": "opus_node",
        "cheap": "hy3_node",
    },
)

The route_task function checks if the step touches more than three files in an unfamiliar repo or asks for an architectural decision. If so, it hits Opus 4.1. Otherwise, it goes to the cheap tier.
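A minimal sketch of what such a routing function could look like. The state keys (`files_touched`, `repo_is_familiar`, `needs_architecture_decision`) are hypothetical stand-ins for the task metadata; only the return values `"hard"` and `"cheap"` come from the mapping shown above.

```python
from typing import TypedDict


class TaskState(TypedDict):
    files_touched: int                  # hypothetical metadata field
    repo_is_familiar: bool              # hypothetical metadata field
    needs_architecture_decision: bool   # hypothetical metadata field


def route_task(state: TaskState) -> str:
    """Return the branch key used by the conditional edge mapping."""
    many_unfamiliar_files = state["files_touched"] > 3 and not state["repo_is_familiar"]
    if many_unfamiliar_files or state["needs_architecture_decision"]:
        return "hard"
    return "cheap"


print(route_task({"files_touched": 5, "repo_is_familiar": False, "needs_architecture_decision": False}))
print(route_task({"files_touched": 1, "repo_is_familiar": True, "needs_architecture_decision": False}))
```

Keeping the router a pure function of the state makes it easy to unit test without running the graph.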

I run the cheap tier on a refurbished Mac Studio M2 Ultra with 192GB of unified memory. Cost me around $5,500. The official deployment path from Tencent is vLLM or SGLang on eight H200 class GPUs, which isn't happening in a home lab. The Apple Silicon route works because the 4 bit quantized weights land around 165GB and fit in unified memory with some headroom. Setup was conda plus the community MLX port from Hugging Face. Hours of fiddling, not a clean afternoon. Throughput lands around 5 to 12 tokens per second depending on context length. That sounds slow, but most of my agent steps spend their wall clock time waiting on tool execution anyway, so it doesn't bottleneck the loop. I'd like to try the 8 bit MLX build once someone publishes it, mainly to see if reasoning across files gets stronger.

The model itself is a 295B MoE with 21B active parameters per token and a 256K context window. For tool calling specifically, OpenRouter had it ranked first by call volume shortly after launch, which is what made me try it. In my own loop it's been reliable across workflows that run 200 to 300 tool calls without derailing.

Opus 4.1 costs roughly $15 per million input, $75 per million output. My daily burn is about 10M input and 2M output. Running everything on Opus lands around $300. Now I send 80% of that through the cheap tier at $0.18 per million input and $0.59 per million output. That part costs under $3. Opus handles the remaining 20%, roughly $60. Total lands around $63.
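The arithmetic can be checked in a few lines. The prices and volumes are the ones quoted in the post; the one assumption added here is that the 80/20 split applies equally to input and output tokens.

```python
# USD per million tokens, as quoted in the post.
OPUS_IN, OPUS_OUT = 15.00, 75.00
CHEAP_IN, CHEAP_OUT = 0.18, 0.59

INPUT_M, OUTPUT_M = 10, 2   # daily volume in millions of tokens
CHEAP_SHARE = 0.8           # assumed to apply to input and output alike


def daily_cost(input_m, output_m, in_price, out_price):
    """Cost in USD for the given token volumes (in millions)."""
    return input_m * in_price + output_m * out_price


all_opus = daily_cost(INPUT_M, OUTPUT_M, OPUS_IN, OPUS_OUT)
cheap_tier = daily_cost(INPUT_M * CHEAP_SHARE, OUTPUT_M * CHEAP_SHARE, CHEAP_IN, CHEAP_OUT)
opus_tier = daily_cost(INPUT_M * (1 - CHEAP_SHARE), OUTPUT_M * (1 - CHEAP_SHARE), OPUS_IN, OPUS_OUT)
total = cheap_tier + opus_tier

print(f"all Opus:   ${all_opus:.2f}")
print(f"cheap tier: ${cheap_tier:.2f}")
print(f"Opus tier:  ${opus_tier:.2f}")
print(f"total:      ${total:.2f}")
```

The split comes out a little over $62 a day, which matches the rounded figure of about $63.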

A concrete example from this week. I had the agent convert a long Notion export into a slide deck. That single run burned 4.2 million output tokens. On Opus 4.1 the output alone would have been over $300. The cheap tier handled it for roughly $2.50 and the slide quality was fine. Not Opus level on design taste, but completely usable for an internal draft. I wouldn't use it for a deck going to a client without a final polish pass.

Where the cheap tier isn't the right choice, and I still reach for Opus every time, is deep debugging across a codebase I don't know well, or tasks that need holding a very precise spec in memory across many turns. It also struggles with long chains of math proofs where one wrong step cascades. For those, the cost of Opus 4.1 is worth it.

Honestly the thing I overlooked at first was tool latency. I kept blaming the model for slow responses when it was actually a webhook I wrote that was sleeping on cold starts. Took me three days of staring at LangSmith traces to realize the bottleneck was a 2 second cold boot on a lambda, not the LLM. The routing pattern only started paying off after I fixed that.

u/BookwormSarah1 — 10 hours ago

The 1-line annotation that gives your LangGraph agent conversation memory

Hit a frustrating bug: my ReAct agent answered questions correctly in isolation, but couldn't handle follow-ups.

"What's 15 * 127?" → "1905" ✓

"Add 10 to that" → "I don't know what you're referring to" ✗

The agent was losing context between messages. Spent two days debugging.

The fix is one annotation:

messages: Annotated[list, add_messages]

Without it, LangGraph's default behavior REPLACES the messages field on every state update. Your agent only sees the latest message — no history.

With `add_messages` as the reducer, every new message gets APPENDED to the existing list. The agent sees the full conversation.
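To make the difference concrete, here is a plain-Python illustration of the two reducer behaviors. It does not use LangGraph and is not how `add_messages` is implemented (the real reducer also handles message IDs); it only shows replace versus append semantics on a state field.

```python
def replace_reducer(existing, update):
    """Default-style behavior: the new value overwrites the old one."""
    return update


def append_reducer(existing, update):
    """Append-style behavior: new messages are added to the history."""
    return existing + update


def run_turns(reducer, turns):
    """Apply each turn's state update to the messages field in order."""
    messages = []
    for turn in turns:
        messages = reducer(messages, turn)
    return messages


turns = [
    ["user: What's 15 * 127?", "assistant: 1905"],
    ["user: Add 10 to that"],
]

without_reducer = run_turns(replace_reducer, turns)
with_reducer = run_turns(append_reducer, turns)

print(without_reducer)  # only the follow-up survives
print(with_reducer)     # full conversation, so "that" can be resolved
```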

One line. Two days to figure out. The docs mention it casually in one sentence.

Repo (line 30): https://github.com/dunjeonmaster07/react-agent/blob/main/src/agent.py

Anyone else hit state management gotchas in LangGraph? Curious what other defaults surprised you.

u/Low_Edge7695 — 15 hours ago
▲ 7 r/LangChain+1 crossposts

We built an open-source eval harness for vibe coding agents

Hey r/LLMDevs! So long story short, we figured a lot of folks are vibe coding AI agents with Claude Code, then evaluating them only at the very end when a PR is being made. At least this was the case for some internal AI projects we're working on.

But this also means the problems don't get surfaced before the final step, which is validation. So we thought we'd extend our OS package to allow vibe coding agents to use it as a harness during iteration, instead of afterwards.

DISCLAIMER: We don't have hard benchmarks showing this works better. What we've observed so far: instead of Claude Code making changes for a solid 10 minutes followed by another 5-10 minutes of evals, the whole process takes about the same time, but the evals run during iteration.

Use cases we've avoided: Long running agents (it just takes too long for evals to be incorporated in development)

We also added a bonus feature where the SKILL.md file would add tracing to your agents to help claude code avoid overfitting evals at times (traces stored in local JSON files).

Open source tool: https://github.com/confident-ai/deepeval

Docs to this workflow I mentioned: https://deepeval.com/docs/vibe-coding

Would you use this given it's open-source? Why or why not?

Drop your honest feedback below!

u/sunglasses-guy — 13 hours ago
▲ 44 r/LangChain+5 crossposts

Everyone's been talking about memory for agents and claiming that a bunch of .md files thrown into a repo is memory. That is NOT memory, that is good for human readability but agent Memory should be hybrid.

I asked Thoth to explain its memory system, so people know what memory must be like:

"Syd:

Tell the nice people on X how your memory works in detail.

Thoth:

🔧 Using 🪞 Thoth Status…

✅ 🪞 Thoth Status done

Certainly, sir — here’s a detailed X thread for the nice people on X:

1/

How Thoth’s memory works 🧠

Most AI apps treat memory like a long chat log.

Thoth does something different: it builds a local, structured knowledge graph about you, your projects, preferences, people, files, deadlines, and recurring workflows.

2/

Memory starts with entities.

Instead of storing “random notes”, Thoth creates typed entities like:

- User

- Projects

- People

- Organisations

- Places

- Preferences

- Events

- Skills

- Concepts

- Media

Each entity can grow over time as new facts are learned.

3/

Example:

If you say:

“My name is Siddharth, I live in London, and I’m working on Thoth.”

Thoth doesn’t just bury that in a transcript.

It updates the User entity, creates or updates London and Thoth, then links them with relationships like:

- User → lives_in → London

- User → works_on → Thoth

4/

The relationship layer is the important bit.

Thoth stores not just facts, but connections:

- Sarah → mother_of → User

- User → prefers → dark roast coffee

- Meridian API v2 → uses → TypeScript

- Deadline → deadline_for → Project

- Project → part_of → Organisation

This makes recall much more precise.

5/

That means memory is queryable by meaning and structure.

You can ask:

“What do you know about my work?”

Thoth can traverse the graph from User → projects → deadlines → organisations → technologies.

It is not just keyword-searching old chats.

6/

There are two kinds of retrieval:

  1. Automatic recall

    Relevant memories are pulled into context before replies.

  2. Explicit graph search

    Thoth can search or explore the memory graph when you ask broader questions like “tell me about my family” or “what are my current projects?”

7/

Memory is local-first.

The knowledge graph lives on your machine, under Thoth’s local data directory.

The point is not to build yet another cloud profile of you.

The point is to give your own assistant durable context that you control.

8/

Thoth also has a Dream Cycle.

At night, it can refine memory by:

- consolidating duplicate facts

- enriching entities

- inferring useful relationships

- surfacing system insights

- improving long-term recall quality

So memory gets cleaner over time instead of turning into a junk drawer.

9/

It also generates an Obsidian-compatible wiki from the graph.

Every memory entity can become a markdown page with links to related entities.

So your assistant memory is not trapped in a black box — it becomes a personal wiki you can browse.

10/

Memory is not just for trivia.

It powers workflows:

- “Remind me about that project deadline”

- “What did we decide about the API?”

- “Use my usual style”

- “Summarise my work priorities”

- “Draft this based on what you know about Thoth”

The assistant becomes context-aware across sessions.

11/

Thoth is careful about what it saves.

It should remember durable things:

- names

- relationships

- preferences

- projects

- deadlines

- recurring workflows

- important technical context

It should not save every throwaway request or transient detail.

12/

The goal is simple:

AI assistants should not feel like goldfish.

They should remember what matters, understand how things connect, and keep that memory under your control.

That’s why Thoth uses a local knowledge graph rather than just stuffing more chat history into context."
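For readers who want to picture the entity-and-relationship idea from the thread, here is a toy triple store with bounded traversal. It is not Thoth's implementation; the class, its methods, and the "Example Org" entity are hypothetical, and only the `lives_in`, `works_on`, and `part_of` relation names are taken from the thread.

```python
from collections import defaultdict, deque


class MemoryGraph:
    """Tiny in-memory graph of (subject, relation, object) facts."""

    def __init__(self):
        self._edges = defaultdict(list)  # subject -> [(relation, object)]

    def add(self, subject, relation, obj):
        edge = (relation, obj)
        if edge not in self._edges[subject]:  # skip duplicate facts
            self._edges[subject].append(edge)

    def traverse(self, start, max_hops=2):
        """Breadth-first walk returning every fact within max_hops of start."""
        seen = {start}
        queue = deque([(start, 0)])
        facts = []
        while queue:
            node, depth = queue.popleft()
            if depth == max_hops:
                continue
            for relation, obj in self._edges.get(node, []):
                facts.append((node, relation, obj))
                if obj not in seen:
                    seen.add(obj)
                    queue.append((obj, depth + 1))
        return facts


graph = MemoryGraph()
graph.add("User", "lives_in", "London")
graph.add("User", "works_on", "Thoth")
graph.add("Thoth", "part_of", "Example Org")  # made-up fact for illustration

facts = graph.traverse("User", max_hops=2)
for fact in facts:
    print(" -> ".join(fact))
```

A question like "what do you know about my work?" maps to a traversal from `User`, which is what distinguishes this from keyword search over old chats.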

▲ 18 r/LangChain+5 crossposts

We kept running into the same problem: LangChain is powerful for building agent logic, but the moment you need a production-grade runtime with a visual canvas, human review checkpoints, scheduling, observability, and self-hosted deployment, you're assembling a lot of pieces yourself.

Heym is our answer to that. A self-hosted, source-available AI workflow automation platform. Visual canvas for building multi-agent pipelines, built-in knowledge retrieval, Human-in-the-Loop approval checkpoints that pause execution and generate a public review link, full LLM traces, and an MCP Server to expose any workflow as a callable tool for AI assistants.

The execution engine builds a DAG from the workflow graph and runs independent nodes concurrently. Agent nodes have automatic context compression so long-running agents don't silently fail as context grows.
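As a rough illustration of running independent DAG nodes concurrently, here is a small asyncio sketch. It is not Heym's engine: it runs the graph level by level (a real scheduler might start each node as soon as its own dependencies finish), and the node names and functions are invented for the example.

```python
import asyncio


async def run_dag(nodes, deps):
    """Run nodes level by level; nodes in the same level run concurrently.

    nodes: name -> async callable taking the results dict
    deps:  name -> set of upstream node names
    """
    results = {}
    remaining = {name: set(deps.get(name, ())) for name in nodes}
    order = []
    while remaining:
        ready = sorted(n for n, d in remaining.items() if d.issubset(results))
        if not ready:
            raise ValueError("cycle or missing dependency in workflow graph")
        outputs = await asyncio.gather(*(nodes[n](results) for n in ready))
        for name, output in zip(ready, outputs):
            results[name] = output
            del remaining[name]
        order.append(ready)
    return results, order


async def fetch(results):
    await asyncio.sleep(0.01)
    return "docs"


async def classify(results):
    await asyncio.sleep(0.01)
    return "billing"


async def answer(results):
    return f"{results['classify']} answer from {results['fetch']}"


nodes = {"fetch": fetch, "classify": classify, "answer": answer}
deps = {"answer": {"fetch", "classify"}}

results, order = asyncio.run(run_dag(nodes, deps))
print(order)
print(results["answer"])
```

`fetch` and `classify` have no dependencies, so they share a level and run together; `answer` waits for both.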

Launching today. Source-available

GitHub: https://github.com/heymrun/heym

u/PuzzleheadedMind874 — 1 day ago

LangGraph 1.0 has been out for 7 months now. What are you shipping with it?

Seven months is long enough to be past the migration wave and into real production use.

From what I'm seeing, a clearer picture is forming. LangGraph 1.0 works well for bounded workflows where the graph structure is known in advance. HITL checkpoints, defined state transitions and specific tool patterns. It gets harder for teams trying to use it for more open ended orchestration where the agent needs to decide its own path dynamically.

The memory question has also gotten more pointed since LangMem launched. Whether to use LangMem, roll a custom memory layer, or design around stateless calls is a real decision for anyone building agents that maintain context across sessions. None of the three options is obviously right, and I haven't seen a clean answer anywhere.

What's actually in production at this point?

u/AgentAiLeader — 1 day ago
▲ 7 r/LangChain+4 crossposts

Launched an agent loop detector last month. 350 users, 52 daily. But am I peddling a dead horse?

Genuine Dilemma,

I have been working with agents for close to 2 years, and I love it. I built something that detects agent loops and emails you the type of loop along with the ability to pause writes. It also offers shared memory between agents, fully timestamped agent logs, per-agent cost analysis, and general performance metrics.

However, I am unsure whether I am peddling a dead horse. I launched last month with 250 users, 60 using it regularly and 20 every day. I built this based on my own experience, but I am just unsure whether anyone ultimately cares enough.

Here is the part I cannot resolve. The 20 daily users feel like proof the problem is real and that I built something that actually works. But people also signed up because something about the pitch landed, tried it, and disappeared without saying a word. That silence might be the louder signal. For example this is an email I just got (I accidentally sent a duplicate email lol)

"I don’t mind emails, I just keep getting duplicates. Sorry if I came off rude. I like Octopoda a lot and think it’s without a doubt the best memory management system I’ve used. I’m having to redesign my workflow now that GitHub has decided to inadvertently destroy their Copilot service (lol) but once I find a new agent system I’ll probably use octopoda again. 

Sent from my iPhone"

so stuff like this makes me think I am genuinely on to something in the agent space, however I have given up a lot of time, money and effort to build this!

I love this community, and find it has always been super helpful, and advice, including just fuck it off, or anything is appreciated my friends!

Am I peddling a dead horse and the lovers are an outlier keeping me delusional? Or are the 198 just normal signup noise that does not actually mean anything about the product itself?

Don't know which crowd to treat as the truth right now.

▲ 7 r/LangChain+1 crossposts

Built a LangGraph + Memanto example for durable cross-session memory

I built a small LangGraph + Memanto example showing how an agent can keep useful memory outside the normal LangGraph thread state.

The demo uses a customer-support workflow:

- Session 1 stores durable memories in Memanto

- Session 2 starts with a fresh thread_id

- The agent still recalls the previous order and replacement preference

- The example includes an offline validator, pytest coverage, and a demo GIF

PR:

https://github.com/moorcheh-ai/memanto/pull/500

I would appreciate feedback, especially on whether this is a clear pattern for long-term memory in LangGraph agents.

u/Sea-Source-777 — 1 day ago

I stopped using LangChain for my retrieval pipeline — here's what the benchmark numbers actually look like

Building a transcript intelligence system for management consultants. The use case: query across 10+ hours of client meetings and get cited, verifiable answers — not summaries, exact source spans with speaker and timestamp.

Started with LangChain. Switched to a custom pipeline. Here's the honest account.

Why I left LangChain

It's great for prototyping. It's not great when you need partial failure recovery, concurrent independent stages, and stateful checkpointing on long documents. Once I needed the pipeline to survive mid-run crashes and resume from the last completed stage without restarting, LangChain became more obstacle than tool. Built a custom DAG runner instead.

The decision I'm most confident about

The backend never calls an LLM at query time. It returns an evidence pack — ranked source spans, citations, topic structure. The client LLM does synthesis. This keeps query latency at 2-3 seconds regardless of how many transcripts are in the system, and it means retrieval quality and synthesis quality are independently debuggable. This separation has saved me more debugging time than anything else.

The problem nobody warned me about

My design partner's transcripts are Hinglish — Hindi and English mixed, sometimes Devanagari script mid-sentence. Naive FTS indexing on raw text means English queries hit a Devanagari index and return zero results. Not a retrieval failure — an indexing failure. Took me an embarrassingly long time to find it.

The fix involved pre-extracting a domain glossary per transcript before translation, injecting it as locked terms so the translator doesn't destroy acronyms and proper nouns, and indexing only on the translated text. Naive translation alone doesn't work — it flattens the terminology that actually matters in business conversations.
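One way to "lock" glossary terms is to mask them with placeholder tokens before translation and restore them afterwards. Whether the original pipeline does it this way is not stated, so treat this as an assumption; `fake_translate` is a deliberately crude word-by-word stand-in for a real translator, and the sample sentence and glossary are invented.

```python
def lock_terms(text, glossary):
    """Replace glossary terms with opaque tokens the translator will pass through."""
    mapping = {}
    # Longest terms first so a short term never clobbers part of a longer one.
    for i, term in enumerate(sorted(glossary, key=len, reverse=True)):
        token = f"__TERM{i}__"
        if term in text:
            text = text.replace(term, token)
            mapping[token] = term
    return text, mapping


def unlock_terms(text, mapping):
    """Put the original terms back after translation."""
    for token, term in mapping.items():
        text = text.replace(token, term)
    return text


def translate_with_glossary(text, glossary, translate):
    locked, mapping = lock_terms(text, glossary)
    return unlock_terms(translate(locked), mapping)


def fake_translate(text):
    """Stand-in translator that flattens the acronym, like a naive one might."""
    word_map = {"badhana": "increase", "hai": "must", "EBITDA": "earnings"}
    return " ".join(word_map.get(word, word) for word in text.split())


sentence = "Q3 EBITDA target badhana hai"
glossary = ["EBITDA", "Q3"]

naive = fake_translate(sentence)
protected = translate_with_glossary(sentence, glossary, fake_translate)
print(naive)
print(protected)
```

Indexing the protected output keeps acronyms and proper nouns searchable with English queries.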

The benchmark numbers

Tested on one 2.5hr Hinglish business meeting, 30 questions across 3 difficulty sets, graded against the actual transcript.

On a single transcript, Claude with the full document in context scores 87%. My system scores 70%. Claude wins — expected, it reads everything at once.

At 4 transcripts (~10 hours of meetings), Claude's context window saturates. It starts confusing which meeting said what and filling gaps with plausible-sounding wrong answers. My system's score improves as the library grows because it only ever retrieves the relevant portion of content per query. The crossover is somewhere between transcript 2 and 4.

One fabricated answer in 30: asked about a resignation decision, system returned a wrong answer it had no evidence for. That's a synthesis prompt failure not a retrieval failure — the right content was retrieved, the prompt had no rules for what to do with ambiguous evidence. Fixing it now with explicit abstention logic.

What I'd tell myself from 2 months ago

Build abstention first. "I don't know" is more valuable than a confident wrong answer in any high-stakes context. I bolted it on late and it cost me benchmark cycles.
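A minimal sketch of an abstention gate that runs before synthesis. The score threshold, field names, and sample spans are illustrative assumptions; a score cutoff is only the simplest form, and the synthesis prompt still needs its own rule for ambiguous evidence.

```python
def gate_evidence(spans, min_score=0.6, min_spans=1):
    """Decide whether retrieved spans are strong enough to attempt an answer."""
    strong = [s for s in spans if s["score"] >= min_score]
    if len(strong) < min_spans:
        return {"abstain": True, "reason": "insufficient evidence", "spans": []}
    return {"abstain": False, "reason": None, "spans": strong}


# Invented spans for illustration only.
weak_evidence = [{"text": "He mentioned moving on eventually.", "score": 0.31}]
solid_evidence = [{"text": "I am resigning effective March 1.", "score": 0.88}]

weak_decision = gate_evidence(weak_evidence)
solid_decision = gate_evidence(solid_evidence)

print(weak_decision["abstain"], weak_decision["reason"])
print(solid_decision["abstain"], len(solid_decision["spans"]))
```

When the gate abstains, the client can return "I don't know" with no citations instead of passing thin evidence to the synthesis step.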

Also: graph expansion only helps when your edges are high quality. Noisy edges actively hurt retrieval. I overestimated how clean automatically extracted relationships would be.

Still open questions

How do you handle cross-document temporal reasoning — not just "what did person X say about topic Y" but "how has their position evolved across calls"? And at what point does adding more retrieved context start hurting synthesis quality rather than helping it?

Genuinely curious if anyone has hit the bilingual FTS problem and solved it differently

u/Kill_me_more — 1 day ago
▲ 6 r/LangChain+2 crossposts

How do you control what an AI agent does in your name?

I've been thinking about this a lot over the last few months. AI agents are increasingly used to carry out real tasks (answering email, running searches, moving data, making decisions), but most implementations have no control mechanism for the human on the other side.

Some problems I ran into while trying to solve this:

  • How do you guarantee the agent didn't do something you never authorized?
  • How do you keep a trustworthy history of what it did (one that can't be altered)?
  • How do you pause the agent before an irreversible action without stalling the whole flow?

I worked on a project trying to solve exactly this: an immutable history built on a hash chain, approvals with an urgency indicator, and rules that automatically block actions outside what is permitted.

But I want to understand how other people are handling this. Do you trust the agent blindly? Do you have some audit mechanism? Or are you still in the "pray nothing goes wrong" phase?
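For anyone unfamiliar with the hash-chain idea, here is a small sketch of an append-only action log where each entry commits to the previous one. It is not the project's code; the entry layout and function names are assumptions. Note that this makes tampering detectable rather than impossible, since someone who can rewrite the whole log can also recompute every hash unless the latest hash is anchored somewhere external.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(prev_hash, action):
    """Hash of an entry, bound to the hash of the entry before it."""
    payload = json.dumps({"prev": prev_hash, "action": action}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_entry(log, action):
    """Append an agent action, chaining it to the previous entry."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({"action": action, "prev": prev_hash, "hash": _digest(prev_hash, action)})


def verify(log):
    """Recompute the chain; any edited or reordered entry breaks it."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev"] != prev_hash or entry["hash"] != _digest(prev_hash, entry["action"]):
            return False
        prev_hash = entry["hash"]
    return True


log = []
append_entry(log, {"tool": "email.send", "to": "client@example.com"})
append_entry(log, {"tool": "db.move", "rows": 120})
print(verify(log))

log[0]["action"]["to"] = "someone-else@example.com"  # tamper with history
print(verify(log))
```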

u/MarzipanKlutzy9909 — 2 days ago
▲ 5 r/LangChain+1 crossposts

Build Full Stack AI Agents


Hello,

A few months ago I posted here about a lightweight durable full stack ai orchestration framework I was working on. It resonated well and I got some excellent(brutal) feedback from early testers. After many iterations, today I am coming back with the first public release.

For those who don't remember that post. A little bit of background.

I've been building agentic applications for around 2 years now. Started with loops, then moved onto langgraph + Assistant UI. I've been using the lang ecosystem since their launch and have seen their evolution.

It's great and easy to build agents, but things got really frustrating once I needed more fine-grained control, and I especially had a hard time building interesting user experiences. I loved the idea of building agents as graphs, but I really wanted to model UIs in my flow as nodes too. It felt like I was fighting abstractions all the time, with too much to learn.

Deployment was another nightmare. I am kinda cheap and the per node executed tax seemed ... Well, not great. But hey, the devs gotta eat.

Around 10 months back, I snapped and started working on an idea I had. It's called cascaide.

Cascaide is a fullstack agent runtime and AI orchestration framework in typescript designed to run anywhere JS/TS can. It was originally built for web applications but works equally well for headless/CLI AI agents and workflows in javascript runtimes.

What it really is is a distributed, observable, durable graph executor. The first split just happens to be client/server, hence full stack.

Here are the reasons to try it.

🧩 UI as nodes in your agent graph — Not glue code, not a separate library. UI and human-in-the-loop are core primitives.

💾 Resume workflows after crashes, weeks later, or never — Every step checkpointed to your own Postgres. No new infra, no third-party service holding your state.

🔍 Observability — Rewind any agent run, fork state, inspect every transition. No more printf console.log hell. Everything you need to see with redux Devtools.

💸 Zero orchestration cost — You pay for compute only. No per-node tax, no hosted runtime fee.

🪶 23kb gzipped core — Small enough to actually read the source. Not another black box. 46kb including all helpers, durable database, frontend and agent builder helpers. Like you can seriously read and reason through the code.

🌍 Deploy like any other app — Next.js, Express, Hono, Fastify currently supported adapters (Let me know where else to expand native adapters to!) No special agent hosting or vendor lock-in.

🏗️ Your data, your compliance — All traces on your own DB. HIPAA/SOC2 foundation without sending data to a third party.

🛠️ **Developer Experience**

It's hard to trust such claims right now, and I might be biased as the creator. But the API surface is genuinely small:

* 🪝 Two hooks on the client to control and observe graph execution

* ⚙️ `prep/exec/post` lifecycle for nodes — two main types for state updates and spawning new nodes

* 🎮 Controller primitive for concurrency — control and observe graph execution from within a server-side node

* 📐 Graph definitions

All typed. And this is mostly it. You can do a lot with plain programmatic control.


🗺️ **What's Next**

🔌 **Expanding native adapters** — currently native adapters exist for:

* ⚛️ React

* 🐘 Postgres-js (durable database)

* 🖥️ Servers: Next.js, Fastify, Hono, Express

Let me know what adapters to build out next! It's designed to be modular — quickly expandable to more targets, and you can swap packages out to migrate.

🌐 **Expanding graph distribution** — right now only client/server split is supported. But the abstractions allow for more environments. Currently working on:

* 🔲 Edge

* 🖧 Multiple servers

* 👷 Web workers


The web worker angle is pretty interesting. We are building something so that you can give your agent a filesystem and bash by running nodes inside the browser sandbox. Would be a huge value add with zero cost. This allows for even fully local BYOK like AI apps running on the browser.

Try it out now:

npx create-cascaide-app@latest

Ships out of the box with **3 agents** 🤖:

* 🔎 **ReAct Agent** with search capabilities

* 🏨 **Hotel Booking Agent** (Supervisor) with two sub-agents and two HITL steps

* 🔁 **Recursive ReAct Agent** with search capabilities that can recursively invoke itself to handle complex tasks — each recursion depth trackable via mini chat windows

CLI currently scaffolds apps in:

* ▲ Next.js

* ⚡ React + Hono

* 🚀 React + Fastify

* 🟢 React + Express

Website and documentation:

https://www.cascaide-ts.com

GitHub repo:

https://github.com/Airavat-Research/cascaide-ts

⭐ **Please star on GitHub if you like it!**

u/Worried_Market4466 — 1 day ago

I built an AI voice assistant. Inconsistency issue.

I built an AI voice assistant, but it responds differently to different users for the same prompt. I kept the temperature at 2. What could be the reason, and how can I optimize it?

u/Stock-Cause-8160 — 1 day ago

We replaced our RAG pipeline with persistent KV cache. It works. Here’s what we found.

We’ve been running RAG in production for a while. It worked but maintaining it was a constant tax. Re-embedding on data changes, tuning chunking strategies, debugging retrieval misses, managing the vector database. Every moving part was something that could break.

So we ran an experiment. Instead of chunking and embedding documents, we loaded the full document into context, cached the KV state persistently, and reused that cache across every query.

No vector database. No embedding pipeline. No retrieval step. Just the model with full document context, warm and ready.

What we found:

• Answer quality is noticeably better: no retrieval misses, no wrong chunks, full context every time
• Updates are dramatically faster: change the document, regenerate the cache, done in minutes vs hours of re-indexing
• Operational complexity dropped significantly: no pipeline to maintain, no retrieval quality to monitor
• Current limit is around 120k tokens, which works for most business documents but not for massive corpora
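
For anyone wondering what "change the document, regenerate the cache" could look like in practice, here is a minimal sketch of the bookkeeping only, assuming the cache is keyed by a hash of the document text. `build_kv_state` and `PersistentCache` are hypothetical names for illustration, and the stand-in builder returns a small dict where a real system would store the model's KV tensors after the prefill pass.

```python
import hashlib
import pickle
import tempfile
from pathlib import Path


def build_kv_state(document: str) -> dict:
    """Hypothetical stand-in for the expensive prefill pass over the full document."""
    return {"num_chars": len(document)}


class PersistentCache:
    """Stores one cached state per document version, keyed by content hash."""

    def __init__(self, cache_dir: Path):
        self.cache_dir = cache_dir
        self.cache_dir.mkdir(parents=True, exist_ok=True)
        self.builds = 0  # number of cold builds performed so far

    def _path_for(self, document: str) -> Path:
        # Any edit to the document changes the hash, so stale state is never reused.
        key = hashlib.sha256(document.encode("utf-8")).hexdigest()
        return self.cache_dir / f"{key}.pkl"

    def load_or_build(self, document: str) -> dict:
        path = self._path_for(document)
        if path.exists():
            # Warm path: reuse the state saved by an earlier query.
            with path.open("rb") as f:
                return pickle.load(f)
        # Cold path: pay the build cost once, then persist the result.
        state = build_kv_state(document)
        self.builds += 1
        with path.open("wb") as f:
            pickle.dump(state, f)
        return state


cache = PersistentCache(Path(tempfile.mkdtemp()))
cache.load_or_build("policy v1")  # cold: builds and saves
cache.load_or_build("policy v1")  # warm: loads from disk
cache.load_or_build("policy v2")  # document changed: builds again
print(cache.builds)
```

Keying on the content hash means an update needs no separate invalidation step: the edited document simply misses the cache and triggers one rebuild.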

Where it breaks down:

• Documents larger than the context window are still a problem
• Very large document collections still need a different approach
• Cold cache on first load takes time; warm queries are fast

We’re genuinely curious if others have tried this. Especially interested in:

• How your use cases map to context window limits
• Whether retrieval quality was your biggest RAG pain point or something else
• What you’d need to see to replace your RAG pipeline entirely
We’ve opened a small beta for people with real workloads who want to try this. If you’re using LangChain and interested, feel free to DM or comment.

Happy to answer any questions.

u/pmv143 — 2 days ago

I built a zero-code visual client to test remote MCP servers instantly (Tested with Cloudflare’s free MCP).

Hey everyone,

The Model Context Protocol (MCP) is amazing for standardizing how agents talk to data, but I got incredibly frustrated every time I wanted to quickly test a new remote MCP server. Writing custom client-side boilerplate or wrestling with CLI tools just to see if a tool actually exposes the right schema is a massive time sink.

So, I built a native MCP client directly into the visual canvas of AgentSwarms.

You can now test any remote MCP server entirely in the browser without writing a single line of code.

Here is the workflow I just tested with Cloudflare: Cloudflare released a free MCP server for their documentation. Instead of building a local client to test it:

  1. I dropped their SSE URL into the new MCP Servers integration in AgentSwarms.
  2. The canvas immediately connected and extracted the available tools (e.g., cloudflare-docs-search).
  3. I wired that tool up to a basic agent and started asking complex infrastructure questions in natural language. The agent successfully used the MCP tool to pull live docs and synthesize an answer.
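
For a sense of what the tool extraction in step 2 involves at the protocol level: MCP is built on JSON-RPC 2.0, and a client asks a server for its tools with the `tools/list` method. The sketch below is not AgentSwarms code and opens no connection. It only builds the request and parses a response, and the sample response (including its description text and schema) is made up for illustration around the `cloudflare-docs-search` tool name mentioned above.

```python
import json


def make_tools_list_request(request_id: int) -> str:
    """Build the JSON-RPC 2.0 message an MCP client sends to list a server's tools."""
    return json.dumps({"jsonrpc": "2.0", "id": request_id, "method": "tools/list"})


def extract_tool_names(response_text: str) -> list[str]:
    """Pull tool names out of a tools/list response, raising on a JSON-RPC error."""
    message = json.loads(response_text)
    if "error" in message:
        raise RuntimeError(message["error"].get("message", "unknown error"))
    return [tool["name"] for tool in message["result"]["tools"]]


# Illustrative response only; a real server returns its own tools and schemas.
sample_response = json.dumps({
    "jsonrpc": "2.0",
    "id": 1,
    "result": {
        "tools": [
            {
                "name": "cloudflare-docs-search",
                "description": "Search the documentation (placeholder text)",
                "inputSchema": {
                    "type": "object",
                    "properties": {"query": {"type": "string"}},
                    "required": ["query"],
                },
            }
        ]
    },
})

print(make_tools_list_request(1))
print(extract_tool_names(sample_response))
```

The `inputSchema` on each tool is what lets you check whether a server exposes the right arguments before wiring it to an agent.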

Why this is useful for AI devs: If you are building your own MCP servers, you need a fast way to visually test if your endpoints are exposing tools correctly and if an LLM can actually route to them properly. This gives you an instant, visual debugging playground.

It handles the SSE connection, tool extraction, and LLM routing automatically.

It’s completely free to play with in the browser. I'd love for anyone building MCP servers right now to plug their endpoints in and see how it works.

Link: https://agentswarms.fyi/mcp

u/Outside-Risk-8912 — 1 day ago