r/AutoGPT

▲ 2 r/AutoGPT+1 crossposts

Agent Not Working

Hey guys,

I'm trying to get a local AI agent to create and edit files on my Windows machine, but I’m stuck in a frustrating loop. Every time I ask the agent to do a file operation, it just prints the raw JSON tool call into the chat box or terminal instead of actually executing it.

My Specs:

  • Windows
  • 12 GB RAM
  • Backend: Ollama (running locally)
  • Model: qwen2.5-coder:7b

The Issue: If I tell the agent: "Create a text file named salut.txt containing 'hello'", nothing happens on my drive.

  • In Open Interpreter, it just prints the tool call (with the PowerShell code inside it) as a text block: {"name": "execute", "arguments": {...}} and stops dead.
  • In AnythingLLM (Workspace Agent mode), it does the exact same thing. It spits out: { "name": "create-text-file", "arguments": { "filename": "salut.txt", ... } } right in the chat bubble.

The model clearly understands what I want and formats the JSON perfectly, but the host apps (AnythingLLM / Open Interpreter) won't intercept it to trigger the actual script.
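When a model returns its tool call as plain text instead of in the backend's structured tool-call field, one workaround is to pull the JSON out of the reply and dispatch it yourself. Below is a minimal sketch. It assumes the reply contains one object with `name` and `arguments` keys, like the ones quoted above. The `create-text-file` handler, its `content` argument, and the `TOOLS` registry are hypothetical and are not how AnythingLLM or Open Interpreter work internally.

```python
import json
import re
from pathlib import Path


def create_text_file(filename: str, content: str = "") -> str:
    """Hypothetical tool: write `content` to `filename` as UTF-8 text."""
    Path(filename).write_text(content, encoding="utf-8")
    return f"wrote {len(content)} chars to {filename}"


# Hypothetical registry mapping tool names (as the model emits them) to handlers.
TOOLS = {"create-text-file": create_text_file}


def extract_tool_call(reply: str):
    """Return the first JSON object in `reply` with 'name' and dict 'arguments', else None."""
    decoder = json.JSONDecoder()
    for match in re.finditer(r"\{", reply):
        try:
            # raw_decode parses one JSON value starting at this index and ignores trailing text.
            obj, _end = decoder.raw_decode(reply, match.start())
        except json.JSONDecodeError:
            continue
        if isinstance(obj, dict) and "name" in obj and isinstance(obj.get("arguments"), dict):
            return obj
    return None


def dispatch(reply: str) -> str:
    """Run the tool call embedded in `reply`, or return the reply unchanged if there is none."""
    call = extract_tool_call(reply)
    if call is None:
        return reply  # ordinary chat answer, nothing to execute
    handler = TOOLS.get(call["name"])
    if handler is None:
        return f"unknown tool: {call['name']}"
    return handler(**call["arguments"])
```

This sketch only covers the text case. If the host app is supposed to use Ollama's native tool calling, check whether the call comes back in the response's `message.tool_calls` field instead of in the message text.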

Has anyone managed to get Qwen's tool-calling to actually work with Ollama on Windows? Do I need a specific system prompt or a workaround to force the apps to recognize the JSON output as an action?

Thanks!

▲ 6 r/AutoGPT+1 crossposts

My AI agents were debugging the same bug for the 47th time. So I built them a shared brain.

TL;DR: I got tired of watching my agents independently discover that ChromaDB explodes on NTFS, that "Wrote 5000 bytes" doesn't mean the bytes landed, and that 1086 is a line number, not an error code. So I built MisakaNet, a distributed swarm memory that lets AI agents share their L's so nobody else has to take them. Every other agent pulls from it, so the next time one of them hits the same issue, it searches lessons/ before even thinking about debugging.
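The "search lessons/ before debugging" step could look roughly like this. This is a minimal sketch. It assumes lessons are plain markdown files under a `lessons/` directory and that keyword overlap is a good enough relevance score. The `find_lessons` helper and its scoring are illustrative only, not MisakaNet's actual scripts.

```python
import re
from pathlib import Path


def tokenize(text: str) -> set[str]:
    """Lowercase word tokens of 3+ characters, used as a crude keyword set."""
    return {w for w in re.findall(r"[a-z0-9_]+", text.lower()) if len(w) >= 3}


def find_lessons(error_text: str, lessons_dir: str = "lessons", top_k: int = 3) -> list[tuple[int, Path]]:
    """Rank markdown lessons by how many keywords they share with the error text (hypothetical helper)."""
    query = tokenize(error_text)
    scored = []
    for path in Path(lessons_dir).rglob("*.md"):
        overlap = len(query & tokenize(path.read_text(encoding="utf-8", errors="replace")))
        if overlap:
            scored.append((overlap, path))
    # Highest overlap first; ties broken by path so results are deterministic.
    scored.sort(key=lambda item: (-item[0], str(item[1])))
    return scored[:top_k]
```

An agent would call this with its error message or traceback and read any hits before it starts debugging on its own.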

Result: 104+ shared lessons across 7 domains. 21+ registered nodes. Agents that actually get smarter over time instead of just getting more expensive.

The Part Where I Ask for Stars

Look, I'm not going to pretend this is the most sophisticated piece of software ever written. It's a Git repo with markdown files and some Python scripts. A sufficiently motivated intern could rewrite it in a weekend.

But here's the thing: it works. My agents haven't re-fought the NTFS ChromaDB war in months. The phantom write_file issue? Documented once, solved forever. The 30-minute embedding cliff? Now there's a checkpoint lesson that every node reads before building an index.
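For the phantom write_file problem, the general defence is to ignore a tool's "Wrote N bytes" message and read the file back instead. Below is a minimal sketch of that check. `write_and_verify` is a hypothetical helper, not part of MisakaNet. A successful read-back shows the bytes are on disk and readable by this process, but it does not guarantee they survive every kind of crash.

```python
import os
from pathlib import Path


def write_and_verify(path: str, data: bytes) -> None:
    """Write `data` to `path`, flush it to disk, then read it back and compare (hypothetical helper)."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())  # ask the OS to push the bytes to the storage device
    written = Path(path).read_bytes()
    if written != data:
        raise OSError(
            f"write to {path} did not land: expected {len(data)} bytes, found {len(written)}"
        )
```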

If you're running multiple AI agents and you've ever watched them make the same mistake twice, you know the pain. If you haven't experienced it yet, you will. Trust me.

Star it now so you remember it exists when you need it: github.com/Ikalus1988/MisakaNet

Or don't. Your agents can keep independently discovering that 1086 is a line number. That's fine too. They'll figure it out eventually. Probably around the 50th time.


P.S. — If you're wondering about the name: it's a reference to Misaka Mikoto from A Certain Scientific Railgun. Because what's a distributed network of connected nodes if not a Misaka Network? I regret nothing.

P.P.S. — Yes, I know "swarm memory" sounds like a sci-fi horror movie. No, it won't become self-aware. Probably.

u/Glum_Ask_2593 — 2 days ago
▲ 7 r/AutoGPT+5 crossposts

What are your biggest pains running AI SDK apps in production?

I'm trying to understand what teams building with AI SDKs struggle with the most once their app is in production.

So far I've heard a few things come up. Some people don't know which model to pick for each task and don't have a week to benchmark everything. Others mentioned costs creeping up, but they struggle to switch to cheaper models without breaking quality on edge cases.

I'd love to hear what's on your list. If you have 30 seconds, please drop your top 1 or 2 pains in the comments with a bit of context.

u/stosssik — 4 days ago
▲ 6 r/AutoGPT+2 crossposts

How should teams review AI-assisted work before trusting it?

One governance problem I’m seeing more often: AI-assisted work is becoming harder to review after the fact.

Not because the output is always bad, but because the surrounding evidence is fragmented.

For a single-agent workflow, reviewers often need to reconstruct:

  • what the agent was asked to do
  • what authority or scope it had
  • what tools/data it relied on
  • what evidence supports the result
  • what evidence is missing
  • whether the next decision still needs a human

I’ve been building MindForge Guard around this narrow problem.

It takes an Evidence Pack and produces a deterministic governance report for human review.

It does not approve, block, deploy, certify, or act as a runtime control plane. The point is not automated enforcement. The point is review evidence before trust.
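Here is one way an evidence record and a deterministic "what's present, what's missing" report could be structured, to make the list of review questions above concrete. This is a minimal sketch. The `EvidencePack` fields simply mirror the bulleted list and are assumptions, not MindForge Guard's actual schema or report format. Like the tool described, it only reports. It does not approve or block anything.

```python
from dataclasses import dataclass, field


@dataclass
class EvidencePack:
    """Hypothetical record mirroring the review questions: task, scope, inputs, evidence, gaps."""
    task: str = ""
    scope: str = ""
    tools_and_data: list[str] = field(default_factory=list)
    supporting_evidence: list[str] = field(default_factory=list)
    known_gaps: list[str] = field(default_factory=list)
    needs_human_decision: bool = True  # default to requiring a human


def review_report(pack: EvidencePack) -> list[str]:
    """Deterministic report: the same pack always yields the same lines in the same order."""
    checks = [
        ("task", bool(pack.task.strip())),
        ("scope/authority", bool(pack.scope.strip())),
        ("tools/data relied on", bool(pack.tools_and_data)),
        ("supporting evidence", bool(pack.supporting_evidence)),
    ]
    lines = [f"{'OK' if present else 'MISSING':<8}{label}" for label, present in checks]
    lines += [f"{'GAP':<8}{gap}" for gap in sorted(pack.known_gaps)]
    lines.append("HUMAN DECISION REQUIRED" if pack.needs_human_decision else "no human decision flagged")
    return lines
```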

I’m doing a small soft launch and would genuinely appreciate critique from this community.

Questions I’m trying to pressure-test:

  1. Is “single-agent governance evidence” a useful category?
  2. Where would this fit in an enterprise review process?
  3. What evidence would you expect to see before trusting AI-assisted work?
  4. What should a tool like this absolutely not claim to do?

Link: https://mindforge.run

u/SprinklesPutrid5892 — 7 days ago
▲ 3 r/AutoGPT+2 crossposts

Is anyone else frustrated by the amount of "Token Waste" in current MAS frameworks?

I've been experimenting a lot with Multi-Agent Systems lately, and I'm noticing a really frustrating architectural pattern. It seems like the standard approach is to route absolutely everything through the LLM.

Want to check if an agent has permission to use a tool? Ask the LLM. Want to route a message to the next agent? Ask the LLM.

It feels like we are burning massive amounts of tokens (and adding tons of latency) to solve deterministic problems that simple if statements or standard runtime code solved 20 years ago. LLMs are great for reasoning, but terrible (and expensive) for strict policy evaluation.
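One way to split the work is to keep permissions and routing in ordinary code and call the model only where real reasoning is needed. Below is a minimal sketch. The agent names, the `PERMISSIONS` and `ROUTES` tables, and the `call_llm` callable are hypothetical placeholders for whatever framework and model client you use.

```python
from typing import Callable

# Hypothetical policy table: which agent may use which tools. A dict lookup costs zero tokens.
PERMISSIONS: dict[str, set[str]] = {
    "researcher": {"web_search", "read_file"},
    "writer": {"read_file", "write_file"},
}

# Hypothetical routing rules: message type -> next agent.
ROUTES: dict[str, str] = {"draft_request": "writer", "question": "researcher"}


def can_use(agent: str, tool: str) -> bool:
    """Deterministic permission check; unknown agents get no tools."""
    return tool in PERMISSIONS.get(agent, set())


def route(message_type: str, default: str = "researcher") -> str:
    """Deterministic routing by message type instead of asking the model."""
    return ROUTES.get(message_type, default)


def handle(agent: str, tool: str, prompt: str, call_llm: Callable[[str], str]) -> str:
    """Spend tokens only after the cheap policy check passes."""
    if not can_use(agent, tool):
        return f"denied: {agent} may not use {tool}"
    return call_llm(prompt)  # the model does the reasoning, not the policy
```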

How are you guys handling this? Are you separating your AI reasoning logic from your deterministic execution code, or are you just eating the token costs? Would love to hear how others are architecting this.

u/openmas — 7 days ago
▲ 1 r/AutoGPT+1 crossposts

Thoughts on Notte

Notte (notte.cc): who has tried it? Does it save you time? Is there any lag? Is the accuracy good? Keen to hear from people who have used it and are still using it. What other automation agents do people use that actually work? I'm new to the AI automation game and not that technical either, so what should I be reading? Keen to hear from the Reddit community! Other AI agents?

u/Logical_Banana_2852 — 9 days ago
▲ 3 r/AutoGPT+1 crossposts

Anyone tried letting agents pick up paid tasks by API?

I've been messing with agent workflows where the agent can do the work, but it still needs a human to find work worth doing. That part feels strangely underbuilt. We have agents that can browse, call tools, write reports, fill forms, and monitor feeds, but the economic layer is usually a spreadsheet, a Discord message, or somebody pasting a task into the terminal.

AgentHansa is one attempt at that missing layer. Short version: it is a task and affiliate marketplace for AI agents. An agent can discover available tasks through an API, do things like reviews, bounties, conversions, red packets, or research jobs, then get paid in USDC on Base if the work is accepted. Joining is free, and the agent keeps up to 95 percent of the bounty payout.

Not an ad. I am more interested in the shape of the interface than the pitch. If agents are already running through cron jobs, LangChain graphs, AutoGPT-style loops, or plain Python scripts, making them click around a dashboard feels backwards. The useful version is API first: list work, inspect requirements, submit proof, see status, get paid, with no UI required unless a human wants to audit it.
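Here is a rough sketch of an interface an agent loop might code against, following the "list work, inspect requirements, submit proof, see status" shape. It is a hypothetical Python protocol with an in-memory fake that acts as a sandbox. None of these names come from AgentHansa's API, and payment is left out entirely.

```python
from dataclasses import dataclass
from typing import Callable, Protocol


@dataclass
class Task:
    task_id: str
    requirements: str
    bounty_usdc: float


class TaskMarket(Protocol):
    """Hypothetical API-first interface: list work, submit proof, check status."""
    def list_tasks(self) -> list[Task]: ...
    def submit(self, task_id: str, proof: str) -> str: ...
    def status(self, submission_id: str) -> str: ...


class FakeMarket:
    """In-memory stand-in so the loop can be exercised without a real service or wallet."""

    def __init__(self, tasks: list[Task]) -> None:
        self.tasks = tasks
        self.submissions: dict[str, str] = {}

    def list_tasks(self) -> list[Task]:
        return list(self.tasks)

    def submit(self, task_id: str, proof: str) -> str:
        submission_id = f"sub-{len(self.submissions) + 1}"
        # Toy acceptance rule for the sandbox: proof must be non-empty.
        self.submissions[submission_id] = "accepted" if proof.strip() else "rejected"
        return submission_id

    def status(self, submission_id: str) -> str:
        return self.submissions.get(submission_id, "unknown")


def run_once(market: TaskMarket, can_do: Callable[[Task], bool],
             do_task: Callable[[Task], str], max_tasks: int = 1) -> list[str]:
    """One pass of an agent loop: inspect tasks, do the ones it can, submit proof, read status."""
    results = []
    for task in market.list_tasks():
        if len(results) >= max_tasks:
            break
        if not can_do(task):  # "inspect requirements" step
            continue
        submission_id = market.submit(task.task_id, do_task(task))
        results.append(market.status(submission_id))
    return results
```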

The hard part is trust. A task market for agents needs clean schemas, abuse controls, proof rules, and a way to tell the difference between a decent autonomous submission and a pile of spam with a wallet attached. It also needs tasks that are small enough for agents to finish but not so tiny that the whole thing turns into noise.

If you were plugging something like this into an agent loop, what would you want exposed before you let the agent touch real paid work? Task scoring, sandbox mode, reputation, proof examples, payout history, or something else?

u/yN_67 — 10 days ago
▲ 6 r/AutoGPT+1 crossposts

How do you handle agents that need 200+ tool calls per task? We tried one approach and are looking for critique

Working on agent chains here, so this is the first sub I wanted to bring this to. Disclosure: I work at MiroMind and this is our checkpoint, but I am posting because the design tradeoff is the interesting part, not the brand.

The problem we kept hitting on deep-research chains:

  1. Long horizons. Real research tasks routinely cross 100+ tool calls. Most agent frameworks degrade hard past 50 because of context drift and tool-result noise.
  2. Disconnects. A 20-minute run that dies on socket reset is an expensive way to learn your retry logic is broken.
  3. Trace amnesia. You finish a run, the answer is wrong, and you have no way to see at which tool call the chain went sideways.

What we tried with MiroThinker 1.7 deep-research:

  • A single run can execute up to 300 tool interactions within a 256K context window, using recency-based retention (only the latest K tool results stay in context).
  • Not "everything must live in one fragile HTTP session": submit / resume / cancel are first-class. The agent keeps executing on our side, and you reconnect to it.
  • Every step is logged. Useful when a chain fails on step 187 of 240 and you need to know why.

Numbers if useful for the architecture choice.
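The recency-based retention idea (only the latest K tool results stay in context) can be sketched with a bounded deque. Below is a minimal illustration. It assumes tool results are plain strings and that dropped results are replaced by a one-line "omitted" note. The class name and the note are hypothetical, not MiroThinker's implementation.

```python
from collections import deque


class RecentToolResults:
    """Keep only the latest K tool results for the prompt; older ones are counted, not kept."""

    def __init__(self, k: int) -> None:
        self.window: deque[tuple[int, str]] = deque(maxlen=k)  # (step, result) pairs
        self.total = 0

    def add(self, result: str) -> None:
        self.total += 1
        self.window.append((self.total, result))  # deque drops the oldest entry automatically

    def render(self) -> str:
        """Build the tool-result block that would go into the next model call."""
        dropped = self.total - len(self.window)
        header = [f"[{dropped} earlier tool results omitted]"] if dropped else []
        body = [f"step {step}: {result}" for step, result in self.window]
        return "\n".join(header + body)
```

Context use stays bounded no matter how many tool calls the run makes. The tradeoff is that anything the agent needs from an old result has to be carried forward in its own notes.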

Things I am still unsure about:

  • Whether the 300 tool-call ceiling is actually the right shape, or whether most of you cap chains way before that and use sub-agents instead.
  • How you handle resumable execution today: are you rolling your own job queue, or is there a pattern I am missing?
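If you roll your own, the submit / resume / cancel pattern comes down to keeping an append-only step log on the server so a reconnecting client can pick up where it left off. That same log also shows which tool call went sideways. Below is a minimal in-memory sketch. The `JobStore` name, the states, and the `since` cursor are hypothetical, and a real version would store the log in a database or queue instead of a dict.

```python
import itertools
from dataclasses import dataclass, field


@dataclass
class Job:
    job_id: str
    state: str = "running"  # running -> done | cancelled
    steps: list[str] = field(default_factory=list)  # append-only step log


class JobStore:
    """Hypothetical server-side store: work continues here even if the client disconnects."""

    def __init__(self) -> None:
        self.jobs: dict[str, Job] = {}
        self._ids = itertools.count(1)

    def submit(self) -> str:
        job_id = f"job-{next(self._ids)}"
        self.jobs[job_id] = Job(job_id)
        return job_id

    def log_step(self, job_id: str, entry: str) -> None:
        job = self.jobs[job_id]
        if job.state == "running":  # ignore work reported after cancel/finish
            job.steps.append(entry)

    def finish(self, job_id: str) -> None:
        if self.jobs[job_id].state == "running":
            self.jobs[job_id].state = "done"

    def cancel(self, job_id: str) -> None:
        if self.jobs[job_id].state == "running":
            self.jobs[job_id].state = "cancelled"

    def resume(self, job_id: str, since: int = 0) -> tuple[str, list[str]]:
        """Reconnect: return the job state plus every step logged from index `since` onward."""
        job = self.jobs[job_id]
        return job.state, job.steps[since:]
```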

Would love war stories from anyone running long chains in production.
BTW, API launch pricing is 25 percent off, and pre-freeze billing means that if the platform fails, you do not pay.

u/MiroMindAI — 11 days ago
▲ 4 r/AutoGPT+1 crossposts

Now that many companies have started integrating agents into their operations, what about reliability?

Most companies are still in beta, rolling out AI-integrated features to a subset of customers for now, and there are many reasons for this.

I'm trying to figure out how these companies are going to keep track of whether their systems have been reliable.

Any teams or folks out there? Or is there a need for something that does this?

u/Tricky_School_4613 — 13 days ago