
Learning to write an AI harness the old-fashioned way. Need help with attention drift and ignored tool call results!
I've been writing a no-compile, no-dependencies, Node.js-based AI harness for llama.cpp as a learning exercise and could really use some help. I'm basing my code on https://github.com/av/mi and https://pi.dev/, with really basic agentic loops. It basically loops until no more tool calls are being made, then returns control to the user prompt.
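For reference, the loop described above can be sketched roughly like this. It is a minimal sketch, not the actual code from coffee.mjs: `runAgentLoop`, `chat`, `tools`, and `maxTurns` are hypothetical names, and the message shapes assume the OpenAI-style chat format (assistant messages carrying `tool_calls`, followed by `role: "tool"` messages with a matching `tool_call_id`). The `chat` function is injected so the loop can be exercised without a running server.

```javascript
// Minimal agentic loop sketch. All names here are hypothetical.
// chat(messages) is assumed to return one assistant message object in the
// OpenAI-style chat format; tools maps a tool name to an async function.
async function runAgentLoop(messages, chat, tools, maxTurns = 20) {
  for (let turn = 0; turn < maxTurns; turn++) {
    const reply = await chat(messages);

    // Keep the assistant turn in the history, including its tool_calls,
    // so the tool results that follow have something to refer back to.
    messages.push(reply);

    const calls = reply.tool_calls ?? [];
    if (calls.length === 0) {
      return reply.content; // no tool calls: hand control back to the user
    }

    for (const call of calls) {
      let result;
      try {
        const tool = tools[call.function.name];
        if (!tool) throw new Error(`unknown tool: ${call.function.name}`);
        const args = JSON.parse(call.function.arguments || "{}");
        result = await tool(args);
      } catch (err) {
        // Report failures to the model as a tool result instead of crashing.
        result = `ERROR: ${err.message}`;
      }
      // One tool message per call, tied to the call by its id.
      messages.push({
        role: "tool",
        tool_call_id: call.id,
        content: String(result),
      });
    }
  }
  throw new Error("maxTurns reached without a final answer");
}
```

With this shape, every tool result lands in the history directly after the assistant turn that requested it, which makes it easy to dump `messages` and check exactly what the model saw on each turn.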
My biggest problems are:
- Often the LLM will ignore the tool call and its results and call the same tools again.
- Or worse, sometimes its attention drifts to a previously answered question and it tries to work from there instead of working from the latest tool call or continuing its plan.
I'm using a q4 quant of qwen3.6 27b. I don't experience this problem when I run the same model under pi. I've looked at pi's agentic loop implementation and there doesn't seem to be any special sauce.
I added reminder messages after tool calls, telling it to review the results before moving on, and that helps a bit. I would like to know whether anyone has experienced the same problem in their own AI harness development, and how you addressed it.
So far the reminder messages I've implemented kind of work, but they feel more like band-aids than real cures.
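To make the kind of reminder discussed above concrete, here is a minimal sketch of one targeted variant: instead of a generic "review the results" message, the harness notices an exact repeat of a tool call and returns the earlier result with a note attached. This is an illustration, not what coffee.mjs does, and `createCallTracker` and `callOnce` are hypothetical names.

```javascript
// Hypothetical helper: remembers tool calls already made in this session.
function createCallTracker() {
  const seen = new Map(); // "name:argumentsJson" -> earlier result text

  // Runs the tool on first use; on an exact repeat, skips the tool and
  // returns the cached result prefixed with a note to the model.
  return async function callOnce(name, argsJson, runTool) {
    const key = `${name}:${argsJson}`;
    if (seen.has(key)) {
      return (
        `NOTE: you already called ${name} with these exact arguments. ` +
        `The earlier result is repeated below; use it to continue your plan.\n` +
        seen.get(key)
      );
    }
    const result = String(await runTool());
    seen.set(key, result);
    return result;
  };
}
```

The comparison is on the raw argument string, so the same arguments in a different key order count as a new call. Repeats are also sometimes legitimate (for example, listing a directory again after writing a file), so a real harness would probably want to clear the cache after any tool call that has side effects.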
Edit: added bare-minimum source code.
tools/bash.mjs
If you have Node.js installed, 'node coffee.mjs' will run it. No dependencies; just make sure llama-server is running. All config information is stored as variables at the top of coffee.mjs. Very basic stuff, but it should be very human-readable code.
I have more tools and skills implemented, but this is the bare minimum that forms a basic AI coding agent/harness. Like I said, it's a learning project, not competing for anything. I've been using it as my daily driver, though.
Oh, and if you have spare AI resources, feel free to have them scan the code to see if they can help answer the question. Thank you!