r/devtools

▲ 14 r/devtools+8 crossposts

I built an open-source Agent Verifier for Claude Code, Cursor & other coding assistants that catches security issues, hallucinated tools, infinite loops and anti-patterns in agents built with LangChain, LangGraph, and other frameworks (free, open source, 100% local)

I've been using Claude Code for a few months and noticed AI agents consistently skip the same things: hardcoded secrets, unbounded retry loops, referencing tools that don't exist, and massive system prompts that blow context windows.

So I built Agent Verifier — an AI agent skill that acts as an automated reviewer which does more than just code review (check the repo for details - more to be added soon).

GitHub Repo: https://github.com/aurite-ai/agent-verifier

Note: Drop a ⭐ if you find it useful to get more updates as we add more features to this repo.

----

2 steps to use it:

Install it once, then say "verify agent" on any of your agent folders in Claude Code to get a structured report:

----

✅ 8 checks passed | ⚠️ 3 warnings | ❌ 2 issues

❌ Hardcoded API key at config.py:12 → Move to environment variable
❌ Hallucinated tool reference: execute_sql → Tool referenced but not defined
⚠️ Unbounded loop at agent/loop.py:45 → Add MAX_ITERATIONS constant

----

Install for Claude Code:

npx skills add aurite-ai/agent-verifier -a claude-code

OR install for all coding agents:

npx skills add aurite-ai/agent-verifier --all

----

Happy to answer questions about how the agent-verifier works.

Findings come in two tiers:
- pattern-matched (reliable)
- heuristic (best-effort)

Every finding is tagged with its tier, so you know the confidence level.
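To make the two tiers concrete, here is a minimal sketch of how a finding could carry its tier. It is not the Agent Verifier implementation: the regex, the `Finding` class, and `scan_source` are hypothetical names made up for illustration.

```python
import re
from dataclasses import dataclass

# Hypothetical pattern for illustration; real secret scanners use many more rules.
API_KEY_PATTERN = re.compile(
    r"""(api[_-]?key|secret|token)\s*=\s*["'][A-Za-z0-9_\-]{16,}["']""",
    re.IGNORECASE,
)
WHILE_TRUE_PATTERN = re.compile(r"\s*while\s+True\s*:")


@dataclass
class Finding:
    line_no: int
    message: str
    tier: str  # "pattern" (exact rule match) or "heuristic" (best-effort guess)


def scan_source(source: str) -> list[Finding]:
    """Scan source text line by line and tag each finding with its tier."""
    findings = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        if API_KEY_PATTERN.search(line):
            findings.append(Finding(line_no, "Hardcoded API key", "pattern"))
        elif WHILE_TRUE_PATTERN.match(line):
            # Only *possibly* unbounded: a break may exist further down,
            # so this is reported in the lower-confidence tier.
            findings.append(Finding(line_no, "Possibly unbounded loop", "heuristic"))
    return findings
```

A report renderer could then print the tier next to each finding so readers can weigh it.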

----

Please share your feedback. I would love contributors to help expand the project!

u/Chance-Roll-2408 — 2 hours ago
▲ 5 r/devtools+2 crossposts

I made revera, a tool that scores NPM packages before you blindly install them

I sometimes wondered how little info we have when we install NPM packages, so I built revera. It's an npm package scorer, but on steroids. It uses a scoring algorithm (still not perfect, but close) to rank NPM packages: every package gets a score based on criteria such as maintainability, trust, package releases, downloads, and more.
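For readers wondering what a score built from several criteria can look like, here is a rough sketch of a weighted sum. The weights and the `package_score` function are assumptions for illustration only; the post does not describe revera's actual algorithm.

```python
# Hypothetical weights; revera's real criteria weights are not given in the post.
WEIGHTS = {
    "maintainability": 0.35,
    "trust": 0.30,
    "releases": 0.20,
    "downloads": 0.15,
}


def package_score(signals: dict[str, float]) -> float:
    """Combine per-criterion signals (each expected in 0..1) into a 0..100 score."""
    total = 0.0
    for name, weight in WEIGHTS.items():
        # Missing signals count as 0; out-of-range values are clamped to 0..1.
        value = min(max(signals.get(name, 0.0), 0.0), 1.0)
        total += weight * value
    return round(100 * total, 1)
```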

the audit command scans the working directory for

it has the following extra features:

  • logging in with github for higher rate limits
  • why command for explaining a certain package's score
  • doctor for checking if everything is working
  • caching system which lives for 24h on local machine
  • and a customizable config

It would mean the world to me if you all could try it out and give feedback (bad or good)!

github repo: https://github.com/aaravmaloo/revera

npm package page: https://www.npmjs.com/package/@aaravmaloo/revera

u/aaravmaloo — 7 hours ago
▲ 11 r/devtools+5 crossposts

Safer-dependencies: A tool for claude code to ensure dependencies used aren't vuln, don't use abandoned packages, implement cooldown to avoid supply chain attacks, etc...

I built safer-dependencies, a security layer for Claude Code that checks packages before AI coding assistants add them to a project. I originally built this for my own workflow, but I’m sharing it publicly in case it’s useful to others using Claude Code.

It runs dependency safety checks for things like known CVEs, typo-squatting, abandoned packages, stale releases, package age/cooldown windows, and PyPI hash-pin integrity.

It currently supports npm, PyPI, RubyGems, Maven, Go, and Rust. Open source to help others.
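As an illustration of the cooldown idea, a check like the one below refuses releases that are too fresh, on the theory that a malicious release is often caught within days of publishing. This is a sketch, not code from safer-dependencies; the function name and the 7-day default are assumptions.

```python
from datetime import datetime, timedelta, timezone


def passes_cooldown(published_at: datetime, now: datetime, cooldown_days: int = 7) -> bool:
    """Return True when the release is at least `cooldown_days` old."""
    return now - published_at >= timedelta(days=cooldown_days)


# Example: a release published 2 days ago fails a 7-day cooldown.
now = datetime(2025, 1, 10, tzinfo=timezone.utc)
fresh_release = datetime(2025, 1, 8, tzinfo=timezone.utc)
old_release = datetime(2024, 12, 1, tzinfo=timezone.utc)
fresh_ok = passes_cooldown(fresh_release, now)
old_ok = passes_cooldown(old_release, now)
```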

GitHub: https://github.com/robert-auger/safer-dependencies

u/SecTemplates — 18 hours ago
▲ 17 r/devtools+5 crossposts

Mouseless app for Mac OS

Your mouse is slowing you down.

Every time your hand leaves the keyboard, you lose focus.

I switched to a keyboard-only workflow on macOS, and it's surprisingly addictive. The app is written in Swift and was created with the help of AI.

Try it yourself:

👉 https://github.com/bhavesh164/mouseless

You'll wonder why you ever reached for a mouse.

Note: This app was made with the help of an LLM (AI).

u/bhaveshverma164 — 1 day ago
▲ 326 r/devtools+69 crossposts

I built an open-source, self-hosted AI gateway: 237 providers (90+ free), auto-fallback combos, and a 10-engine token-compression pipeline (MIT)

Builders-welcome post with the substance up front (disclosure: I'm the maintainer). OmniRoute is a free, MIT, self-hosted AI gateway — one OpenAI-compatible endpoint over 237 providers — built around two problems: runs dying on a provider 429, and tokens bleeding on tool/log output.

One endpoint, 237 providers — 90+ of them free. You point any tool or agent at a single OpenAI-compatible endpoint (localhost:20128/v1) and it can reach 237 LLM providers without you rewriting anything. 90+ have free tiers and 11 are free forever (no card), which aggregates to ~1.6B documented free tokens/month — and that's honest, pool-deduped math (we count each shared pool once instead of inflating it; the methodology is public in the repo). There's a one-command setup-* for 13+ coding tools (Claude Code, Codex, Cursor, Cline, Roo, Kilo, Gemini CLI…), so switching your existing setup over takes seconds.

Fallback combos — so it never stops mid-task. A "combo" is a ladder of models the router walks automatically: your subscription first, then API keys, then cheap models, then free ones. When a provider returns a 500 or you hit a rate limit, it slides to the next target in milliseconds, mid-request, and your tool never even sees the error. There are 17 routing strategies (priority, weighted, round-robin, cost-optimized, auto/coding:fast…) plus three resilience layers — a per-provider circuit breaker, a per-key cooldown, and a per-model lockout — so one dead key can't take down a whole provider.
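The ladder idea can be sketched in a few lines. This is a simplified illustration, not OmniRoute's code: `ProviderError`, `call_with_fallback`, and the 30-second cooldown are hypothetical, and the real router's circuit breaker and per-model lockout are left out.

```python
import time


class ProviderError(Exception):
    """Hypothetical error a provider call raises on a rate limit or server error."""


def call_with_fallback(ladder, prompt, cooldowns, cooldown_s=30.0, clock=time.monotonic):
    """Walk the ladder of (name, call) pairs in order and return the first success.

    `cooldowns` maps provider name -> time until which it should be skipped.
    """
    errors = []
    for name, call in ladder:
        if cooldowns.get(name, 0.0) > clock():
            continue  # failed recently, still cooling down
        try:
            return name, call(prompt)
        except ProviderError as exc:
            # Put the provider on cooldown and slide to the next rung.
            cooldowns[name] = clock() + cooldown_s
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed or are cooling down: " + "; ".join(errors))
```

The caller only sees an exception when every rung has failed or is cooling down, which matches the "your tool never sees the error" behavior described above.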

Fusion — an ensemble mode for the hard steps. Beyond simple routing, there's a fusion strategy that fans a single prompt out to a panel of different models in parallel and then has a judge model synthesize one best answer (mixture-of-agents, built in). It's cost-aware, so easy turns stay on one fast model and it only fuses when the step is worth it.

A 10-engine compression pipeline — the part most routers don't have. Every request flows through a transparent compression pass you can toggle/stack per combo. Instead of one trick, it stacks the best of the open-source ecosystem: RTK filters command/tool output (git diffs, test logs, builds) at 60–90%, Microsoft's LLMLingua-2 does ML semantic pruning, Caveman handles prose, session-dedup strips repeats across turns. Critically, code, URLs and JSON are preserved byte-perfect, and a default-on inflation guard throws the compressed version away and sends the original if compressing would actually grow the prompt — it never makes things worse. On tool-heavy sessions that's ~89% average input-token reduction (an 8k-token git diff becomes a few hundred). Full credit to every upstream project (RTK, Caveman, LLMLingua-2, Troglodita) is in the README.
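The inflation guard is the easiest piece to picture in code. The sketch below is an assumption about its shape, not the project's implementation: `guarded_compress` is a made-up name, and the whitespace word count is a crude stand-in for a real tokenizer.

```python
def guarded_compress(prompt, compress, count_tokens=lambda text: len(text.split())):
    """Return the compressed prompt only if it is strictly smaller than the original."""
    candidate = compress(prompt)
    if count_tokens(candidate) >= count_tokens(prompt):
        # Compression did not help (or made things worse): send the original.
        return prompt
    return candidate
```

With this shape, a misbehaving compression engine can cost time but can never increase the token count of what gets sent.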

Agent-native — the agent can drive the router itself. There's a built-in MCP server (95 tools across 30 audited scopes, over stdio / SSE / streamable-HTTP), plus A2A (v0.3, JSON-RPC 2.0) support. That means an agent can query providers, switch combos, read its own remaining quota and manage memory through the gateway — not just consume tokens through it.

It's 100% local (zero telemetry, AES-256-GCM at rest), MIT-licensed, has a prompt-injection guard on every LLM route, opt-in memory, and runs on npm, Docker, desktop or your phone via Termux.

For context on whether it's worth your time: it's grown to ~9.8K GitHub stars, 1,490+ forks and 280+ contributors in ~4.5 months, with 21,000+ automated tests and 1,830+ issues closed — so it's a battle-tested project, not a brand-new experiment.

npm install -g omniroute

GitHub: https://github.com/diegosouzapw/OmniRoute · Site: https://omniroute.online

Would value a critique of the routing/compression architecture from this crowd.

u/ZombieGold5145 — 2 days ago
▲ 3 r/devtools+1 crossposts

SOAR – a Python automation/runtime tool for organizing and running scripts (feedback wanted)

What My Project Does

SOAR (Script Optimization and Automation Runtime) is a Python-based automation runtime system that helps manage and run scripts in a more structured way than simply executing standalone .py files.

It provides a lightweight runtime layer where users can:

  • Organize scripts into projects
  • Run automation tasks through a CLI-style interface
  • Generate or scaffold simple project structures
  • View basic diagnostics/log output for runs
  • Experiment with modular “automation workflows” inside Python

The goal is to make small automation projects easier to manage without needing a full framework.

Target Audience

This project is mainly aimed at:

  • Beginner to intermediate Python developers
  • People who write lots of small automation scripts
  • Developers who want a lightweight alternative to heavier workflow/automation frameworks

It is currently more of an experimental / hobby project than a production-ready tool.

Comparison

Compared to existing tools:

  • vs plain Python scripts: SOAR adds structure and centralized execution instead of scattered files
  • vs full workflow tools (Airflow, Prefect, etc.): SOAR is much lighter and not designed for large-scale pipelines
  • vs CLI frameworks: SOAR focuses more on script organization + runtime behavior rather than just argument parsing

It sits somewhere between a script organizer and a minimal automation runtime.

Source Code

GitHub repository:
https://github.com/ScriptOptimizationAutomationRuntime/latest-version

(Additional resources like tutorials and updates are included in the repo.)

u/soardownload — 3 days ago
▲ 3 r/devtools+3 crossposts

Shipped an RV app to the App Store solo. My “codebase” is really a 49k-line architecture doc.

Veteran, solo founder, Texas. Python was the only language I ever learned, and that was in grad school. Spent 5 months building RigSense (boondocking/off-grid RV app) with Claude doing most of the actual engineering. It’s live now.

The thing that made it work wasn’t prompting tricks. It’s one giant architecture doc with numbered sections. Every session is “read §101.3 and §77, build the schedule tab.” Every commit references a section. When I skipped that and worked from memory, Claude fucked up and shipped shitty code and broke functionality.

The thing nobody tells you: be careful about giving Claude API keys for GitHub. It deleted my codebase once.

Took it to an RV rally in April to demo with a bunch of Airstreamers and they loved the idea. I still think it’s too complicated but my wife likes it.

Stack is boring on purpose — SwiftUI, Supabase, Cloudflare.

Happy to answer questions about the doc workflow. App’s called RigSense if you want to see it.

apps.apple.com
u/chris_rigsense — 2 days ago
▲ 10 r/devtools+10 crossposts

I built an open-source local-first observability tool for Python AI agents – PeekAI

Hey,

I got tired of debugging my AI agents with print() statements so I built PeekAI.

It's a lightweight, framework-agnostic observability tool for Python AI agents. Zero config, no cloud, no account needed.

What it does:

  • Auto-instruments OpenAI/Anthropic SDK calls
  • Full span-based trace with waterfall view
  • Token + cost tracking per span
  • Tool call tracking
  • Trace replay — re-run any past trace, even swap models to compare cost/quality
  • CLI + Web UI, all local SQLite storage

Install in 2 lines:

pip install peekai

import peekai
peekai.init()  # that's it

It's early (v0.1) and open source (MIT). Would love feedback from anyone building agents — especially multi-agent systems.

GitHub: https://github.com/oussamaKH63/peekai

PyPI: https://pypi.org/project/peekai

u/ousskh63 — 3 days ago
▲ 7 r/devtools+2 crossposts

Ways to reduce token cost in AI agents

I've been building AI agents for a while and noticed a few patterns that help cut token usage without wrecking the workflow.

A few things that seem to matter most:

- Set hard token budgets per task or step.

- Stop runaway loops early with guardrails or circuit breakers.

- Use smaller models for cheap steps and reserve larger ones for harder reasoning.

- Summarize context aggressively instead of carrying the full history forever.

- Track token spend per workflow, not just per request.

- Add safety checks so bad prompts or tool loops don’t burn budget.
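Since Python came up as the gap, here is a minimal sketch of the first two points (hard budgets plus a loop guard) in plain Python. It is not a port of agent-cost-controller; `TokenBudget` and `BudgetExceeded` are hypothetical names.

```python
class BudgetExceeded(RuntimeError):
    """Raised when a task goes over its token or step budget."""


class TokenBudget:
    """Track token spend and step count for one task and stop it when over budget."""

    def __init__(self, max_tokens: int, max_steps: int):
        self.max_tokens = max_tokens
        self.max_steps = max_steps
        self.tokens_used = 0
        self.steps = 0

    def charge(self, tokens: int) -> None:
        """Record one agent step; raise once either limit is exceeded."""
        self.steps += 1
        self.tokens_used += tokens
        if self.steps > self.max_steps:
            raise BudgetExceeded(f"step limit {self.max_steps} exceeded")
        if self.tokens_used > self.max_tokens:
            raise BudgetExceeded(
                f"token limit {self.max_tokens} exceeded ({self.tokens_used} used)"
            )
```

Calling `budget.charge(tokens)` after every model call turns a runaway loop into an exception the workflow can catch and report.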

I’ve been working on a small npm SDK called agent-cost-controller that wraps some of these ideas into one place for Node/JS agent apps. But I can't find anything useful for Python.

Curious what else people are doing to keep agent token usage under control, especially in production

u/MimTheHuman — 4 days ago
▲ 6 r/devtools+3 crossposts

Built a local-first blast radius analyzer so AI coding agents stop breaking things they don't understand

I kept running into the same problem: AI coding agents (Cursor, Claude Code, etc.) would confidently rewrite a function without knowing what else in the codebase depended on it. One "simple fix" would silently break three other modules downstream.

So I built a tool that gives agents a structural map of the codebase before they touch anything — call graphs, blast radius analysis, and architecture boundaries, computed locally with no cloud calls.
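For anyone unfamiliar with the term, blast radius over a call graph boils down to a reverse reachability search. The sketch below assumes the call graph is already available as a plain dict of caller to callees; `blast_radius` is an illustrative name, not the tool's API.

```python
from collections import deque


def blast_radius(calls: dict[str, set[str]], changed: str) -> set[str]:
    """Return every function that directly or transitively calls `changed`."""
    # Invert caller -> callees into callee -> callers.
    callers: dict[str, set[str]] = {}
    for caller, callees in calls.items():
        for callee in callees:
            callers.setdefault(callee, set()).add(caller)

    affected: set[str] = set()
    queue = deque([changed])
    while queue:
        node = queue.popleft()
        for caller in callers.get(node, ()):
            if caller != changed and caller not in affected:
                affected.add(caller)
                queue.append(caller)
    return affected
```

Editing a leaf helper can surface a long list of upstream callers, which is exactly the information an agent lacks when it only sees one file.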

A few technical details that might be interesting to this crowd:

  • Delta sync via SHA-256: instead of re-indexing the whole repo on every change, it hashes each file and only re-parses what actually changed. Makes it usable on large repos without a multi-minute wait every time.
  • Hybrid graph model: combines a structural graph (tree-sitter based, across Python/JS/TS/Java/C++/Go) with semantic embeddings, so queries can be answered by structure ("what calls this function") or by meaning ("where's the auth logic").
  • Blast radius: before an edit lands, it traces downstream callers/dependents so you (or the agent) know what's at risk.
  • MCP integration: exposes this as context directly inside Cursor/Windsurf/Claude Code, so the agent gets the graph without you manually pasting file contents.
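The delta-sync bullet can be sketched with `hashlib` alone. This is an assumption about the general technique, not the project's code: `delta_sync` is a made-up name, and file contents are passed in as bytes (in practice they would come from something like `Path.read_bytes()`).

```python
import hashlib


def sha256_of(data: bytes) -> str:
    """Hex SHA-256 digest of a file's contents."""
    return hashlib.sha256(data).hexdigest()


def delta_sync(files: dict[str, bytes], index: dict[str, str]) -> list[str]:
    """Return paths that are new or changed since the last run.

    `index` maps path -> last seen digest and is updated in place.
    """
    changed = []
    for path, data in sorted(files.items()):
        digest = sha256_of(data)
        if index.get(path) != digest:
            index[path] = digest
            changed.append(path)  # only these need re-parsing
    # Forget files that no longer exist so the index does not grow stale.
    for path in set(index) - set(files):
        del index[path]
    return changed
```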

It runs fully offline: no API keys, no data leaving your machine, and it works air-gapped with a local LLM if you want it fully isolated.

Wanted to share it here since blast-radius-aware tooling for AI agents seems like a gap in the current OSS landscape.

Code's here if you want to poke at the architecture or the parsing layer: GitHub

Happy to answer questions about the graph construction, the delta-sync design, or tradeoffs I hit along the way.

codetraceai.in
u/Commercial_Media_962 — 3 days ago
▲ 2 r/devtools+2 crossposts

I built Rulepack — a PKGBUILD-inspired package manager for coding agent rules & skills

Hi,

I got tired of copy-pasting the same rules and skills across OpenCode, Cursor, Claude Code, GitHub Copilot, Windsurf, Gemini CLI, etc. So I built Rulepack.

It’s a declarative package manager where each rule/skill/agent is a YAML PKGBUILD descriptor. One source, multiple targets. You can create PKGBUILDs manually or with the help of your agent.

Quick workflow:

  bin/rulepack build                         # fetch sources & build platform artifacts
  bin/rulepack install --target opencode     # deploy with symlink/copy/inject/append
  bin/rulepack verify --target opencode      # detect drift
  bin/rulepack fix --target opencode         # repair drift
  bin/rulepack bump                          # check upstream git-sourced packages

Highlights:
• 14 supported agent platforms (user + project scope)
• Registry-driven translate/transform defaults (data/registry/platforms.yaml)
• Surgical installs, skill-bundle sub-skill selection, marker-based AGENTS.md append
• Upstream version tracking, transaction rollback, SHA256 checksums
• Ruby stdlib-only core, 357 tests, 0 failures
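For readers who have not seen a marker-based append before, the idea is to own one delimited block inside a shared file so that repeated installs replace it instead of duplicating it. Rulepack itself is Ruby; the sketch below is Python for illustration only, and the marker strings and `upsert_block` are hypothetical, not Rulepack's actual format.

```python
# Hypothetical markers; the real tool's marker format is not shown in the post.
BEGIN = "<!-- rules:begin -->"
END = "<!-- rules:end -->"


def upsert_block(document: str, block: str) -> str:
    """Insert or replace the managed block between the markers, leaving the rest alone."""
    managed = f"{BEGIN}\n{block.strip()}\n{END}"
    start = document.find(BEGIN)
    end = document.find(END)
    if start != -1 and end > start:
        # Markers already present: swap the old block for the new one.
        return document[:start] + managed + document[end + len(END):]
    # No markers yet: append after the existing content.
    if not document:
        separator = ""
    elif document.endswith("\n"):
        separator = "\n"
    else:
        separator = "\n\n"
    return document + separator + managed + "\n"
```

Because the function is idempotent, running an install twice leaves the file unchanged, which is what makes drift detection on the block meaningful.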

Repo: https://github.com/ozgurulukir/agent-rule-sync

Feedback, bug reports, and new platform/translator contributions are very welcome. If you maintain rules for more than one coding agent, this is the exact itch it scratches.

u/ozguru — 4 days ago
▲ 15 r/devtools+7 crossposts

Selling Inspect Mode Pro – Chrome Extension for Developers & Designers | Polar Payments Integrated + Source Code Included

I'm looking to sell Inspect Mode Pro, a Chrome extension built for developers, designers, and indie hackers who want to inspect websites more efficiently.

The product is fully functional and includes Polar payment integration, making it easy to manage one-time purchases, licenses, and customer access without additional setup.

What it does:

  • Inspect fonts, colors, spacing, and UI elements
  • Extract website assets and images
  • Analyze website design systems
  • Faster workflow than digging through DevTools for common tasks

What's included:

  • Full source code
  • Chrome Web Store listing
  • Branding and assets
  • Existing user base
  • Documentation and deployment instructions

Why I'm selling:
I'm currently focused on other projects and don't have the time to continue growing and marketing this one.

Potential growth opportunities:

  • SEO content around web design and development
  • YouTube tutorials and demos
  • Partnerships with design communities
  • Expansion into Firefox and Edge extensions
  • Additional premium features for agencies

If you're interested, send me a DM and I'll share details on users, revenue, traffic, tech stack, and asking price.

Happy to answer any questions.

u/aryanxcreates — 5 days ago
▲ 10 r/devtools+3 crossposts

I built an open-source macOS debugging proxy and I’m looking for repo/product feedback

I’m building Rockxy, an open-source native macOS HTTP/HTTPS debugging proxy:

https://github.com/RockxyApp/Rockxy

It’s for developers who need to inspect traffic from real apps, not just browser DevTools: Mac apps, command-line tools, iOS devices, iOS Simulator, HTTP/HTTPS, WebSocket, GraphQL, replay, rewrite/map local, breakpoints, and export/redaction workflows.

I’m not doing a big launch yet. I’m trying to improve the repo and understand what would make developers trust or try a tool like this.

If you have a minute, I’d love feedback on:
- Does the repo explain the product clearly?
- What would stop you from trying it?
- What should be more visible: screenshots, install flow, security model, roadmap, comparison with Proxyman/Charles?
- What issue would you open first?

Repo feedback is very welcome.

u/locnguyen305 — 6 days ago
▲ 26 r/devtools+10 crossposts

I built Plethora: An open-source, local-first Second Brain that auto-syncs with your Hack The Box progress

Hey everyone!

I've always struggled with keeping my HTB notes organized. Copy-pasting machine IPs, tracking what I've rooted, and organizing my write-ups manually in Obsidian/Notion was getting tedious. So, I spent some time building Plethora.

Plethora is a local-first desktop-style web app. You connect your HTB App Token, and it automatically pulls in your Machines and Challenges in the background while you play.

What it does:

  • 100% Local & Private: Uses a local SQLite database. Your private write-ups and secrets never touch the cloud.
  • Smart Auto-Sync: Tracks your progress and builds a global activity timeline (complete with a GitHub-style hacking heatmap and streak counter).
  • Rich Journaling: A dedicated markdown editor with instant auto-save and inline screenshot pasting.
  • Command Palette: Press Ctrl+Q to instantly full-text search thousands of your past journals, or let the app automatically extract your past bash/powershell commands (like finding exactly what nmap flags you used 3 months ago).

I just open-sourced it on GitHub and would love for people to test it out, break things, and give me feedback!

GitHub Repo: https://github.com/krishjain-2301/Kri27

To get started, just clone the repo, run npm install, and hit npm run dev.

Let me know what you guys think!

u/kojikojikoji234 — 7 days ago
▲ 19 r/devtools+3 crossposts

KiroEnsemble: an enterprise-grade multi-agent framework that turns Kiro CLI into a full dev team 🚀

Kiro CLI is powerful for single tasks. KiroEnsemble turns it into an entire development team.

If you use Kiro CLI, you know it's great for single tasks at your desk. But I wanted it to carry a whole feature the way a real developer does:

  • plan it
  • build it
  • test it
  • review it
  • document it
  • and open the PR

All without me driving every keystroke.

So I turned it into a team. A lead agent orchestrates and never writes code itself: a builder writes the code, a validator runs the tests and checks the spec, a reviewer diffs the branch, and a documenter writes the docs.

You hand it a ticket or a spec, and it delivers a tested, reviewed, documented, PR-ready change. No babysitting.

This isn't a toy. It's enterprise-grade, used by engineers within a renowned international company.

📊 It runs as a full-stack developer on a live enterprise codebase:

In one month, straight from my personal session logs (using the /record-session custom skill):

- 20+ tickets shipped across 4 repos

- 89% completed clean (18 success, 2 partial, 0 failed)

- Features and bug fixes delivered, with tests, documentation and code review on every run

It picks up a ticket, builds it, tests it, reviews its own diff, documents it, and hands back a PR. That's the job of a full-stack dev, and it does it on real, conventions-heavy enterprise repos.

🏗 It also builds projects from scratch:

Point it at an empty directory and it ships. My portfolio site (https://mmo.sidihub.cloud/) and its AI assistant were built end to end by this team, from spec to deployment.

⚙️ Why it holds up where other agent setups fall apart

  • 🧠 Real separation of roles, not one model pretending to be five.
  • 📋 Spec-driven: requirements, design, and tasks, so it builds what you asked for.
  • 🪶 Lean orchestrator: the lead never loads your code, so context stays sharp across long runs.
  • 🔁 Self-correcting: bounded retries with a diagnostician pass before giving up.
  • 🎫 Process-aware: reads tickets, follows your branch and commit conventions, opens the MR, posts the summary.
  • ✍️ Grounded: knowledge and context live in typed files, and it can be integrated with Obsidian.
  • 🔒 Safe by default: a guard blocks destructive commands and asks before anything risky.
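The "bounded retries with a diagnostician pass" bullet could look roughly like the loop below. This is one reading of that bullet, not KiroEnsemble's code: it assumes the diagnosis runs between attempts and feeds a hint into the next one, and `run_with_retries`, `attempt`, and `diagnose` are hypothetical names.

```python
def run_with_retries(attempt, diagnose, max_retries=3):
    """Run `attempt(hint)` up to `max_retries` times.

    `attempt` returns (ok, output). After a failure, `diagnose(output)` produces
    a hint that is passed to the next attempt. Returns (ok, list of outputs).
    """
    hint = None
    history = []
    for _ in range(max_retries):
        ok, output = attempt(hint)
        history.append(output)
        if ok:
            return True, history
        # Diagnostician pass: turn the failure into guidance for the next try.
        hint = diagnose(output)
    return False, history  # bounded: give up instead of looping forever
```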

🛠 How it works

Clone the repo, copy the .kiro setup into your desired repo, and either manually edit the conventions and names to fit your needs and project, or ask Kiro to adapt it. Setup takes about 5 minutes.

  1. Prepare a Spec or a Plan document for what you want to work on.
  2. Start a kiro-cli session -> type `/agent swap team-lead`
  3. Prompt the team-lead to pick up the spec and start the agentic workflow.

NOTE: Agents have pre-chosen models that I personally use. Edit the agent configs to change the models to fit your credit budget.

GitHub Repo: https://github.com/Moifek/kiro-ensemble

Free and open-source (MIT). Clone it, run it on your own project, and tell me where it breaks.

Roadmap: Better logging & Mobile integration

What's the most complex thing you'd trust an agent team to ship for you?

Hey u/few_Map7816 I'll be trying to make this work with your solution :D let's connect !

u/DaraosCake — 7 days ago

Documentation tool for teams

I'm looking for a documentation tool that's actually AI-agent native, and I can't find one.

Today's tools optimize for humans:

Google Docs / M365: great collaboration, poor for coding agents.

Miro / tldraw: great diagrams, weak docs.

Obsidian: AI-friendly, but not zero-setup or real-time collaborative.

MCPs/APIs help, but they're just a bridge. Agents end up spending context on tool syntax instead of the content, and they lose the simplicity of native file/workspace interactions.

What I want is basically Google Docs + tldraw, but:

Zero setup (just open a browser and collaborate)

Real-time collaboration

Docs and diagrams as first-class citizens

AI edits alongside humans

Comments, history, and blame built in

Does anything like this exist, or is everyone still stitching together multiple tools?

u/NoDrawer7721 — 7 days ago
▲ 1 r/devtools+1 crossposts

I wrote a CLI that writes your git commit messages. Works with any AI provider.

Spent the weekend building ai-commit - a CLI tool that generates commit messages from your staged diff. The gimmick is it's provider-agnostic: works with OpenAI, Anthropic, Ollama, Groq, whatever you have.

Why this exists: every other AI commit tool locks you into one provider. OpenAI only. Anthropic only. A local Ollama thing. If you switch providers, you switch tools. This one swaps providers in a config line or env var.

Also has a --install-hook flag that sets up a prepare-commit-msg hook.
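For context on what such a hook does: git calls `prepare-commit-msg` with the path of the message file as the first argument and, when the message already has an origin, a source such as `message`, `template`, `merge`, `squash`, or `commit` as the second. The sketch below is not ai-commit's hook; the helper names are hypothetical and `generate` stands in for whatever produces the message.

```python
from pathlib import Path
from typing import Callable, Optional


def should_generate(source: Optional[str]) -> bool:
    """Only fill in a message for a plain `git commit` (no source argument)."""
    return not source


def with_generated_message(existing: str, generated: str) -> str:
    """Put the generated message first and keep git's commented template below it."""
    return generated.rstrip() + "\n\n" + existing


def run_hook(argv: list[str], generate: Callable[[], str]) -> int:
    """Entry point for the hook: argv is sys.argv as git passes it."""
    msg_file = Path(argv[1])
    source = argv[2] if len(argv) > 2 else None
    if should_generate(source):
        msg_file.write_text(with_generated_message(msg_file.read_text(), generate()))
    return 0
```

Skipping merges, amends, and `-m` commits keeps the hook from overwriting a message the user already supplied.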

Install: pip install ai-commit. Source: github.com/mohamedorigami-jpg/ai-commit

Curious what people think. Does provider-agnostic matter to you?

u/InterestingCherry812 — 6 days ago
▲ 2 r/devtools+2 crossposts

I built a macOS app to manage local development stacks (looking for beta testers)

Hey all,

I have a tool you might be interested in. It's something I've been working on for a while, and it's almost ready for prime time.

Tiny backstory as to why this tool exists which some of you might relate to...

So, I got tired of trying to remember the exact order to start my local development environment.

Some projects needed the API first, then Redis, then workers, then a frontend, then a tunnel. When something failed, I ended up jumping between half a dozen terminal tabs trying to figure out what had actually gone wrong.

So I built Stacksmith.

Instead of just launching processes, Stacksmith acts as a control panel for your local development stack. You describe your services in a simple .stacksmith.yml file, and it manages them from a native macOS app.

Current features include:

  • Start and stop your entire stack together
  • Live logs for individual services or the whole stack
  • Health checks with overall stack status
  • Service dependencies and startup ordering
  • Port conflict detection
  • Diagnosis of common startup problems using Apple’s on-device Foundation Models. This is all private, and everything stays on your machine. _You need a fairly modern machine to use this feature, though._
  • Built-in MCP server, allowing AI assistants to inspect and control your local development stack
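Startup ordering from declared dependencies is a topological sort. Stacksmith's core is Swift; the sketch below uses Python's standard `graphlib` purely to illustrate the technique, and the service names and `startup_order` function are hypothetical rather than taken from a real .stacksmith.yml.

```python
from graphlib import CycleError, TopologicalSorter


def startup_order(depends_on: dict[str, list[str]]) -> list[str]:
    """Return services in an order where every dependency starts before its dependents.

    Raises graphlib.CycleError if the dependencies form a cycle.
    """
    return list(TopologicalSorter(depends_on).static_order())


# Hypothetical stack: each service lists what must be running before it starts.
stack = {
    "api": [],
    "redis": ["api"],
    "workers": ["redis"],
    "frontend": ["api"],
    "tunnel": ["frontend"],
}
order = startup_order(stack)
```

Stopping the stack can reuse the same list in reverse, and a `CycleError` is a natural place to surface a misconfigured dependency loop to the user.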

I’m not trying to replace terminal-first tools like Foreman or Overmind; they’re great. Stacksmith is aimed at developers who want better visibility into what’s happening after the processes start, especially when something goes wrong.

It’s currently macOS only, although I’d love to support Linux (maybe even Windows) if there’s enough interest. The core is built in Swift and is platform-agnostic, so portability shouldn't be a problem.

Website: https://getstacksmith.app

I’m looking for feedback on:

  • Does the YAML model make sense?
  • What information do you wish you had when your local stack breaks?
  • Does the UI make it obvious what’s happening?
  • What’s missing from your workflow?

The app will eventually be paid, but beta testers get free access.

If you’d like to try it, send me a DM and I’ll send over a code.

u/jonnothebonno — 7 days ago