r/OpenaiCodex

ContextForge: a local proxy that cut my Claude Code token usage by up to 72%

Hi everyone,

I’ve been working on a project to address a specific frustration I had with AI coding agents: token waste. I noticed that agents often burn a significant portion of the context window just re-reading the same files to find functions or re-discovering the repository structure on every turn.

I built ContextForge — a local proxy and CLI that acts as a "codebase-aware" runtime.

How it works

ContextForge sits between your agent (like Claude Code) and your LLM provider. Instead of letting the agent "guess" where files are, it provides local intelligence:

Local AST Graph: It indexes your repo using native C++ parsing into a local SQLite graph. When the agent needs to find a symbol, the proxy handles the lookup locally.
Context Optimization: It applies a compression pipeline that skeletonizes older file history (keeping only signatures) and vaults oversized responses (like lockfiles), replacing them with pointers.
Protocol Translation: It translates Anthropic requests into OpenAI format, which allows you to run Claude Code against Ollama/OpenAI-compatible models with full streaming support.
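
As a rough illustration of the Context Optimization step above, here is a minimal sketch of skeletonizing and vaulting. It is not ContextForge's code: the post says the real indexer uses native C++ parsing, while this sketch uses Python's standard `ast` module on Python source. The names `skeletonize` and `vault`, the pointer format, and the 2,000-character limit are all assumptions made for the example.

```python
import ast
import hashlib


def skeletonize(source: str) -> str:
    """Keep only class and function signatures from Python source text."""
    lines: list[str] = []

    def visit(node: ast.AST, depth: int) -> None:
        indent = "    " * depth
        for child in ast.iter_child_nodes(node):
            if isinstance(child, ast.ClassDef):
                lines.append(f"{indent}class {child.name}:")
                visit(child, depth + 1)  # descend to collect method signatures
            elif isinstance(child, (ast.FunctionDef, ast.AsyncFunctionDef)):
                keyword = "async def" if isinstance(child, ast.AsyncFunctionDef) else "def"
                lines.append(f"{indent}{keyword} {child.name}({ast.unparse(child.args)}): ...")

    visit(ast.parse(source), 0)
    return "\n".join(lines)


def vault(text: str, store: dict[str, str], limit: int = 2000) -> str:
    """Move oversized text into `store` and return a short pointer instead."""
    if len(text) <= limit:
        return text
    key = hashlib.sha256(text.encode("utf-8")).hexdigest()[:12]
    store[key] = text  # hypothetical in-memory vault; a real one would persist
    return f"[vaulted:{key} {len(text)} chars]"


SAMPLE = '''
class UserRepo:
    def soft_delete(self, user_id, hard=False):
        record = self.find(user_id)
        record.deleted = True
        return record

def restore(user_id):
    return None
'''

skeleton = skeletonize(SAMPLE)
print(skeleton)

store: dict[str, str] = {}
pointer = vault("x" * 5000, store)  # stands in for a large lockfile
print(pointer)
```

The skeleton keeps `class UserRepo:`, the `soft_delete` signature, and the `restore` signature, and drops the function bodies. The vault keys the stored text by a content hash, so the same oversized response always maps to the same pointer.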

Case Study: "Soft-Delete" Feature

To test the architecture, I implemented a complex feature in an Express.js backend using an Ollama model. I compared a raw session (Passthrough) against one routed through ContextForge.

Metric	Passthrough Mode	ContextForge Mode	Difference
LLM round-trips	41	14	66% fewer
Input tokens	1,632,266	444,092	72.8% fewer
Output tokens	1,632,266	384,033	76.5% fewer
Session Compression	—	60,059 (13.5%)	—

Understanding the Metrics:

Workflow Savings (72.8%): These are tokens that were never generated because the tooling changed the workflow. The model used the local graph to find symbols instead of "guessing" via file searches, solving the task in 14 steps instead of 41.
Session Compression (13.5%): This is the actual text removed from the prompts within the session via skeletonization and deduplication.
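
The percentages in the table can be re-derived from the raw counts. This short check uses only the numbers from the post; the helper name `percent_fewer` is made up for the example.

```python
def percent_fewer(before: int, after: int) -> float:
    """Percentage reduction from `before` to `after`, rounded to one decimal."""
    return round((before - after) / before * 100, 1)


round_trips = percent_fewer(41, 14)
input_tokens = percent_fewer(1_632_266, 444_092)
output_row = percent_fewer(1_632_266, 384_033)

# Session compression is reported relative to the ContextForge input tokens.
compression_share = round(60_059 / 444_092 * 100, 1)
input_after_compression = 444_092 - 60_059

print(round_trips, input_tokens, output_row, compression_share)
print(input_after_compression)
```

The results are 65.9, 72.8, 76.5, and 13.5, which match the table (with 65.9 rounded to 66%). One thing the arithmetic surfaces: 444,092 minus 60,059 is exactly 384,033, the ContextForge figure in the "Output tokens" row, and the Passthrough figure in that row is identical to the input count. That row may therefore be reporting input tokens after compression rather than output tokens; the post does not say.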

Note: These results are from a specific, repository-heavy task. Savings vary significantly based on the work—long refactors benefit most, while short chats benefit much less.

Get Started

I've just released v1.0.3 and I'm looking for feedback from the community.

Install: npm i -g @anuj612/contextforge
GitHub: https://github.com/anujkushwaha612/ContextForge

Note: No compiler needed — ships with prebuilt native binaries for Windows, macOS, and Linux via npm.

I’d love to hear your thoughts on the project, and I’m ready to tackle any bugs and issues that come up.

u/Independent_Pick3116 — 9 hours ago

▲ 2 r/OpenaiCodex+1 crossposts

I need technical help from someone who understands programming, Second Brain, and Obsidian

"I'm building a personal knowledge system that integrates AI, notes, and execution protocols. My biggest challenge right now is explaining the architecture in a simple way. How do you document complex systems?"

u/Budget-Sense-3509 — 6 hours ago

▲ 14 r/OpenaiCodex+9 crossposts

Introducing LeakScope: A Security Scanner for Supabase Applications

Introducing LeakScope, again.

We've been updating it :)

LeakScope is a security scanner built for Supabase applications. Paste your app's public URL, and it checks what an attacker can learn from the outside—from exposed keys and public data access to weak RLS, leaked credentials, and insecure frontend configuration.

We've introduced two scanning modes:

Light Scan — Paste a public app URL to instantly check for exposed keys, public data exposure, leaked credentials, weak RLS, and risky frontend configuration. No account required.

Deep Scan — Authenticate to validate Row Level Security, test BOLA/IDOR, analyze JWT security, and generate detailed reports for real security validation.

Whether you're a solo founder, indie hacker, or vibe coder shipping MVPs at 2 AM, LeakScope gives you a fast way to see what your app is exposing before everyone else does.

1,936 websites scanned.
13,679 security findings identified.

Try it out at leakscope[.]tech

u/StylePristine4057 — 11 hours ago

▲ 1.3k r/OpenaiCodex+3 crossposts

Guys 1000 dollar plan is incoming 💀

Hope openai won't do it 😭

u/Independent-Wind4462 — 1 day ago

▲ 0 r/OpenaiCodex

Which is better, Codex or Claude? :))

So let's settle this: which is better, Codex or Claude? Can anyone patiently explain the limitations of each and why the one you picked is better? Heads-up: I'm new to vibe coding and basically a non-technical 18-year-old. Please reply :))

u/Automatic-Fix-301 — 1 day ago

▲ 28 r/OpenaiCodex+13 crossposts

Deterministic folding for LLM agents: continuity without LLM compaction

I just open-sourced Context Warp Drive, a continuity engine for LLM agents.

Repo: https://github.com/dogtorjonah/context-warp-drive

Right now, the industry has two bad ways of dealing with long agent horizons:

Just ride the 1M-2M context window.
Use an LLM to summarize older messages ("compaction").

LLM summaries are inconsistent, they burn an extra model round-trip, they quietly drop the exact identifiers your agent needs (UUIDs, paths, hashes), and worst of all, they constantly rewrite the prefix—which trashes your provider prompt cache.

This library takes a different approach: deterministic folding.

As the agent works, older context is folded into deterministic skeletons. Instead of linearly bloating to the ceiling, the active context sawtooths—building up efficiently, then dropping back down to a clean floor without losing continuity.

Why not just use the 1M token window?

Because 95% of what an agent carries with it on a long task isn't needed right now. It's looking for the needle in the haystack, but massive context windows force it to carry all the hay.

A larger window raises the ceiling, but it doesn't move the floor where models reason best. Long-context evals keep showing the same thing—models do not use giant contexts as cleanly as the marketing numbers imply:

Lost in the Middle — models degrade when needed information is buried in the middle of long context.
RULER — large drops as context length and task complexity increase, even for models advertised as long-context.
Context Length Alone Hurts LLM Performance Despite Perfect Retrieval — length itself hurts performance even when retrieval succeeds.
Intelligence Degradation in Long-Context LLMs — models can collapse past critical context thresholds even when input remains relevant.

By keeping the agent deterministically folding with a warm cache and a low context band, you keep it snappy, cheap, and focused. You leave the hay behind until it's actually needed.

How Context Warp Drive works:

The Rebirth Seed: The continuity package that makes the full reset possible. It carries the recent user and AI messages, what the agent was actively working on and editing, its execution plan state, preserved exact identifiers from the full trace, and episodic context from earlier work. It is not a vague summary—it is a structured, deterministic snapshot the agent can wake up from and continue seamlessly.
Cache-Hot Appending: As the agent works, older turns fold into compact bands that append onto the rebirth seed. The context builds up over time, but because the seed stays byte-identical, you pay for cheap cache reads turn after turn instead of expensive fresh inputs.
The Sawtooth Reset: You can't append forever. When measured input pressure hits your configured ceiling, the engine performs the full sawtooth—the context drops back to a fresh rebirth seed and the cycle continues from a low-context floor.
Zero-LLM Folding: Raw chat history stays preserved as the source of truth, but the model sees a deterministic compact view. Tool calls, paths, receipts, retained reasoning, and exact identifiers are all preserved without asking another model to summarize anything.
Episodic Recall: When the agent re-touches a path or concept from before the reset, the engine pages the relevant folded detail back in. The agent doesn't carry all the hay—it pulls it back when it matters.
Task Rail: I also included a portable execution primitive called TaskRail. It keeps long-horizon plan state outside the prompt: steps, progress, acceptance criteria, and serializable checkpoints. Combined with folding and rebirth seeds, the agent stays low-context while still knowing exactly where it is in a multi-step workflow.
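
The fold, append, and reset cycle described above can be sketched in a few lines. This is a toy model under loud assumptions, not the Context Warp Drive API: the class `FoldingContext`, the function `fold_turn`, the identifier regex, and the use of character counts in place of measured tokens are all invented for illustration.

```python
import re
from dataclasses import dataclass, field

# Assumed notion of "exact identifiers": UUIDs and slash-separated paths.
IDENTIFIER = re.compile(
    r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"
    r"|(?:[\w.-]+/)+[\w.-]+"
)


def fold_turn(role: str, text: str, keep_chars: int = 30) -> str:
    """Deterministically compress one turn; no model call is involved."""
    ids = sorted(set(IDENTIFIER.findall(text)))
    head = " ".join(text.split())[:keep_chars]
    return f"{role}: {head} | ids: {', '.join(ids)}"


@dataclass
class FoldingContext:
    ceiling_chars: int  # stand-in for a measured token ceiling
    keep_recent: int = 2  # newest turns stay verbatim
    seed: str = ""  # only changes on reset, so the prefix stays cacheable
    folded: list[str] = field(default_factory=list)
    recent: list[tuple[str, str]] = field(default_factory=list)
    raw_trace: list[tuple[str, str]] = field(default_factory=list)

    def render(self) -> str:
        """The compact view the model would see."""
        parts = [self.seed, *self.folded, *(f"{r}: {t}" for r, t in self.recent)]
        return "\n".join(p for p in parts if p)

    def reset(self) -> None:
        """Sawtooth drop: rebuild the seed from the raw trace, clear folds."""
        ids = sorted({m for _, t in self.raw_trace for m in IDENTIFIER.findall(t)})
        self.seed = "SEED ids: " + ", ".join(ids)
        self.folded = []

    def add(self, role: str, text: str) -> None:
        self.raw_trace.append((role, text))  # raw history is never discarded
        self.recent.append((role, text))
        while len(self.recent) > self.keep_recent:
            self.folded.append(fold_turn(*self.recent.pop(0)))
        if len(self.render()) > self.ceiling_chars:
            self.reset()


ctx = FoldingContext(ceiling_chars=1200)
sizes = []
for i in range(20):
    # The run of "x" stands in for bulky tool output attached to each turn.
    ctx.add("assistant", f"Edited src/file_{i}.js " + "x" * 200)
    sizes.append(len(ctx.render()))
print(sizes)
```

Printing `sizes` shows the rendered context growing turn by turn and then dropping once the ceiling is crossed, while `ctx.raw_trace` still holds all 20 original turns and the seed still names every path. A real seed would carry much more than identifiers (recent messages, plan state, episodic context), as the post describes.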

What's in the repo:

Core folding engine, provider-agnostic across Anthropic content blocks, OpenAI-style tool_calls, and Gemini parts.
Anthropic prompt-cache breakpoint helpers to maximize read-hits.
Raw rebirth seed renderer.
Model-aware context budget resolver.
Fold recall and episodic recall (with an optional SQLite episode store).
Portable Task Rail state machine.
Gemini CLI and Codex CLI folding adapters.

There are a lot of knobs you can tune, but the core philosophy is the same: use the 1M window as safety headroom, not as the operating band.

(Not on npm yet—install from source for now.)

I've been running this in my own multi-agent orchestration stack for months and completely dropped LLM compaction. The difference is fundamental: the agent stops treating context as a giant backpack and starts treating it like a paged working set—small, hot, recoverable, and always grounded in the raw trace.

u/MusicToThyEars — 2 days ago

▲ 4 r/OpenaiCodex

How can I access the ChatGPT 5.3-Codex model?

I noticed the option for the 5.3-Codex model has been removed from my Codex interface. Is there any way to still use it?

u/Owdez — 2 days ago

▲ 2 r/OpenaiCodex

Agent prompting itself

Is there any way to make one thread prompt other threads inside a workspace? I’m talking to one chat and asking it for the prompt (because it’s repo-aware), and I have to copy and paste it every time. Is there any way to automate that? It would just break the work down into 10 prompts, verify each result, and continue to the next. The verification by the main thread is the important part.

u/Owdez — 2 days ago

▲ 1 r/OpenaiCodex+1 crossposts

OpenAI and Codex are absolutely rubbish!

The problem I’m facing is that I am unable to log in to the Codex app using my OpenAI account, as mobile number verification is now required, and the number I used when registering has long since been deactivated, so I cannot receive the verification text message. OpenAI’s explanation is as follows:

For account security reasons, OpenAI now requires accounts to be verified via a mobile number.
OpenAI does not support changing the mobile number associated with an account and has advised me to register a new account.
OpenAI does not support transferring remaining subscription time from one account to another.

As an annual subscriber, I still have six months left on my subscription, and I wasn’t given any warning about the need for telephone number verification when I paid for it. They won’t let me change my number or transfer my subscription – OpenAI is simply ripping off its users!

Has anyone else encountered the same problem?

u/citywwm — 3 days ago

▲ 231 r/OpenaiCodex

Gpt 5.6 probably launching today or tomorrow

u/Independent-Wind4462 — 3 days ago

▲ 394 r/OpenaiCodex+34 crossposts

browser-search — three tools, zero cost, and your AI agent learns to search and browse the web

/r/Hermes/comments/1uclwgi/browsersearch_three_tools_zero_cost_and_your_ai/

u/Ill-Tradition1362 — 4 days ago

▲ 22 r/OpenaiCodex

A Codex Skill to Check Banked Reset Expiry Dates

I made a tiny Codex skill to check when your banked resets expire.

Example output:

4 reset credits available

Full reset: expires 11 July 2026, 21:44 EDT
Full reset: expires 17 July 2026, 20:34 EDT
Full reset: expires 26 July 2026, 19:47 EDT
Full reset: expires 31 July 2026, 15:07 EDT

Repo: https://github.com/wisdom-in-a-nutshell/agents/tree/main/skills-source/owned/codex-reset-credits

To install it, ask Codex:

Add this as a Codex skill:
https://github.com/wisdom-in-a-nutshell/agents/tree/main/skills-source/owned/codex-reset-credits

Then run:

Use $codex-reset-credits to tell me when my Codex resets expire.

Inspect before running. It reads the local Codex auth session and does not print tokens or account/credit IDs.

u/phoneixAdi — 4 days ago

▲ 4 r/OpenaiCodex+1 crossposts

How do I prompt Codex properly?

I’m trying to build a Pine Script for trading NQ1!, but I’m not trying to automate it, because prop firms don’t allow bot trading. I have solid experience and have been trading for some time, but I want an indicator that helps me spot things I miss while trading. Can anyone teach me how to prompt Codex properly?

u/Acrobatic-Hour3007 — 4 days ago

▲ 1 r/OpenaiCodex

Codex showed me onboarding after opening it yesterday. My projects were gone.

Yesterday I opened Codex and was shown the onboarding flow. Then all my projects and attached chats were gone. I managed to get the projects and their chats to show up again in the left sidebar, but the right sidebar won't show the run actions, GitHub functions, etc. I can't even see which branch I'm on. Why does Codex always break my chats? Does anyone know what's going on?

u/serdox — 3 days ago

▲ 6 r/OpenaiCodex

I think we’re getting selectively nerfed or buffed models (or selectively bad or good limits).

Half of you are getting nerfed models or getting terrible limits. The other half are getting great limits and great models.

It doesn’t make sense. The only way this works is if OpenAI is A/B testing this.

It’s the only reasonable explanation apart from them just having really bad backend services. Or go listen to Tibo. To be fair, it’s definitely possible they have a bad backend, since users on Free used to have unlimited GPT-5.5 (although in my experience, it wasn’t the real deal). But their issues can’t have continued for this long.

I just don’t get it. This will benefit OpenAI. And the people who have just bought a Pro plan seem to be getting the best experience. I’ve even heard people say buying a new account helps rather than renewing your old one.

The people who are saying Codex is working horribly for them ARE NOT wrong.

The people who are saying Codex is working amazingly for them are ALSO NOT wrong.

Don’t you get it?

Evidence to suggest some people have nerfed limits:

https://www.reddit.com/r/codex/comments/1ukhb52/10_of_monthly_limit_gone_in_a_few_file_reads/

https://www.reddit.com/r/codex/comments/1u9tm3i/why_5hour_limits_run_out_faster/

https://www.reddit.com/r/codex/comments/1ukuyqg/i_just_asked_for_a_tiny_little_change_and_its/

Evidence to suggest some people have nerfed models:

https://www.reddit.com/r/codex/comments/1ujnwxb/codex_quality_issues_in_realworld_coding_ignored/

https://www.reddit.com/r/codex/comments/1ukh4ey/youre_right_to_push_back_55_xhigh/

https://www.reddit.com/r/codex/comments/1u8zl0h/gpt_is_absolutely_downgraded_cannot_follow_simple/

https://www.reddit.com/r/codex/comments/1ukuyqg/i_just_asked_for_a_tiny_little_change_and_its/

Evidence to suggest some people have good limits:

https://www.reddit.com/r/codex/comments/1uboucu/the_200_pro_plan_is_completely_worth_it_for_the/

https://www.reddit.com/r/codex/comments/1ugpk46/im_rich/

Evidence to suggest some people have good models:

https://www.reddit.com/r/codex/comments/1ui8zbh/hello_codex_for_good/

https://www.reddit.com/r/codex/comments/1ugbiob/no_prior_coding_experience_codex_just_built_my/

Of course, because we have to provide evidence for everything now.

u/BritishDudeGuy — 4 days ago

▲ 11 r/OpenaiCodex

Codex quality degrading over the last couple weeks?

I'm finding that the quality of Codex on extra-high thinking is getting worse. Is this a known issue? Does anybody else feel the same way?

u/bhowiebkr — 4 days ago

▲ 3 r/OpenaiCodex

Screw token maxxing, how do I actually limit token usage?

I keep hitting limits because agents read too much context, open too many files, or overthink simple tasks.
What are your practical tricks for keeping token usage low?

u/Owdez — 5 days ago

▲ 27 r/OpenaiCodex+16 crossposts

I spent months building a free Windows AI app with an AI council system — no subscription, no account, no data leaving your machine

Been building this for a while and finally put out a first release. Not going to oversell it, just going to describe what it actually does.

The core idea came from being tired of AI tools that give you one confident answer and leave you to figure out if it's right. So I built something where the output you see has already been challenged internally before it reaches you. Not the same model second-guessing itself. A genuinely separate process with a different job, specifically designed to find problems with what was just produced.

There are two sides to the app.

The first is a council mode where you load local AI models and assign them different roles. One role breaks down your task and makes a plan. Another executes against that plan. A third receives both the plan and the result and checks one against the other. For coding tasks it actually runs the code before the reviewer sees it, so problems get caught by execution rather than by a model guessing whether it looks correct. If problems are found it either patches the specific issues or rewrites entirely depending on how bad it is. What you get at the end has been through all of that.
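
The plan, execute, run, and review loop described above can be sketched with plain callables standing in for the local models. Nothing here comes from the Axiom codebase: the `council` function, the `Role` type, the `APPROVE`/`REVISE` verdict strings, and the stub roles are assumptions for illustration, and the sketch always asks the executor for a new attempt rather than choosing between a patch and a full rewrite.

```python
import subprocess
import sys
from typing import Callable

Role = Callable[[str], str]  # hypothetical role interface: prompt in, text out


def run_python(code: str, timeout: float = 10.0) -> tuple[bool, str]:
    """Execute candidate code in a child interpreter and capture its output."""
    proc = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True, text=True, timeout=timeout,
    )
    return proc.returncode == 0, (proc.stdout + proc.stderr).strip()


def council(task: str, planner: Role, executor: Role, reviewer: Role,
            max_rounds: int = 2) -> dict:
    plan = planner(task)
    code = executor(plan)
    output = ""
    for _ in range(max_rounds):
        # The code is run before review, so the reviewer sees real results.
        ok, output = run_python(code)
        verdict = reviewer(
            f"PLAN:\n{plan}\nCODE:\n{code}\nRAN_OK: {ok}\nOUTPUT:\n{output}"
        )
        if ok and verdict.startswith("APPROVE"):
            return {"code": code, "output": output, "approved": True}
        code = executor(f"{plan}\nFix these problems:\n{verdict}\n{output}")
    return {"code": code, "output": output, "approved": False}


# Stub roles: the first attempt has a syntax error, the second is correct.
attempts = iter(["print(sum(range(1, 11))", "print(sum(range(1, 11)))"])
result = council(
    "Print the sum of 1..10",
    planner=lambda task: f"Plan: {task}",
    executor=lambda prompt: next(attempts),
    reviewer=lambda report: "APPROVE" if "RAN_OK: True" in report else "REVISE: code failed to run",
)
print(result["approved"], result["output"])
```

In this run the first attempt fails at execution, the reviewer asks for a revision, and the second attempt is approved with output `55`. In the real app each role would be a separate model with its own prompt; the stubs only show where the execution result enters the review step.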

It also has session memory that builds up as you work, a document pipeline that processes files into structured knowledge before you start asking questions, task history, a diff view showing exactly what changed between the original output and any revision, and confidence labels on every result.

The second is a normal chat mode that runs Python, JavaScript, C#, Java and PowerShell inline and shows execution results inside the conversation. It also includes web search with full page content extraction, LaTeX math rendering, a thinking mode, document attachment, and chat branching, where you can fork from any point in the conversation.

Both modes run locally on your machine using GGUF models. If you don't want to manage model files there is a cloud mode through OpenRouter using their free models, same full pipeline, no local setup needed.

No account. No signup. No subscription. Open the app and use it.

MIT licensed. GitHub: github.com/YoMosa2009/Axiom

Happy to answer questions about anything.

u/The_guy_withnolife — 5 days ago

▲ 12 r/OpenaiCodex+6 crossposts

I built a Codex session review app using Codex. How are you tracking your AI coding workflows?

I built a small free macOS tool for reviewing Codex sessions using the Codex desktop app. Are people here using anything similar to improve their AI coding workflows?

After longer Codex runs, I kept finding that the transcript was technically available, but hard to review.

The things I wanted to inspect were:

- What changed

- Which files were touched

- Where tokens went

- Which tool calls mattered

- Whether the prompt/context was good enough to reuse

- What context would be useful to share during code review

So I made BuildrAI, a local-first app that turns Codex session artifacts into timelines, token usage, prompt/session evaluation, changed-file context, and shareable reports.

I’m curious how other people are handling this.

Do you review Codex sessions after the fact, or do you mostly trust the final diff?

u/michaliskarag — 5 days ago

▲ 6 r/OpenaiCodex+3 crossposts

I open-sourced a Codex plugin that makes AI agents leave receipts before saying done

I built Superloopy because the failure mode that bothers me most with AI coding agents is not just bad code — it’s unverifiable “done.”

It’s a lightweight Codex plugin where you type:

loopy <task>

and the agent is pushed through a proof-of-done loop:

plan → run real commands → save evidence → check criteria → final report

The repo-local state lives under `.superloopy/`, and every passed criterion is supposed to point at a real artifact under `.superloopy/evidence/`.

The default path is meant to stay lightweight: basically receipts for what changed, how it was tested, and what is still uncertain. Stricter gates, hooks, and optional crew/subagent mode are there for bigger tasks.
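
The rule that every passed criterion should point at a real artifact under `.superloopy/evidence/` can be checked mechanically. The sketch below is one way such a gate might look; it does not read Superloopy's actual state files. The criterion dictionaries, their `name`/`status`/`evidence` keys, and the function `unverified_criteria` are hypothetical.

```python
import tempfile
from pathlib import Path


def unverified_criteria(repo_root: Path, criteria: list[dict]) -> list[str]:
    """Return a problem message for each 'passed' criterion lacking evidence."""
    evidence_dir = (repo_root / ".superloopy" / "evidence").resolve()
    problems = []
    for criterion in criteria:
        if criterion.get("status") != "passed":
            continue  # only claims of success need a receipt
        ref = criterion.get("evidence")
        if not ref:
            problems.append(f"{criterion['name']}: no evidence recorded")
            continue
        path = (repo_root / ref).resolve()
        if evidence_dir not in path.parents:
            problems.append(f"{criterion['name']}: evidence is outside the evidence folder")
        elif not path.is_file() or path.stat().st_size == 0:
            problems.append(f"{criterion['name']}: evidence file is missing or empty")
    return problems


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    evidence = root / ".superloopy" / "evidence"
    evidence.mkdir(parents=True)
    (evidence / "tests.log").write_text("12 passed\n")
    criteria = [
        {"name": "tests pass", "status": "passed", "evidence": ".superloopy/evidence/tests.log"},
        {"name": "lint clean", "status": "passed", "evidence": ".superloopy/evidence/lint.log"},
        {"name": "docs updated", "status": "passed"},
        {"name": "perf checked", "status": "pending"},
    ]
    problems = unverified_criteria(root, criteria)

print(problems)
```

Here "tests pass" is accepted because its log exists, while "lint clean" and "docs updated" are flagged, and the pending criterion is ignored. A final report could refuse to say "done" while that list is non-empty.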

Repo:

https://github.com/beefiker/superloopy

If it looks useful, a GitHub star would mean a lot 🙂

More importantly, I’d love feedback from people who use Codex or other coding agents:

- Is “proof of done” clearer than “loop engineering”?

- Would evidence receipts make you trust agent output more?

- Where would this feel helpful vs. too much ceremony?

u/Simple_Somewhere7662 — 6 days ago