r/CodexHacks

Are people actually using those ultra-cheap LLM API proxies (the Xianyu/Taobao model)?

Hey everyone, I recently came across a massive arbitrage happening in the AI dev space, mostly among Chinese students and developers.

They are apparently bypassing official pricing entirely, buying access to GPT and Claude APIs through local marketplaces like Taobao or Xianyu at a small fraction of the official rates. People claim to burn through 100M+ tokens a day just "vibecoding" nonstop, all for around $1.

From what I gather, these "API proxy stations" work through bulk-buying or enterprise account splitting. The middleman routes everything through their own reverse-proxy servers and resells the API keys. We're talking a 95%+ discount compared to official OpenAI/Anthropic rates.

Obviously, your data privacy goes out the window since your prompts are routed through unknown third-party servers.

It got me thinking:

  1. Is anyone here actually relying on these setups for personal projects or vibecoding?
  2. If so, how's the latency and reliability compared to the official API?
  3. Do we have a Western/Global equivalent of these gray-market API hubs, or is everyone outside of China just paying full retail?
u/Inside_Canary_149 — 20 hours ago
▲ 28 r/CodexHacks+13 crossposts

Deterministic folding for LLM agents: continuity without LLM compaction

I just open-sourced Context Warp Drive, a continuity engine for LLM agents.

Repo: https://github.com/dogtorjonah/context-warp-drive

Right now, the industry has two bad ways of dealing with long agent horizons:

  1. Just ride the 1M-2M context window.
  2. Use an LLM to summarize older messages ("compaction").

LLM summaries are inconsistent, they burn an extra model round-trip, they quietly drop the exact identifiers your agent needs (UUIDs, paths, hashes), and worst of all, they constantly rewrite the prefix—which trashes your provider prompt cache.

This library takes a different approach: deterministic folding.

As the agent works, older context is folded into deterministic skeletons. Instead of linearly bloating to the ceiling, the active context sawtooths—building up efficiently, then dropping back down to a clean floor without losing continuity.
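
To make "deterministic folding" concrete, here is a minimal Python sketch. It is not Context Warp Drive's actual code or API: the `fold_turn` function, the regex patterns, and the skeleton format are all assumptions made up for illustration. It only shows the property described above: exact identifiers survive verbatim, and the same input always produces the same bytes, with no model call involved.

```python
import hashlib
import re

# Hypothetical patterns for identifiers worth keeping verbatim (an assumption,
# not the library's real rule set).
IDENTIFIER_PATTERNS = [
    re.compile(r"\b[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\b"),  # UUIDs
    re.compile(r"\b[0-9a-f]{40}\b"),                    # 40-char hex hashes
    re.compile(r"(?:\.{0,2}/)?(?:[\w.-]+/)+[\w.-]+"),   # slash-separated paths
]


def fold_turn(role: str, text: str, max_chars: int = 80) -> str:
    """Fold one old turn into a compact skeleton without calling a model."""
    identifiers = []
    for pattern in IDENTIFIER_PATTERNS:
        for match in pattern.findall(text):
            if match not in identifiers:  # first-seen order, no duplicates
                identifiers.append(match)
    stripped = text.strip()
    gist = stripped.splitlines()[0][:max_chars] if stripped else ""
    # Short digest of the raw turn, so the skeleton can point back to the trace.
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()[:12]
    return f"[{role} raw={digest}] {gist} | ids: {', '.join(identifiers) or '-'}"
```

Calling `fold_turn` twice on the same turn returns an identical string, which is the precondition for keeping a provider prompt cache warm. A real engine would need far more careful extraction rules than three regexes.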

Why not just use the 1M token window?

Because 95% of what an agent carries with it on a long task isn't needed right now. It's looking for the needle in the haystack, but massive context windows force it to carry all the hay.

A larger window raises the ceiling, but it doesn't move the floor where models reason best. Long-context evals keep showing the same thing: models do not use giant contexts as cleanly as the marketing numbers imply.

By keeping the agent deterministically folding with a warm cache and a low context band, you keep it snappy, cheap, and focused. You leave the hay behind until it's actually needed.

How Context Warp Drive works:

  • The Rebirth Seed: The continuity package that makes the full reset possible. It carries the recent user and AI messages, what the agent was actively working on and editing, its execution plan state, preserved exact identifiers from the full trace, and episodic context from earlier work. It is not a vague summary—it is a structured, deterministic snapshot the agent can wake up from and continue seamlessly.
  • Cache-Hot Appending: As the agent works, older turns fold into compact bands that append onto the rebirth seed. The context builds up over time, but because the seed stays byte-identical, you pay for cheap cache reads turn after turn instead of expensive fresh inputs.
  • The Sawtooth Reset: You can't append forever. When measured input pressure hits your configured ceiling, the engine performs the full sawtooth—the context drops back to a fresh rebirth seed and the cycle continues from a low-context floor.
  • Zero-LLM Folding: Raw chat history stays preserved as the source of truth, but the model sees a deterministic compact view. Tool calls, paths, receipts, retained reasoning, and exact identifiers are all preserved without asking another model to summarize anything.
  • Episodic Recall: When the agent re-touches a path or concept from before the reset, the engine pages the relevant folded detail back in. The agent doesn't carry all the hay—it pulls it back when it matters.
  • Task Rail: I also included a portable execution primitive called TaskRail. It keeps long-horizon plan state outside the prompt: steps, progress, acceptance criteria, and serializable checkpoints. Combined with folding and rebirth seeds, the agent stays low-context while still knowing exactly where it is in a multi-step workflow.
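
The append-then-reset cycle in the list above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the engine's real implementation: `SawtoothContext`, the 4-characters-per-token estimate, and the `keep_recent` rule for building the next seed are all hypothetical. The real rebirth seed is described as much richer (plan state, preserved identifiers, episodic context).

```python
from dataclasses import dataclass, field


def estimate_tokens(text: str) -> int:
    """Crude stand-in for a real tokenizer: roughly 4 characters per token."""
    return max(1, len(text) // 4)


@dataclass
class SawtoothContext:
    seed: str                # stays byte-identical between resets
    ceiling_tokens: int      # configured input-pressure ceiling
    keep_recent: int = 2     # folded bands carried into the next seed
    bands: list[str] = field(default_factory=list)
    resets: int = 0

    def render(self) -> str:
        """Prompt view: unchanged seed first, folded bands appended after it."""
        return "\n".join([self.seed, *self.bands])

    def append(self, band: str) -> bool:
        """Append a folded band; return True if this triggered a reset."""
        self.bands.append(band)
        if estimate_tokens(self.render()) <= self.ceiling_tokens:
            return False
        # Ceiling exceeded: build a fresh seed from the most recent bands and
        # drop back to the low-context floor. Older bands leave the prompt
        # view here (the post keeps the raw history elsewhere).
        recent = self.bands[-self.keep_recent:]
        self.resets += 1
        self.seed = "\n".join([f"REBIRTH SEED #{self.resets}", *recent])
        self.bands = []
        return True
```

Between resets, every call to `render()` starts with the same seed bytes and only grows at the tail, which is what lets a prefix cache keep hitting. The prefix only changes at a reset, once per sawtooth tooth rather than once per turn.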

What's in the repo:

  • Core folding engine, provider-agnostic across Anthropic content blocks, OpenAI-style tool_calls, and Gemini parts.
  • Anthropic prompt-cache breakpoint helpers to maximize read-hits.
  • Raw rebirth seed renderer.
  • Model-aware context budget resolver.
  • Fold recall and episodic recall (with an optional SQLite episode store).
  • Portable Task Rail state machine.
  • Gemini CLI and Codex CLI folding adapters.
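
For readers wondering what "plan state outside the prompt" might look like, here is a rough Python sketch of a Task Rail style state machine with steps, progress, acceptance criteria, and a serializable checkpoint. The names `Step` and `TaskRailSketch` and the JSON layout are hypothetical; the repo's actual TaskRail API may look nothing like this.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class Step:
    title: str
    acceptance: str      # what must be true for the step to count as done
    done: bool = False


@dataclass
class TaskRailSketch:
    """Hypothetical stand-in for plan state kept outside the prompt."""
    goal: str
    steps: list[Step] = field(default_factory=list)

    def current(self) -> Optional[Step]:
        """First unfinished step, or None when the plan is complete."""
        return next((s for s in self.steps if not s.done), None)

    def complete_current(self) -> None:
        step = self.current()
        if step is None:
            raise ValueError("plan already complete")
        step.done = True

    def progress(self) -> str:
        finished = sum(s.done for s in self.steps)
        return f"{finished}/{len(self.steps)}"

    def checkpoint(self) -> str:
        """Serialize to JSON so the state survives a context reset."""
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def restore(cls, blob: str) -> "TaskRailSketch":
        data = json.loads(blob)
        return cls(goal=data["goal"], steps=[Step(**s) for s in data["steps"]])
```

The point of the design is that only `progress()` and the current step need to appear in the prompt; the full plan lives in the checkpoint and can be restored after a reset.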

There are a lot of knobs you can tune, but the core philosophy is the same: use the 1M window as safety headroom, not as the operating band.

(Not on npm yet—install from source for now.)

I've been running this in my own multi-agent orchestration stack for months and completely dropped LLM compaction. The difference is fundamental: the agent stops treating context as a giant backpack and starts treating it like a paged working set—small, hot, recoverable, and always grounded in the raw trace.

u/MusicToThyEars — 2 days ago
▲ 2 r/CodexHacks+1 crossposts

New to Codex! Best way(s) to use as a Grad Student

Hi there, I'm new to Codex and trying it out because my school gave us free credit. What could I really use it for to optimize my life? I'm open to using it for my personal life as well, and I want to get as much out of it as possible. Any suggestions for a psychology graduate student? I'd super appreciate it.

u/Leading-Measurement7 — 3 days ago
▲ 9 r/CodexHacks+2 crossposts

Quick question about Codex resets — 5‑hour limit or weekly limit?

I’m trying to understand how Codex resets actually work so I don’t accidentally waste them.

Some people say it’s a 5‑hour rolling limit, others say it’s a weekly quota, and I can’t find anything official that clearly explains it. When you hit the cap, is it supposed to reset after a few hours, or only once per week?

If anyone has tested this recently or has a definitive explanation, I’d really appreciate it. Just trying to plan my usage so I don’t burn through resets unnecessarily.

Thanks in advance!

u/yosofun — 4 days ago
▲ 9 r/CodexHacks+6 crossposts

Built an open-source tool for handling context continuity when starting new sessions or switching between coding agents

I’ve been using coding agents on real software projects, both at home and at work.

I used to think the problem was about memory and context. That was the obvious diagnosis.

Every new coding-agent session started with the same ritual. Open the repository. Read the README. Inspect the project structure. Search for the files that looked important. Reconstruct the task. Guess which commands mattered. Ask again what had already been tried. Then do the actual work.

A new Codex / Claude Code / Copilot session often has to rediscover:

  • repo structure
  • relevant files
  • decisions already made
  • commands that already failed
  • current task state
  • validation steps that passed or were skipped
  • what the previous agent left unfinished

Framing this as an "agent context" problem is too broad.

A larger context window helps the current session. A vector store can retrieve related notes. Chat history contains previous discussion.

But none of those automatically preserve execution continuity.

The distinction I ended up caring about is:

Context is what the agent has available now. Continuity is what lets the next execution continue from what actually happened before.

A few lessons so far:

  1. Bigger memory can become an expensive junk drawer.

If old assumptions, failed paths, stale summaries, and validated facts all have the same weight, the next agent can be confidently wrong.

  2. The useful memory is usually small.

A new session does not need the whole project history. It needs the right starting point, known pitfalls, active work state, and validation expectations.

  3. Provenance matters.

A handoff like:

“we probably fixed the parser”

is much weaker than:

  • files edited
  • command run
  • result observed
  • known validation gap
  • next recommended action
  • evidence quality

  4. Stale context needs to be visible.

A previous handoff can still be useful, but it should not be treated like truth forever.

Approaching a workaround

I built an open-source and free repo-local continuity runtime for coding agents.

The core loop is intentionally boring:

resume -> agent work -> finalize

It stores operational continuity under .aictx/ in the repository, then reloads a bounded resume capsule at the start of the next task.

The goal is not to give the agent a huge hidden memory.

The goal is to preserve a small, inspectable handoff:

  • what was being worked on
  • what changed
  • what failed
  • what was validated
  • what decisions were made
  • what is stale or unverified
  • what the next session should do
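
As a rough illustration of a handoff with provenance and visible staleness, here is a Python sketch. The field names follow the lists in this post, but the `Handoff` class, the seven-day staleness threshold, and the JSON layout are assumptions. This is not the actual format aictx writes under .aictx/.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timedelta, timezone


@dataclass
class Handoff:
    task: str
    files_edited: list[str]
    command_run: str
    result_observed: str
    validation_gap: str
    next_action: str
    evidence_quality: str    # e.g. "observed", "inferred", "unverified"
    recorded_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def is_stale(self, now: datetime, max_age: timedelta = timedelta(days=7)) -> bool:
        """Flag old handoffs so they are read as hints, not as current truth."""
        return now - datetime.fromisoformat(self.recorded_at) > max_age

    def to_json(self) -> str:
        """Inspectable, diff-friendly form that could be stored in the repo."""
        return json.dumps(asdict(self), indent=2, sort_keys=True)
```

Compared with “we probably fixed the parser”, a record like this tells the next session which command was run, what was observed, and how much to trust it. `is_stale` expects a timezone-aware `now`, matching the UTC timestamp stored in `recorded_at`.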

The repo feels like the natural boundary for this. It already contains the code, tests, branch, diff, commands, failures, and artifacts of work.

So the continuity that helps future agents should live there too, not only inside one chat session or one vendor-specific memory layer.

The tool may change, but the architectural lesson is the part I care most about: coding agents do not only need to remember more, they need to continue better.

This is not useful for every task.

For a one-shot prompt, the overhead may not be worth it.

Where it starts to make sense is multi-prompt work, multi-session work, larger repositories, cross-agent workflows, and tasks where failed commands or validation state matter.

If you are curious you can take a look here.

Repo: https://github.com/oldskultxo/aictx

Happy to read opinions, criticism, or whatever!

u/Comfortable_Gas_3046 — 7 days ago
▲ 118 r/CodexHacks+9 crossposts

Conduit: free, open source SSH/Mosh/SFTP client for Android and iOS with YubiKey/FIDO2 hardware key support

I built a free, open source SSH/Mosh/SFTP client for Android and iOS that supports YubiKey and other FIDO2 hardware keys over USB and NFC.

Auth works for both ed25519-sk and ecdsa-sk credentials via CTAP2. USB and NFC on Android, NFC on iOS. Works in both terminal and SFTP flows. Agent forwarding is supported too, so your YubiKey can authenticate onward hops without copying keys to remote machines. You'll be prompted to tap for every signature, same as a normal connection.

No account, no subscription, no cloud sync, no analytics, no paid features. Everything stays on device.

F-Droid: https://f-droid.org/packages/com.gwitko.conduit/

GitHub: https://github.com/gwitko/Conduit

App Store: https://apps.apple.com/app/id6780054869

Play Store is coming soon. If you want early access, join the beta: https://play.google.com/apps/testing/com.gwitko.conduit (you'll need to join this group first: conduit-closed-test@googlegroups.com)

I would really appreciate feedback from the YubiKey community on how I integrated the auth flow with the hardware keys. Note that the flow is a bit different on Android and iOS.

EDIT: to join the group, go here: https://groups.google.com/g/conduit-closed-test

u/gwitko — 13 days ago
▲ 6 r/CodexHacks+3 crossposts

Are AI “loops” just agents grading their own homework?

Over the last few weeks I’ve noticed that “loops” have become the new buzzword in the AI agent space.

The typical pattern looks something like:

Generate -> Evaluate -> Improve -> Evaluate -> Improve -> Repeat until score >= X

The claim is that this produces better outcomes than a single-shot prompt.

What I’m struggling with is a practical concern: in many cases, the same model that generates the solution is also evaluating the solution.

So the loop becomes:

  • I think A is a good idea
  • I evaluate A
  • A still looks good
  • I improve A
  • Now A looks even better

But what if the original assumption was wrong?

For example:

  • Choosing the wrong architecture
  • Solving the wrong customer problem
  • Optimizing the wrong KPI
  • Building features when the real issue is distribution/sales

A loop seems very good at refining an answer, but not necessarily at questioning whether it’s working on the right problem in the first place.

In my own experience, the biggest improvements often come from:

  • A different perspective
  • Human pushback
  • Challenging assumptions
  • External evidence

Not from running 10 more iterations of the same reasoning process.

Loops make perfect sense to me when there is an objective external signal:

  • Tests pass/fail
  • Benchmark score
  • Data validation
  • Reconciliation
  • Linting
  • Compilation
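
To show what a loop driven by an external signal looks like, as opposed to self-grading, here is a small Python sketch. Everything in it is a toy assumption: `refine_until_pass`, `check`, and `toy_improve` are made-up names, and the "model" is a stub that appends one closing bracket per round. The only real point is structural: the pass/fail verdict comes from the Python compiler, not from the thing that wrote the candidate.

```python
from typing import Callable, Optional


def refine_until_pass(
    candidate: str,
    improve: Callable[[str, str], str],
    external_check: Callable[[str], Optional[str]],
    max_iters: int = 5,
) -> tuple[str, bool, int]:
    """Loop until an external check passes or the iteration budget runs out.

    external_check returns None on success, or a failure message that is fed
    back to improve(). The generator never grades its own output.
    """
    for iteration in range(max_iters + 1):
        failure = external_check(candidate)
        if failure is None:
            return candidate, True, iteration
        if iteration == max_iters:
            break
        candidate = improve(candidate, failure)
    return candidate, False, max_iters


def check(code: str) -> Optional[str]:
    """Objective signal: does the candidate compile? (It is not executed.)"""
    try:
        compile(code, "<candidate>", "exec")
    except SyntaxError as exc:
        return f"SyntaxError: {exc.msg}"
    return None


def toy_improve(code: str, failure: str) -> str:
    """Stub 'model': pretend it closes one bracket per round."""
    return code + ")"
```

Note the hard `max_iters` budget: when the check never passes, the loop reports failure instead of declaring victory. This structure does nothing for the "wrong problem" cases listed earlier, since a compiler cannot tell you that you chose the wrong architecture.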

But for strategy, product decisions, architecture choices, or business decisions, aren’t we just creating a system where the model repeatedly convinces itself that its own idea is correct?

How are people dealing with this in production systems?

Do you:

  • Use separate generator/evaluator models?
  • Introduce adversarial reviewers?
  • Rely on human checkpoints?
  • Have objective evaluation criteria I’m missing?

Curious to hear from people running real agent workflows rather than demos. Have loops actually improved outcomes for you, or mostly increased token consumption and complexity?

u/Normal_Addendum_3144 — 14 days ago
▲ 3 r/CodexHacks+2 crossposts

Do your 5H and weekly limits feel wonky today?

Mine were supposed to have just reset back to 100%, but after a single prompt I'm already down to 20% on the 5H limit and 88% on the weekly one!

u/Sophia_AveryZ — 11 days ago
▲ 1 r/CodexHacks+1 crossposts

Either Codex finishes me off or I finish it off...

I'm not a professional programmer. I know how to program, but I don't know the languages by heart.

I've been working on a personal project for a while, pushing it forward little by little with a lot of patience and a lot of back-and-forth with ChatGPT. By building up context and working across different threads, I ended up with a reasonably solid base for my PHP framework.

I have a lot of project documentation, a lot of PHPDoc on the classes and functions, and so on.

The problem is that no matter how much context I give it, and no matter how many things I forbid in agents.md, Codex does whatever it wants. When I first tried it, it seemed to produce good results and we were able to make quite a bit of progress on some things.

Am I the only one noticing this?

I keep reading more or less the same thing, problems with Codex everywhere, and honestly I no longer know what else I can do.

u/PrestigiousPair7210 — 13 days ago