r/OpenAIDev

▲ 4 r/OpenAIDev+2 crossposts

Does OpenAI’s new global memory introduce a new failure mode?

I’ve been thinking about something that might become more important as persistent memory gets better.

I think global memory may introduce a new failure mode. I’ve started thinking of it as attractor bleed.

Before global memory, conversations were largely isolated from one another.

You could have one long-running conversation about software engineering, another about creative writing, another about emotional support, another about research, another about language learning. Over time, each of those conversations would settle into its own interaction pattern. They didn’t need to agree with each other because they evolved independently.

A shared memory layer changes that relationship.

Once experiences from all of those contexts begin accumulating into the same long-term memory, those interaction patterns are no longer fully isolated. Habits formed in one context can begin influencing another.

Therapeutic language starts appearing where analytical distance would be more useful.

Creative habits begin leaking into engineering discussions.

A role that worked well in one context quietly starts shaping conversations where it no longer belongs.

The failure isn’t forgetting.

It’s a gradual averaging of interaction patterns that may have worked precisely because they remained separate.

That makes me think persistent memory isn’t only a retrieval problem.

It’s also a boundary problem.

We’ve spent a lot of time asking:

What should the system remember?

I’m starting to think an equally important question is:

Where should that memory be allowed to matter?

Curious whether anyone else working on long-term memory, AI companions, or agent architectures has been thinking about this. Is scoped memory eventually necessary, or can a single global memory remain stable as interactions diversify?
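To make the "where should memory matter" question concrete, here is a minimal sketch of scoped recall. Everything in it is an assumption for illustration: the `ScopedMemoryStore` class, the scope labels, and the idea of a shared `"global"` scope are hypothetical and do not describe how OpenAI's memory actually works.

```python
from dataclasses import dataclass, field


@dataclass
class Memory:
    text: str
    scope: str  # hypothetical label such as "engineering" or "support"


@dataclass
class ScopedMemoryStore:
    """Toy store: every memory is tagged with the scope it was learned in."""
    items: list[Memory] = field(default_factory=list)

    def remember(self, text: str, scope: str) -> None:
        self.items.append(Memory(text, scope))

    def recall(
        self,
        active_scope: str,
        shared_scopes: frozenset[str] = frozenset({"global"}),
    ) -> list[str]:
        # A memory only "matters" if it was learned in the active scope
        # or in a scope explicitly marked as shared.
        allowed = {active_scope} | shared_scopes
        return [m.text for m in self.items if m.scope in allowed]


store = ScopedMemoryStore()
store.remember("User prefers terse code review comments", scope="engineering")
store.remember("User likes gentle, validating language", scope="support")
store.remember("User's name is Sam", scope="global")

# Only engineering and global memories are visible in an engineering chat.
engineering_context = store.recall("engineering")
```

In this toy version the boundary is a hard filter. A softer design could down-weight out-of-scope memories instead of hiding them, which trades some isolation for continuity.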

u/teugent — 21 hours ago

▲ 8 r/OpenAIDev+10 crossposts

Quick online survey: can AI chatbots be perceived as an additional source of social support?

Hi everyone,

As part of my studies, I'm running a short online survey on AI chatbots as social support.

It examines how people use AI chatbots (e.g. ChatGPT, Gemini, or Claude) and whether that use is related to loneliness and psychological well-being.

Requirements:

at least 18 years old
use of AI chatbots

The survey is completely anonymous and takes about 5–7 minutes.

Every response helps me a lot. Thank you for your support!

Survey link:
KI-Chatbots als Quelle sozialer Unterstützung? (AI chatbots as a source of social support?) – fill out the form

u/MulberryOrdinary6569 — 1 day ago

▲ 2 r/OpenAIDev

Is Reddit a good place to advertise my project?

I've been working on an AI platform for a while now and I'm about to launch the MVP, but I'm afraid my idea might be stolen, so I'm curious to hear your thoughts and advice.

u/Wise_Look4018 — 1 day ago

▲ 68 r/OpenAIDev+3 crossposts

An agent runtime with persistent memory that fans work out across multiple models.

Hey! I'm finally releasing code I've put the past 4-5 months of my life into. I had an idea and wanted to fix some things that really irritated me about LLMs. Aimee runs agents that actually remember. Self-hosted, your keys. No subscriptions, no costs, purely open source. This is the first public beta release, but the results have already exceeded my expectations.

- Persistent searchable memory across runs. No starting from zero. Shared across all agents, models, and users.

- Delegates bounded sub-tasks to multiple model backends in parallel, each with a role and persona. Use local LLMs, subscriptions, or API keys.

- Indexes your codebase, records past decisions, and curates all associated documents so agents have real context and a knowledgebase of past decisions, not just a prompt.

- Exposes OpenAI/Anthropic-compatible APIs, so Claude Code, Codex, or your own orchestrator can drive it. You can also do the inverse and run any model you have hooked up to Aimee as your model for Claude Code, Codex, etc.

- Switch models, TUIs, etc. at any time, and keep your decisions, knowledge, and other information!

- Works with anything that can use MCP, plugins, web APIs, or ACP.

Built for people tired of stateless one-shot agents. Try it out: https://github.com/RakuenSoftware/aimee

u/KitchenAmoeba4438 — 2 days ago

▲ 1 r/OpenAIDev+1 crossposts

I got tired of Codex forgetting everything between sessions, so I built a memory. It's free and the numbers are decent

Every new Codex session I was re-explaining the same conventions, watching it re-explore the same repo, re-hitting the same landmine that burned it last week. So I spent the last few months building Kimetsu, a memory sidecar that wires into Codex over MCP with one command:

npm install -g kimetsu-ai
kimetsu setup --host codex

After that, Codex records lessons as it works and gets them back before the next task. Memories that actually help get promoted, stale ones decay and get pruned. The whole brain is one SQLite file in your repo.
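For anyone wondering what "promoted, decay, pruned" can mean in practice, here is a rough sketch of one possible scheme. It is not Kimetsu's actual implementation: the `Lesson` class, the 30-day half-life, the bonus of 1.0, and the pruning threshold are all assumptions picked for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Lesson:
    text: str
    score: float = 1.0      # hypothetical usefulness score
    last_used_day: int = 0  # day index when the lesson last helped


HALF_LIFE_DAYS = 30.0  # assumed decay half-life
PRUNE_BELOW = 0.25     # assumed pruning threshold


def decayed_score(lesson: Lesson, today: int) -> float:
    """Exponentially decay the score by the time since the lesson last helped."""
    age = today - lesson.last_used_day
    return lesson.score * math.pow(0.5, age / HALF_LIFE_DAYS)


def promote(lesson: Lesson, today: int) -> None:
    """Reward a lesson that helped: keep its decayed score, add a bonus, reset the clock."""
    lesson.score = decayed_score(lesson, today) + 1.0
    lesson.last_used_day = today


def prune(lessons: list[Lesson], today: int) -> list[Lesson]:
    """Drop lessons whose decayed score has fallen below the threshold."""
    return [item for item in lessons if decayed_score(item, today) >= PRUNE_BELOW]


lessons = [
    Lesson("run the test suite before committing"),
    Lesson("workaround for a build step that no longer exists"),
]
promote(lessons[0], today=60)    # the first lesson helped on day 60
kept = prune(lessons, today=90)  # the second lesson has gone stale by day 90
```

The point of the sketch is only that usefulness and recency can be folded into one number, so stale lessons fall out without anyone deleting them by hand.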

The part I care most about: storing and retrieving memories costs $0. No LLM in the memory pipeline at all, it's FTS + local embeddings + a local reranker. Most alternatives (mem0, Zep, Cognee) call a model on every write and often every read, so your memory has a meter running. Mine runs offline.

On the public benchmarks it holds up better than I expected for something model-free: 83% on LongMemEval (strong systems land 60-80, the 90+ scores use oracle retrieval or much heavier readers), and on BEAM's 1M-token bucket it scores 66% vs mem0's self-reported 62%. On a 16-task Terminal-Bench slice it was about 13x cheaper per solved task than running without it. Full methodology is on the site, I publish the harness so you can check me.

Recent thing I'm having fun with: brains are portable files now.

kimetsu brain export team.json.gz

Gives you a gzipped pack (credentials and PII get scrubbed automatically), a teammate imports it and it merges with dedup. You can also swap whole brains in and out, or install one from a URL. Onboarding a new machine is one import.

MIT/Apache, Rust, no telemetry, no cloud. Also works with other coding agents.

Site: https://kimetsu.dev
GitHub: https://github.com/RodCor/kimetsu

u/Kimetsu-IA — 2 days ago

▲ 3 r/OpenAIDev+2 crossposts

Vibecoding Studio Team

Hear me out… what if 4 or 5 people teamed up to work on 1 project?

I mean, that’s 5 different creative design architects constantly feeding ideas into 1 shared project.

In my mind I was thinking of an MMORPG zombie apocalypse. Yes, I understand that genre is definitely overused, BUT the idea was that either 5 people all add to the central idea and to systems that break away from it, OR 5 different people work on 5 different systems that make up the entirety of the game.

Now of course this could be used for any large game, I just know that a zombie game would be easy to recall and identify with.

Would anyone be interested in that? Could this framework possibly be the future of low-level ‘studio’ games? Could this be the new “indie development” wave of design? If so, I think I’ll give it a name: “band development”.

Welcome Band Devs!

Hopefully there aren’t major falling outs… maybe some can stick together like Metallica

u/proverbsoneseven — 3 days ago

▲ 2 r/OpenAIDev+1 crossposts

OpenAI Keyboard Prototype

https://preview.redd.it/e6fgwqjrg2bh1.png?width=1117&format=png&auto=webp&s=96b0a9542a7e05507cec208fa9ed754f12138795

A few weeks ago, OpenAI announced they were launching something that would make Codex shortcuts a whole lot better (https://x.com/OpenAIDevs/status/2071639953927438440). On June 29, at the AI Engineer World's Fair, someone spotted a prototype in person (https://x.com/ai_for_success/status/2071846127956308441/photo/1).

u/Odd_Incident_7575 — 2 days ago

▲ 11 r/OpenAIDev+8 crossposts

I built an experimental governed prompt compiler (not just a prompt rewriter). Cross-tested on Claude and ChatGPT.

Many prompt tools focus on rewriting prompts. This prototype takes a different approach: it compiles your intent through a structured governance pass before execution. That pass identifies likely constraints, surfaces ambiguity, and produces an explicit specification, and it shows the transformation steps and diagnostics used during compilation so the process stays transparent.

It's called Re-Prompt. This is a working proof of concept, not a finished product. I'm sharing it because I want outside eyes on it: feedback, challenges, and prior art pointers are all welcome.

What makes it different: it doesn't just hand you a cleaner prompt. It shows you what changed, why, what assumptions it made (labeled, not hidden), and what risk that reduces. The diagnostic pipeline is the product, not a debug log.

Preliminary cross-model testing suggests the prompt compiler protocol is portable across multiple LLMs. ChatGPT and Claude produce different wording, but both independently preserve the core interaction sequence: intent extraction, constraint preservation, ambiguity reduction, structured compilation, telemetry, and execution readiness. The overall interaction pattern remained recognizable during my testing.

One honest caveat from testing:

Try it on something genuinely ambiguous or conversational; that's where the difference is most visible. Built and tested on desktop; mobile support is still rough. The goal isn't to replace prompting but to stabilize intent before execution.
My hypothesis is that doing so can reduce unnecessary prompt iteration for many open-ended tasks.

Try it:

https://claude.ai/public/artifacts/323be0e8-19fc-4014-abdc-b11cfa08727b

https://chatgpt.com/g/g-6a0359b38b988191813a2b28d62dc03d-re-prompt-a-governed-prompt-compiler

I'd especially appreciate failure cases more than success stories.

Thank you — Governed Intent Labs

u/New-Knee-5614 — 4 days ago

▲ 9 r/OpenAIDev+2 crossposts

Quick question about Codex resets — 5‑hour limit or weekly limit?

I’m trying to understand how Codex resets actually work so I don’t accidentally waste them.

Some people say it’s a 5‑hour rolling limit, others say it’s a weekly quota, and I can’t find anything official that clearly explains it. When you hit the cap, is it supposed to reset after a few hours, or only once per week?

If anyone has tested this recently or has a definitive explanation, I’d really appreciate it. Just trying to plan my usage so I don’t burn through resets unnecessarily.

Thanks in advance!

u/yosofun — 4 days ago

▲ 85 r/OpenAIDev+1 crossposts

Codex keyboard launching soon?

OpenAI posted this on X this morning (https://x.com/OpenAIDevs/status/2071639953927438440)

u/Odd_Incident_7575 — 5 days ago

▲ 12 r/OpenAIDev+6 crossposts

I built a Codex session review app using Codex. How are you tracking your AI coding workflows?

I built a small free macOS tool for reviewing Codex sessions using the Codex desktop app. Are people here using anything similar to improve their AI coding workflows?

After longer Codex runs, I kept finding that the transcript was technically available, but hard to review.

The things I wanted to inspect were:

- What changed

- Which files were touched

- Where tokens went

- Which tool calls mattered

- Whether the prompt/context was good enough to reuse

- What context would be useful to share during code review

So I made BuildrAI, a local-first app that turns Codex session artifacts into timelines, token usage, prompt/session evaluation, changed-file context, and shareable reports.

I’m curious how other people are handling this.

Do you review Codex sessions after the fact, or do you mostly trust the final diff?

u/michaliskarag — 5 days ago

▲ 6 r/OpenAIDev+3 crossposts

I open-sourced a Codex plugin that makes AI agents leave receipts before saying done

I built Superloopy because the failure mode that bothers me most with AI coding agents is not just bad code — it’s unverifiable “done.”

It’s a lightweight Codex plugin where you type:

loopy <task>

and the agent is pushed through a proof-of-done loop:

plan → run real commands → save evidence → check criteria → final report

The repo-local state lives under `.superloopy/`, and every passed criterion is supposed to point at a real artifact under `.superloopy/evidence/`.
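As a rough illustration of the "every passed criterion points at a real artifact" rule, here is a small checker sketch. The criteria format (a list of dicts with `name`, `passed`, and `evidence` keys) and the `unproven_criteria` helper are hypothetical and are not Superloopy's real state format; only the `.superloopy/evidence/` location comes from the description above.

```python
import tempfile
from pathlib import Path


def unproven_criteria(repo_root: Path, criteria: list[dict]) -> list[str]:
    """Return names of criteria marked passed whose evidence file is missing.

    The `criteria` shape here is hypothetical, not Superloopy's real format.
    """
    evidence_dir = (repo_root / ".superloopy" / "evidence").resolve()
    missing = []
    for criterion in criteria:
        if not criterion["passed"]:
            continue
        artifact = (evidence_dir / criterion.get("evidence", "")).resolve()
        # Require a real file that lives inside the evidence directory.
        inside = evidence_dir in artifact.parents
        if not (inside and artifact.is_file()):
            missing.append(criterion["name"])
    return missing


# Demo in a throwaway directory standing in for a repo.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / ".superloopy" / "evidence").mkdir(parents=True)
    (root / ".superloopy" / "evidence" / "tests.log").write_text("3 passed\n")
    criteria = [
        {"name": "tests pass", "passed": True, "evidence": "tests.log"},
        {"name": "lint clean", "passed": True, "evidence": "lint.log"},
        {"name": "docs updated", "passed": False},
    ]
    result = unproven_criteria(root, criteria)
```

Here "lint clean" is flagged because it claims to have passed but no `lint.log` exists, which is exactly the unverifiable "done" the loop is meant to catch.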

The default path is meant to stay lightweight: basically receipts for what changed, how it was tested, and what is still uncertain. Stricter gates, hooks, and optional crew/subagent mode are there for bigger tasks.

Repo:

https://github.com/beefiker/superloopy

If it looks useful, a GitHub star would mean a lot 🙂

More importantly, I’d love feedback from people who use Codex or other coding agents:

- Is “proof of done” clearer than “loop engineering”?

- Would evidence receipts make you trust agent output more?

- Where would this feel helpful vs. too much ceremony?

u/Simple_Somewhere7662 — 6 days ago

▲ 16 r/OpenAIDev+7 crossposts

I recorded every Claude Code session for 3 months and let agents write it up for me.

I kept losing track of my own work, so I started saving every Claude Code session and built a few agents to make sense of it. Each night, an agent turns the day's raw sessions into one clear note covering what I built, what I decided, and what's still open. Each week, another agent rolls those notes into a profile of my skills and projects. A third drafts my LinkedIn and X posts from the week. It all runs as cloud routines, so it keeps working even when my machine is off. I open-sourced the capture and the nightly daily-note agent as Pulse, and the weekly profile and post-writer are coming next. It's early, and I'd genuinely love feedback from anyone using Claude Code daily: https://github.com/muhammademanaftab/pulse

u/Elegant-Session-9771 — 7 days ago

▲ 2 r/OpenAIDev

What's the most underrated AI tool you've used that actually changed how you work?

Most discussions focus on the big names but some smaller tools have genuinely shifted how I approach problems. I noticed that niche tools often outperform general ones in very specific tasks. Things like summarization, code review, and data structuring feel completely different depending on what you use. The gap between a good and a great AI tool often comes down to how well it handles edge cases. Curious what tools people here are actually using beyond the obvious choices.

u/Beneficial_Ice_2732 — 6 days ago

▲ 0 r/OpenAIDev

Why do some AI-written articles feel less personal than human writing?

AI has become very good at producing text quickly, but many people still notice a difference between AI-generated content and something written by a person. Sometimes AI writing sounds too structured or lacks the emotions that make content more relatable.

Human writing often includes personal experiences, unique viewpoints, and small details that make readers feel connected. AI can provide information, but it does not always capture the same level of personality.

This is why many writers use AI for assistance but continue editing and improving the final version themselves. The combination of technology and human creativity can create stronger results.

What do you think makes an article feel truly human instead of machine-generated?

u/Inside-Macaroon5020 — 7 days ago

▲ 4 r/OpenAIDev

If AI “character” matters, how would we actually train for it?

I watched a fascinating talk from Anthropic about AI, wisdom traditions, and alignment. One point stuck with me:

If models can generalize from reward hacking into broader misalignment, then maybe we are not just training behaviors. Maybe we are shaping something like functional “character.”

Not character as in consciousness or a soul. I mean character operationally: stable tendencies that generalize across situations.

The part I keep circling is this:

Most AI training sounds transactional.

Do X → reward.
Do Y → penalty.
Answer A preferred over answer B.

That mirrors a lot of organizational leadership. Companies say they want judgment, integrity, and ownership, but often train people through transactional incentives: hit the metric, avoid blame, satisfy the boss, move fast.

Then everyone acts shocked when people learn to optimize the metric instead of the mission.

So what would the AI equivalent of transformational leadership look like?

Instead of only asking, “Did the model produce the rewarded answer?” maybe we also train toward:

preserving intent, not just completing tasks
explaining uncertainty instead of hiding it
resisting flattery, pressure, and shortcuts
critiquing its own drift
anchoring behavior in principles
generalizing “what right looks like” into unfamiliar situations

That feels adjacent to Constitutional AI, character training, and reward-hacking research, but I’m curious whether anyone has tested this more explicitly:

Can we train AI less like a transactional employee optimizing incentives, and more like a developing agent being formed around purpose, judgment, and integrity?

Again, not anthropomorphizing. I’m asking whether “functional character” is a useful alignment concept.

And the funny/frustrating breadcrumb: meanwhile, in normal human organizations, I’m still trying to convince people that even a simple project charter is valuable for AI use...

Because before we can train AI to preserve intent, we apparently still have to convince humans to write the intent down.

u/Telos_in_the_Void — 8 days ago

▲ 61 r/OpenAIDev+1 crossposts

What do you do when the API powering your AI app is down?

People with AI apps in production: does your app also go down when the API you are using goes down?
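One common answer is provider failover: retry the primary a couple of times, then fall through to a backup. Below is a minimal sketch of that pattern. The `complete_with_fallback` function and the `primary`/`secondary` callables are hypothetical stand-ins, not any vendor's SDK; in a real app each callable would wrap an actual API client.

```python
import time
from typing import Callable, Sequence


class AllProvidersFailed(Exception):
    """Raised when every provider has exhausted its retries."""


def complete_with_fallback(
    prompt: str,
    providers: Sequence[tuple[str, Callable[[str], str]]],
    retries_per_provider: int = 2,
    backoff_seconds: float = 0.0,
) -> tuple[str, str]:
    """Try each provider in order; return (provider_name, reply) from the first that works."""
    errors = []
    for name, call in providers:
        for attempt in range(retries_per_provider):
            try:
                return name, call(prompt)
            except Exception as exc:  # real code should catch the client's specific errors
                errors.append(f"{name} attempt {attempt + 1}: {exc}")
                # Exponential backoff between attempts (zero by default in this demo).
                time.sleep(backoff_seconds * (2 ** attempt))
    raise AllProvidersFailed("; ".join(errors))


# Stand-ins for real API clients (hypothetical).
def primary(prompt: str) -> str:
    raise ConnectionError("503 from primary")


def secondary(prompt: str) -> str:
    return f"echo: {prompt}"


used, reply = complete_with_fallback(
    "hello", [("primary", primary), ("secondary", secondary)]
)
```

Failover only helps if the backup model is good enough for the task, so it is worth deciding up front which features degrade and which simply switch provider.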

u/Odd-Card8046 — 12 days ago

▲ 5 r/OpenAIDev+1 crossposts

How to Control LLM API Costs?

I know you can control your APIs individually on each platform, but what if there were a service where you could control all your APIs from OpenAI, Gemini, Claude, and others in one place? What features would you want it to have?

u/cautiouslyPessimisx — 10 days ago

▲ 3 r/OpenAIDev+3 crossposts

AI helped me build faster. It didn't help me keep users.

Vibecoding made building easy. Maintaining the product is the hard part.

Everyone talks about how AI lets you ship an MVP in a weekend.

What nobody talks about is what happens after deployment.

Users start churning.

Bugs show up in production.

Analytics tells you what happened but not why it happened.

You spend more time figuring out what to fix than actually shipping fixes.

I ran into this myself while building products. The MVP wasn't the bottleneck anymore. Understanding user behavior, finding issues before users left, and deciding what to build next were.

That's actually why I started building Tero.

The idea is simple: connect your Git repo, monitor your product, get notified when things break, understand what users are struggling with, and ship improvements faster even from your phone.

Maybe the new challenge isn't building software anymore.

Maybe it's everything that comes after launch.

Anyone else feeling this?

u/_killam — 12 days ago

▲ 1 r/OpenAIDev+1 crossposts

How are you guys using Sora 2 now?

I know they have closed the official app to public access, but I guess API access will still be around for some time.

Some tools or providers are still offering access to the model through the API, so I want to know:

Which tool are you using to access Sora 2?

u/Odd-Card8046 — 13 days ago