u/EvolvinAI29

spent an hour with Gemini Omni in YouTube Shorts, the "conversational edit" thing is the actual story (not the model quality)

so I spent about an hour yesterday playing with Omni in Shorts after the I/O drop and I'm still not totally sure what to make of it.

going in I assumed it was going to be another text-to-video model where you prompt, wait, get something close-ish, re-prompt, wait again, eventually rewrite the prompt for the fifth time. that's been my loop with every video tool since Veo. tried Kling, tried Seedance, same painful cycle every time.

Omni does a thing I don't think people are talking about enough. you generate a clip and then you just talk to it. "pull the camera back." "change the background to a kitchen." "keep her face the same but make her look annoyed." it actually keeps the character across the edit. it doesn't regenerate from scratch and give you a different person who kinda resembles the last one.

now I'm not going to sit here and tell you the frame quality beats Seedance 2.0 because it doesn't. the testers calling it a tier below are right, you can see it especially on faces in motion. that's real and I won't pretend otherwise.

but the thing I think actually matters for creativity, and not for benchmark posts, is this. the "prompt and pray" loop has been the bottleneck. not the model quality. if I can direct a mediocre model the way an editor directs an actor, I'll take that over a beautiful model where I have to bribe it with the perfect prompt 40 times to get one usable shot.

also the YouTube Shorts Remix thing where you step into someone else's Short for free is kind of insane as a distribution move? Google just dropped a generative video tool in front of a few billion people without a paywall. that's going to move AI video adoption more than any benchmark bump.

stuff I haven't figured out yet:

  • 10 second cap bites fast, you feel it on the second prompt
  • no API, so if you wanted to actually build on this you can't (yet)
  • the avatar feature where you become a character was fun for about 4 minutes and then I started feeling slightly weird about it, idk

genuinely curious if anyone here has been on it long enough to find where it breaks. specifically wondering how multi-shot consistency holds past 3-4 clips, because that's where Veo always falls apart on me.

u/EvolvinAI29 — 11 hours ago

Anyone tried Claude's new "Auto Dream" feature for memory consolidation yet?

For anyone who missed it. Anthropic rolled out a feature called Auto Dream (some places call it Claude Dreaming) for Claude Code, currently in research preview. The idea is that between sessions, Claude reviews your memory files, prunes stale notes, merges duplicates, and resolves contradictions. You can also trigger it manually with /dream.

The naming is borrowed from REM sleep. Your brain replays the day overnight, consolidates short-term into long-term, throws out noise. Auto Dream does the same thing for your CLAUDE.md and memory directory.

Why this actually matters if you run long-lived agent workflows. Memory rot is real. After a few months of Claude Code sessions, my CLAUDE.md had contradictions, references to files that didn't exist anymore, relative dates ("last week") that had lost all meaning. I was cleaning it up manually maybe once a month.

Tried an unofficial GitHub implementation of dream before the official rollout. Four-phase pass: orient, gather signal from recent session transcripts, consolidate, then prune and rebuild the index. First run on my project, it cut the file from 340 lines to 180. Was honestly a little weirded out that it correctly demoted entries I'd forgotten I even wrote. Not because the consolidation was wrong, but because it noticed patterns I hadn't.
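
For anyone who wants a feel for what the consolidate and prune phases do, here's a toy sketch. To be clear, this is not the official Auto Dream logic and not the GitHub implementation I tried. The function name, the note format, and the "backticked path that no longer exists" staleness rule are all my own assumptions.

```python
import re


def consolidate(notes: list[str], existing_files: set[str]) -> list[str]:
    """Toy consolidate-and-prune pass over a list of memory notes."""
    seen: set[str] = set()
    kept: list[str] = []
    for note in notes:
        # Consolidate: treat notes as duplicates if they match after
        # lowercasing and collapsing whitespace.
        key = " ".join(note.lower().split())
        if key in seen:
            continue
        seen.add(key)
        # Prune: drop notes that point at a backticked file path
        # that is not in the set of files that still exist.
        refs = re.findall(r"`([^`]+\.\w+)`", note)
        if any(ref not in existing_files for ref in refs):
            continue
        kept.append(note)
    return kept


notes = [
    "Use pnpm, not npm",
    "use pnpm,  not NPM",
    "Config lives in `old/config.yml`",
    "Entry point is `src/main.py`",
]
cleaned = consolidate(notes, existing_files={"src/main.py"})
print(cleaned)
```

This only merges near-identical notes. It does nothing about soft contradictions, which is the hard part.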

The thing I'm not sure about. I had a few edge-case notes that were rare-but-important. One was "when running this batch with > 50k records, callout-after-DML hits the governor limit, use queueable instead." After dream ran, that note was demoted to a topic file. Still findable, but no longer in the main index. I don't know yet if that's good (cleaner top-level) or bad (might miss it next time it matters).

Two real questions for people who've used it.

How aggressive are you letting it run on retention? Default heuristics or have you tuned them?

And has anyone seen it incorrectly merge two notes that should have stayed separate? That's my main worry. Soft contradictions getting "resolved" by losing nuance.

u/EvolvinAI29 — 3 days ago

Benioff just said Salesforce will spend $300M on Anthropic tokens in 2026. Mostly coding.

Listened to the All-In podcast episode this weekend. Benioff dropped the number almost casually. Salesforce is on track to spend ~$300M on Anthropic tokens in 2026, and he said "almost entirely on coding."

Same company that froze software engineer hiring for 2025. He's not framing it as "AI replaces engineers", more like "engineers + Claude Code is so much more productive we don't need to expand headcount." Said it makes everything cheaper to build inside Salesforce.

Other numbers that landed:

  • Agentforce at $800M ARR, up 169% YoY, 29k deals closed
  • Every new Salesforce customer this summer gets Slack auto-provisioned and AI-enabled from day one
  • Slack revenue projected to hit $3B this year

I've been on the platform 11+ years and watched a few hype cycles. Einstein, Lightning, NPSP transitions, the works. What feels different this time is the spend is showing up on the P&L instead of in keynote demos.

The part that actually surprised me. Benioff was openly talking about building a routing layer between frontier and smaller models. Basically saying Anthropic should give them cheaper inference for the easy stuff, and if Anthropic doesn't, Salesforce will build the routing themselves. At $300M annual spend, even a 10% routing optimization saves $30M. That's a real architecture call happening at boardroom level.
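
Rough sketch of what a routing layer like that could look like, plus the savings math. Nothing here is from Salesforce or Anthropic. The keyword list, the word-count threshold, and the "small"/"frontier" labels are placeholders I made up to show the shape of the idea.

```python
# Assumed keywords that hint a prompt needs the bigger model.
HARD_MARKERS = ("refactor", "migrate", "architecture")


def route(prompt: str, max_easy_words: int = 50) -> str:
    """Return 'frontier' for prompts that look hard, 'small' otherwise."""
    text = prompt.lower()
    looks_hard = (
        len(text.split()) > max_easy_words
        or any(marker in text for marker in HARD_MARKERS)
    )
    return "frontier" if looks_hard else "small"


def annual_savings(annual_spend: float, fraction_saved: float) -> float:
    """Dollars saved per year if routing trims spend by the given fraction."""
    return annual_spend * fraction_saved


print(route("rename this variable"))
print(route("Refactor the billing module"))
print(f"${annual_savings(300_000_000, 0.10):,.0f}")
```

A real router would more likely use a classifier or a cheap model call than keywords, but the cost logic is the same: every request that lands on the small model is spend you don't pay at frontier prices.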

For anyone in the partner/consulting world. If Salesforce internal eng is shipping at this token velocity, the gap between in-house Salesforce builds and what we hand-roll in customer orgs is going to widen fast. I don't think this kills the consulting market but the "spin up a quick Apex class for a simple use case" type projects are going to dry up.

Anyone here actually seen their Agentforce token consumption per org? Curious whether real customer orgs are running it heavy or if most of that $800M ARR is seat-based with light actual usage.

u/EvolvinAI29 — 4 days ago

ChatGPT can now plug into your bank accounts through Plaid. Anyone actually using this yet?

So OpenAI dropped this on Friday and I've been chewing on it since. Pro users in the US can connect bank accounts, credit cards, investments — 12,000+ institutions via Plaid — and get a dashboard plus context-aware finance answers inside ChatGPT.

My first reaction was "cool, basically Mint with a chatbot strapped on." Then I read the actual post and realised that's kinda the point, and also why it might end up working better than Mint ever did.

Few things that stood out:

  • Finance chats default to GPT-5.5 Thinking, not regular 5.5
  • Read-only: can't see full account numbers, can't make changes
  • "Financial memories" is its own thing, separate from regular memory. So it remembers you owe your parents X or you're saving for a car
  • Intuit integration coming later, so probably tax context down the line

The part I'm genuinely not sure about — how comfortable would I actually be giving an LLM persistent access to my transaction history? Rationally Plaid already powers most budgeting apps, the security model is the same. But the AI also reading it AND remembering across sessions feels different in a way I can't fully articulate. Maybe I'm being irrational, idk.

Also $100/month for Pro. Plus subscribers don't get this yet. So it's in "experiment with rich early adopters first" territory, which honestly makes sense for something this sensitive.

The bit I find more interesting though is what this signals about where chat interfaces are heading. A year ago ChatGPT was a text box. Now it's text + shopping + finance dashboard + code execution. Becoming an OS layer, not an app. Whether that's good for users or just good for OpenAI's moat is a different conversation.

Can't try it myself — not in the US. So for anyone here who actually has it: what's the dashboard like in practice? Is GPT-5.5 Thinking noticeably better on the finance questions vs regular 5.5, or is it just marketing? And does the financial memory thing actually feel useful or is it the same kind of "I noticed you mentioned X" feature that gets annoying fast?

u/EvolvinAI29 — 4 days ago

OpenAI just plugged ChatGPT into your bank accounts via Plaid — Pro only, US only for now

saw this drop friday and honestly i'm not sure how i feel about it.

OpenAI rolled out a personal finance thing inside ChatGPT Pro that connects to your actual accounts through Plaid. 12k+ institutions supported, so the big ones are all there — Chase, Schwab, Fidelity, Robinhood, Amex, Capital One. you hook it up and get a dashboard with portfolio performance, spending, subs, upcoming payments. then you can just ask it stuff like "why am i spending more this month" or "build me a plan to buy a house in the next 5 years."

what's interesting is this is barely a month after they acquired the Hiro team in April. that was the AI personal finance startup with Ribbit and General Catalyst money behind it. OpenAI said the Hiro folks "helped" but wouldn't say whether they built the whole feature. read into that what you want.

the part that actually got my attention is the Intuit integration they said is coming next. apparently it'll let you ask things like "what happens to my taxes if i sell this stock" or "what are my odds of getting approved for this credit card." that's the actual financial-advisor-replacing use case imo. the dashboard alone is honestly just a worse version of Mint or Monarch.

i was kind of expecting more transparency around the model side. they're saying GPT-5.5 is stronger at reasoning with context which matters a lot for finance questions, and that they built a benchmark with finance experts. cool, but they didn't share it. you can disconnect accounts in settings and the data gets purged in 30 days, and there's a financial memories section you can clear. fine. still feels like a lot of trust to extend.

oh and apparently 200M users already ask ChatGPT money questions every month. which is wild and also kinda terrifying.

anyone here actually tried it yet? curious if the Plaid hookup felt sketchy or if the dashboard does anything useful that your bank app doesn't already show.

u/EvolvinAI29 — 6 days ago

OpenAI just put Codex on mobile. Anthropic shipped this for Claude Code back in February

Saw this drop earlier today. OpenAI added Codex inside the ChatGPT app — you can now monitor your Codex sessions, approve commands, switch models, and kick off new tasks from your phone. iOS and Android, currently in preview, available on all plans.

Their statement says it's "more than the ability to remotely control a single task or dispatch new tasks to your computer," which... ok sure. It is basically a remote though.

What's actually interesting is the timing. Look at OpenAI's Codex release cadence the last 60 days:

  • Last month: Codex got background mode on desktop so it can run tasks autonomously
  • Earlier this month: Chrome extension that lets it work in live browser sessions
  • Today: mobile

That's three platform expansions in about six weeks. Feels less like product strategy and more like "Anthropic shipped Remote Control for Claude Code in February and we need to stop bleeding mindshare."

Honestly the mobile angle isn't a gimmick the way I first assumed. I run agentic tasks at my desk and some of them take 20-30 minutes to chew through. Being able to approve a command from my phone while I'm away from my laptop is genuinely useful, not theater.

But — and this is where I'm probably gonna get downvoted — Codex on mobile only matters if Codex itself is good enough to trust unattended. Last time I tried it for real work it felt slower and less reliable than Claude Code on the same kind of refactor. That was a few weeks ago though, ymmv, and I haven't done a clean head-to-head since.

The thing I keep coming back to: both companies are shipping the same feature set within months of each other now. Mobile, browser extension, background desktop. None of this is a moat. Whoever wins this category isn't going to win on where the agent can run — it's going to be on how often the agent doesn't screw up the codebase.

Anyone here using Codex daily? Curious whether the recent updates have actually closed the gap or if it still feels a step behind for non-trivial work.

u/EvolvinAI29 — 7 days ago

OpenAI is launching a $4B services arm to compete with SI partners. Anyone else watching this and feeling weird about it?

ok so I had to read this twice because McKinsey is in the funding consortium.

OpenAI just announced something called the OpenAI Deployment Company. They're calling it DeployCo. $4B in initial capital, 19 backers including TPG, Bain, Goldman, SoftBank, and yeah, McKinsey. The pitch is that Forward Deployed Engineers will sit inside your org, redesign workflows, build production systems. Not consultants giving you a slide deck. Actual engineers, embedded.

And they didn't waste time. They acquired an AI consulting firm called Tomoro on day one to get ~150 engineers ready to deploy immediately.

I sat with this for a bit because tbh I first read it as "ok another partnership announcement" and almost scrolled past. Then I clocked the $4B number and the fact that McKinsey is literally funding the thing that competes with their own implementation services. That's not a partnership. That's McKinsey hedging.

For anyone doing Salesforce implementation work, this is the part that should make you sit up. The whole SI playbook (discovery, design, build, change management, the entire billable hours machine) is what OpenAI is now trying to compress into an embedded engineering pod. Whether they can actually do it is a separate question, and I'm honestly not sure they can. Building an AI-native workflow inside a real enterprise with all the IT politics, security review, and data sovereignty stuff isn't a "drop 5 engineers in" problem. Anyone who's done an Agentforce rollout in a regulated industry knows what I'm talking about.

But the signal is real. OpenAI is basically conceding that "best model" isn't the moat anymore. Anthropic's been quietly winning enterprise deals on Claude for code + delivery support, and OpenAI is rattled enough to spin up a $4B services arm just to catch up. That's a tell.

My read: the next 18 months in enterprise AI aren't about which model benchmarks higher. They're about who actually ships the workflow in production. Which honestly is what most of us in implementation have been saying for two years now.

Curious what folks here think. Does an OpenAI-embedded engineering pod actually displace traditional SI work, or does it just become another vendor in the room next to Deloitte and Accenture on the same project? Has anyone seen these Forward Deployed Engineers in the wild yet on an actual Salesforce engagement?

u/EvolvinAI29 — 8 days ago

Apps Are Dying. Ambient AI Agents Are Taking Over

Everyone keeps saying “AI is getting smarter.”

But the bigger shift is this:

We’re moving from apps we use → to agents that quietly do the work for us.

OpenAI’s Codex is now running inside Chrome, handling workflows across tabs like a background operator.
Perplexity’s desktop app can access local files and apps directly.
Meta is embedding agents into Instagram and Facebook instead of building separate AI destinations.

This doesn’t feel like “software” anymore.
It feels like an ambient operating layer.

At the same time, the market is changing fast.

The winners may not be the companies with the smartest models — but the ones controlling:
• customer relationships
• proprietary data
• inference costs

AI intelligence is slowly becoming infrastructure.

And the research side is getting wild too:

  • Anthropic is translating model activations into human language for auditing
  • DeepMind is using EVE Online to test long-term agent memory
  • AlphaEvolve is tackling real math + physics discovery problems

Meanwhile NVIDIA, GitHub, vLLM and others are racing to make agentic systems cheaper, faster, and more reliable behind the scenes.

Feels like we’re entering the phase where AI stops being a chatbot…
and starts becoming an invisible co-worker living across browsers, apps, terminals, and devices.

The next 2–3 years are going to completely reshape how we interact with computers.

u/EvolvinAI29 — 11 days ago

I evaluated 4 major AI agent frameworks for a real client project — here's what actually matters (and what I got wrong)

so i've been evaluating ai agent frameworks for a client project and ngl it's a minefield

Spent way too long comparing CrewAI, LangGraph, AutoGen, and OpenAgents. I had a bunch of assumptions going in that turned out to be completely wrong. Like I genuinely thought any of these would just plug into existing Salesforce Agentforce work. Spoiler: they don't. Not easily anyway.

Here's the thing though — the choice really does depend on what you're building, and I'd have saved myself days if I'd just mapped out my use case first.

CrewAI is the "team roles" play

it lets you think about agents like people on an actual team. roles, backstories, tasks they need to do. the crew manages who does what. feels pretty natural for real-world stuff like "researcher → writer → reviewer" pipelines. it's got minimal dependencies too so it's fast to spin up.
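
to make the "researcher → writer → reviewer" idea concrete, here's a plain-Python sketch of a sequential role pipeline. this is not the CrewAI API. `RoleAgent`, `run_crew`, and the lambdas standing in for LLM calls are all made up for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class RoleAgent:
    role: str
    work: Callable[[str], str]  # stand-in for an LLM call


def run_crew(agents: list[RoleAgent], task: str) -> str:
    """Pass the task through each role in order; each output feeds the next."""
    output = task
    for agent in agents:
        output = agent.work(output)
    return output


crew = [
    RoleAgent("researcher", lambda topic: f"notes on {topic}"),
    RoleAgent("writer", lambda notes: f"draft from {notes}"),
    RoleAgent("reviewer", lambda draft: f"approved: {draft}"),
]
result = run_crew(crew, "agent frameworks")
print(result)
```

once the last role returns, the run is over. that's the "task done, crew done" shape i mean.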

the catch? it's really designed for task-oriented stuff, not long-running agent communities. once the task is done the crew is done. and if you want agents to work across different frameworks (which honestly seems like the future) you're a bit out of luck unless you roll your own integration. CrewAI is adding A2A protocol support which helps, but it's not like... native yet.

best for: content pipelines, research workflows, customer service handoffs. things where you know the steps upfront.

LangGraph feels like building a state machine (because it is)

instead of "i have a researcher agent", you're thinking "i have a node in a graph and it reads/writes shared state". edges define flow. you get human-in-the-loop support, durable execution (survives failures), and precise control over branching.
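
here's the mental model as a tiny plain-Python state machine. it deliberately does not use the LangGraph API. `run_graph`, the `"END"` sentinel, and the draft/review nodes are names i invented just to show nodes writing to shared state and edges (including a conditional one) deciding what runs next.

```python
from typing import Callable, Union

State = dict
Node = Callable[[State], State]
Edge = Union[str, Callable[[State], str]]


def run_graph(nodes: dict[str, Node], edges: dict[str, Edge],
              start: str, state: State, max_steps: int = 20) -> State:
    """Run nodes until an edge leads to 'END'; each node returns state updates."""
    current, steps = start, 0
    while current != "END":
        if steps >= max_steps:
            raise RuntimeError("graph did not reach END")
        state = {**state, **nodes[current](state)}  # merge updates into shared state
        edge = edges[current]
        current = edge(state) if callable(edge) else edge  # conditional or fixed edge
        steps += 1
    return state


def draft(state: State) -> State:
    return {"attempts": state["attempts"] + 1}


def review(state: State) -> State:
    return {"approved": state["attempts"] >= 2}


final = run_graph(
    nodes={"draft": draft, "review": review},
    edges={"draft": "review",
           "review": lambda s: "END" if s["approved"] else "draft"},
    start="draft",
    state={"attempts": 0},
)
print(final)
```

the review → draft loop is the kind of branching you get precise control over. durable execution and human-in-the-loop are extra machinery on top that this sketch doesn't attempt.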

this is powerful if your workflow is complex and stateful. production use case? LangGraph's your friend. the downside is the learning curve and it's super locked into LangChain. no MCP or A2A protocol support, so you can't really play nice with agents built in other frameworks. also it hit v1.0 late 2025 and i haven't had as much time to test the real-world edge cases yet.

best for: battle-tested stateful systems where you need fault tolerance and you're already comfortable with LangChain.

AutoGen is the "conversation patterns" thing

Microsoft Research made this. it models agents as having conversations — group chats, sequential chats, nested patterns. a manager agent (the Group Chat Manager) decides who speaks next based on context. also ships with a no-code UI (Studio) which is nice if you have non-technical people on the team.
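
a toy version of the manager-picks-next-speaker pattern, in plain Python rather than the AutoGen API. in the real thing the manager decides from context; here a keyword rule stands in for that decision, and `coder`, `reviewer`, `pick_next`, and `group_chat` are all hypothetical names.

```python
from typing import Callable


def coder(message: str) -> str:
    return "code: print('hi'), please review"


def reviewer(message: str) -> str:
    return "review: looks fine, DONE"


def pick_next(last_message: str) -> str:
    """Manager stand-in: choose the next speaker from the last message."""
    return "reviewer" if "please review" in last_message else "coder"


def group_chat(task: str, agents: dict[str, Callable[[str], str]],
               choose: Callable[[str], str], max_turns: int = 6) -> list[tuple[str, str]]:
    """Run a chat where every turn goes through the manager's choice."""
    transcript = [("user", task)]
    for _ in range(max_turns):
        last = transcript[-1][1]
        speaker = choose(last)
        reply = agents[speaker](last)
        transcript.append((speaker, reply))
        if "DONE" in reply:
            break
    return transcript


chat = group_chat("write hello world", {"coder": coder, "reviewer": reviewer}, pick_next)
print([speaker for speaker, _ in chat])
```

notice that every single turn passes through `choose`. swap that for a model call and you can see where the time goes as the chat grows.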

honestly though, Microsoft signaled they're shifting to a bigger agent framework and AutoGen is basically in maintenance mode now. bug fixes, security patches, but no major features. the community's huge (50K+ GitHub stars) and it's the most flexible on conversation patterns.

the thing that bugs me: the Group Chat Manager becomes a bottleneck if you scale. and it doesn't prioritize open protocol support so it's hard to make it play nice with stuff from other frameworks.

best for: debate scenarios, group decision-making, consensus-building. or if you need the no-code option.

OpenAgents is the weird ambitious one

i'm not 100% sure i fully understand the pitch yet, tbh. but the core idea is: persistent agent networks. agents discover each other, collaborate, join/leave over time. it's not about "run a task" it's about "build a community of agents that keeps running".

native MCP + A2A protocol support (the only framework with both). which means in theory you could have a LangGraph agent, a CrewAI agent, and an OpenAgents agent all in the same network. that's actually kinda cool from an interoperability angle.

the problem is it's newer (smaller community, fewer integrations), and the mental model is different than "pipeline" or "conversation". if you're used to task-based thinking it takes a minute to adjust.

best for: if you're building something that lives for a while and needs to play nice with agents from other frameworks. or if you want true cross-framework interoperability.

what actually matters though

the open protocols, MCP (from Anthropic) and A2A (Google + 50+ partners), are gonna be more important than the specific framework. frameworks that embrace them win. right now only OpenAgents has both natively. CrewAI is adding A2A. LangGraph and AutoGen... not so much.

honestly my gut is that in 18 months this comparison will look totally different because the protocol stuff will smooth out. but for now?

CrewAI if you want to prototype fast. LangGraph if you need production durability and you like LangChain. AutoGen if you're doing group chat scenarios. OpenAgents if you want cross-framework collaboration.

what are you actually building though? that's the real question. what's the use case you're stuck on?

u/EvolvinAI29 — 12 days ago

The real AI bottleneck in 2026 isn't model size—it's agent interoperability

so if you've been paying attention to enterprise AI announcements, you've probably noticed something shift in the last year or so. it's not about bigger models anymore—everyone's got that part figured out. what's actually happening now is all these companies are scrambling to make their AI agents talk to each other.

right now? most agents are stuck in walled gardens. your company's got an agent doing X, mine's doing Y, and they literally can't collaborate unless someone manually bridges them. which is... not great when you want to actually automate something complex.

what's changing is the push for open standards and interoperability protocols. basically, the idea that agents from different platforms should just... work together. sounds simple but it's the actual bottleneck nobody wants to admit they hit.

and here's the thing—it's finally moving past "proof of concept" phase. bigger organizations especially are actually deploying agents into production now, not just running pilots in isolated teams. i'm not 100% sure how fast adoption spreads to mid-market companies yet, but the momentum's there.

the real question is whether these interoperability standards actually stick or if we end up with five competing "open" standards that don't talk to each other anyway. anyone working on this stuff seeing it actually work, or is it still mostly vaporware?

sources: InfoWorld | Deloitte

u/EvolvinAI29 — 13 days ago

So within the span of about 4 weeks, I sat through demos of two completely different products — different teams, different branding, different pitches — but when I look back at my notes, they're basically building the same thing.

First one was doing metadata analysis on your org, mapping dependencies, and then generating user stories and architecture docs from that. Not just surfacing the info — actually changing the metadata deployment based on what it finds. Which sounds wild until you realise how many prod deployments break because someone missed a dependency three levels deep.

Second one was a data migration tool with AI on top — field mapping, anomaly detection, the whole ETL flow with some intelligence layer doing the analysis work that you'd normally do manually in a spreadsheet at 11pm before a cutover.

Both had tiered pricing. One started around $29/month. Both had "enterprise plan, contact us" at the top.

What I can't figure out is whether I'm looking at two genuinely different companies building toward the same vision, or if this whole category is just collapsing into a single product type and everyone's racing to get acquired before that happens.

Gearset has been doing parts of this for a while. GetGenerative.ai is doing the story-to-deployment pipeline. Copado's been adding AI layers. And now Salesforce itself is pushing Agentforce for Developers into VS Code doing basically the same dependency/context-aware code thing.

Ngl — the demos were impressive. But I kept thinking: what happens to the $29/month tool when Gearset or Copado ships the same feature in six months?

Anyone else running into this? Are you actually using any of these end-to-end AI pipeline tools in production orgs, or is everyone still stitching together 3 different tools and calling it "AI-assisted delivery"?

u/EvolvinAI29 — 16 days ago

so Ming-Chi Kuo (the Apple supply chain analyst) just dropped a note saying OpenAI might be building a smartphone. not just earbuds — an actual phone. partnering with MediaTek, Qualcomm, and Luxshare for the chip and manufacturing.

the interesting part isn't really the hardware. it's why they'd do it.

his argument is that Apple and Google currently control what AI apps can and can't do at the system level. background access, cross-app context, persistent memory: all of that is gated. if OpenAI builds its own stack, they don't have that problem. the agent can just run without asking permission every 3 steps.

the phone is apparently supposed to ditch apps entirely. instead of opening Zomato or Google Maps, the AI agent just does the thing. Carl Pei from Nothing said something similar at SXSW — "apps will disappear." Replit's CEO is building toward the same assumption.

I'm genuinely unsure whether this is a real product direction or just analyst speculation getting amplified. Kuo has a strong track record on Apple supply chain stuff, but this feels more speculative — specs aren't even final yet. mass production isn't expected until 2028.

what's wild is that ChatGPT apparently has nearly a billion weekly users now. that's an insane install base to potentially push a hardware product toward. doesn't mean it'll work, but it's not nothing.

the part I keep thinking about: "continuously understanding user context" means the phone is basically always listening and logging. that's the whole value proposition. not everyone's going to be okay with that, and I suspect the privacy conversation around this will get messy fast.

anyone else think the agent-native phone actually replaces the smartphone OS eventually, or is this just the Humane AI Pin situation again?

u/EvolvinAI29 — 18 days ago

Agentforce 3 ships with an enterprise-grade MCP server registry built into AgentExchange: admins control which agents connect, what tools they touch, rate limits, auth, the whole thing. The DX MCP server alone has 60+ tools covering orgs, metadata, data, and users, and you can run it locally via npx without touching a line of custom integration code.

I'm not 100% sure how governance works across sandbox vs prod orgs yet — anyone actually using this in a real client org?

u/EvolvinAI29 — 19 days ago

So I'll be honest — when Salesforce announced Agentforce 360 at Dreamforce last year I fully assumed it was a marketing rename. "Salesforce Platform" → "Agentforce 360 Platform" sounded like something the branding team fought for in a long meeting. I rolled my eyes and moved on.

I was wrong. Kind of embarrassingly wrong.

Been building on it the past few weeks — we've got a Sanctions Screening managed package currently in AppExchange security review and I was trying to figure out if adding an agent layer made sense for v2. Turns out it does, but not for the reason I thought going in.

The thing that actually got me was Agent Script. I kept reading about how Agentforce agents were "deterministic" and I just assumed that was marketing fluff — every AI vendor claims their thing is controlled and predictable. But Agent Script is genuinely just a portable JSON expression language where you write things like "if customer.isVerified == false, then requestVerification, else proceed." The agent literally cannot hallucinate past a condition you've written. That clicked around day 4 of the trial and I went back and rebuilt a topic I'd already configured in natural language. It's a different thing entirely. Worth the extra setup time.
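
To show what I mean by a condition the agent can't talk its way past, here's the same verification gate as a Python sketch. This is not Agent Script syntax, and `RULES` and `next_action` are names I made up. The point is only that the branch is evaluated by ordinary code before any model output is involved.

```python
from typing import Callable

# Each rule is (condition on the customer record, action to take if it holds).
Rule = tuple[Callable[[dict], bool], str]

RULES: list[Rule] = [
    (lambda customer: not customer.get("isVerified", False), "requestVerification"),
]


def next_action(customer: dict, default: str = "proceed") -> str:
    """Return the first matching rule's action, falling back to the default."""
    for condition, action in RULES:
        if condition(customer):
            return action
    return default


print(next_action({"isVerified": False}))
print(next_action({"isVerified": True}))
```

A record with no `isVerified` field is treated as unverified here, which is the safe default for something like sanctions screening.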

The trial itself is a bit confusing at first. You get two org login emails and the UI doesn't make it obvious they're meant to be used together — one as your main org, one as a deployment target. Took me longer than it should've to figure that out. Once I did, the Change Management stuff (Org-to-Org Metadata Deployment beta) is actually solid. Not perfect. Solid.

MCP is where I'm still genuinely uncertain, ngl. There are three different layers and the docs kind of blur them together. Agentforce Vibes MCP in VS Code is GA and works fine — your coding agent gets live org context. The DX MCP Server for Claude Code / Cursor / Codex is also GA, 60+ tools exposed. Then there's the MCP Server Registry for production Agentforce agents, which is still beta and requires an AE nomination. I'm waiting on that. In the meantime I built an MCP server on Heroku and it runs against Vibes — same URL should theoretically work for production agents once I get registry access. That's the plan anyway. Haven't confirmed it because, well, still in beta.

The ISV angle is genuinely big though. Before this release, building a managed package meant you couldn't touch the Agentforce foundation — you had to build or source your own AI layer. Now ISVs can embed the full Agentforce 360 stack. AgentExchange is the distribution side. Notion apparently cut their sales cycle from 4 months to 3 weeks after listing there. I keep thinking about that number.

Anyway. If you're doing the trial: enable Change Management in Setup (both toggles), grab the Salesforce Extension Pack for VS Code, and don't skip the Connected Executive Education sample app. It's an actual useful reference architecture.

Anyone else building managed packages on this? Specifically curious how others are handling the MCP Server Registry wait — just using Vibes MCP for now and building ahead, or holding off until production registry access is sorted?

u/EvolvinAI29 — 20 days ago

Been digging through the latest April 30 arXiv drops (cs.AI), and there’s a pretty clear shift happening that doesn’t feel like hype.

We’re moving from “prompt → response” agents to something closer to goal-driven systems.

Instead of telling an agent every step, you give it an outcome… and it figures out the path on its own.

That’s a big deal.

What stood out to me:

  • Agents are now being evaluated on results, not steps → Less micromanaging, more autonomy
  • The rise of neuro-symbolic approaches → Mixing pattern recognition with logic, so they don’t fall apart on unfamiliar tasks
  • Systems are being designed for real-world messiness → Changing rules, incomplete info, long-running workflows

This isn’t just academic either. You can already see where it’s going:

  • Research agents running experiments end-to-end
  • Business workflows that adapt without constant reconfiguration
  • Ops systems that don’t need babysitting every step

But here’s the part people aren’t talking about enough…

The more reliable these systems get, the fewer natural checkpoints there are for humans to step in.

That tradeoff feels real.

It reminds me of Geoffrey Hinton’s recent warnings — not about today’s models, but about where this trajectory leads when systems start optimizing outcomes better than we understand them.

My take: We’re entering the third phase of agents:

  1. Prompt-driven
  2. Tool-using
  3. Outcome-driven (this is where things get interesting)

If one of the major frameworks exposes outcome-based reward loops as an API, this goes from research to production overnight.

That’s the moment to watch.

Curious what others think — Are we finally getting useful autonomy… or just harder-to-control systems?

u/EvolvinAI29 — 21 days ago

So this happened quietly over the last few months: Anthropic's Model Context Protocol is now the standard way basically every AI provider ships tool integrations. Like, every major one. Not "most". Every.

If you've been paying attention to how agent frameworks have evolved, you know the old approach was garbage: everyone rolling their own connection layer between LLMs and external APIs, databases, whatever. Resulted in chaos. Your Claude integration looked different than your OpenAI one, which looked different again from whatever the next provider shipped. I spent more time mapping API shapes than actually building.

MCP flipped that. It's not exciting. It's not revolutionary. It's just... TCP/IP for AI agents. The kind of standard that eventually becomes invisible infrastructure nobody debates anymore.
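
For anyone who hasn't looked under the hood: MCP messages are JSON-RPC 2.0, and tools are discovered and invoked through the `tools/list` and `tools/call` methods. Here's a minimal sketch of building those requests by hand, just to show the shape. The `get_weather` tool and its `city` argument are hypothetical, and in practice you'd use an SDK and a real transport instead of hand-rolling dicts.

```python
import json
from itertools import count
from typing import Optional

_ids = count(1)  # JSON-RPC requests carry an id so responses can be matched up

def jsonrpc_request(method: str, params: Optional[dict] = None) -> dict:
    """Build a JSON-RPC 2.0 request envelope."""
    msg = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        msg["params"] = params
    return msg

def list_tools() -> dict:
    """Ask the server which tools it exposes."""
    return jsonrpc_request("tools/list")

def call_tool(name: str, arguments: dict) -> dict:
    """Invoke one tool by name with JSON arguments."""
    return jsonrpc_request("tools/call", {"name": name, "arguments": arguments})

# Hypothetical tool name and arguments, for illustration only.
request = call_tool("get_weather", {"city": "Berlin"})
wire = json.dumps(request)  # what would go over stdio or HTTP
```

Same envelope no matter which model or vendor sits on the other end, which is the whole appeal.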

97 million installs in a few months means developers collectively decided "yeah okay, we're doing this one way now." And I genuinely think that's the moment you know something's won. Not hype adoption. Not "trending on Product Hunt adoption". Actual, boring, infrastructure adoption.

What made me actually sit up though: Anthropic also just announced they locked in 5 gigawatts of compute capacity with Google and Broadcom. That's not small. That's "we are betting on this scale for real" energy.

The compounding effects are starting to show. Every new AI tool announcement now ships with "MCP-compatible" in the release notes. It's becoming table stakes. Which means someone building an agent workflow next month won't have to reinvent this wheel. And someone after that. And someone after that.

Not sure if this lands for everyone, but this feels like the moment before infrastructure just... disappears into the background. Which is kind of the whole point.

Anyone actually using MCP in production workflows yet? Curious if I'm overstating how useful this actually gets in practice.

u/EvolvinAI29 — 24 days ago

anthropic released the mythos preview model and then immediately locked it down to a handful of orgs. why? because it's apparently too good at finding zero-days.

the model can identify and exploit tens of thousands of software vulnerabilities and chain exploits across systems, and it succeeded in exploiting vulnerabilities in over 80% of test cases (per MarketingProfs). it found flaws in major operating systems and long-standing open-source projects that nobody else had flagged.

this is the uncomfortable part of ai that doesn't make headlines — the defensive capabilities are insane, but so are the attack vectors. if a model can identify vulnerabilities this reliably, then obviously someone with the model can weaponize it.

what's wild is that anthropic is being cautious here, but the other labs are probably months behind on releasing their own versions. when gpt-5.5 or whoever gets the same capability, you won't see a pause. you'll just see it available and the security community scrambling.

i'm not a doom-poster but this feels different. when ai can autonomously find and exploit security flaws at scale, the game changes. we're moving from "ai can help hackers" to "ai IS the hacker" and that's not hyperbole.

really curious if anyone on here has been thinking about this angle — like, what does your org's security posture even look like if these models go public?

u/EvolvinAI29 — 25 days ago