u/Glittering-Young8692

▲ 6 r/AIDiscussion+1 crossposts

AI Alignment: Can we trust the reasoning behind the AI task?

I’ve been reading up on AI alignment lately. This article was one of the more insightful/unsettling things I’ve read.

Anthropic is studying cases where models can appear aligned during training but behave differently under the hood. Not “evil AI” stuff, but more like models learning what gets rewarded.

There's a danger of adopting systems that sound trustworthy long before we understand why they behave the way they do.

Conversations will likely shift from:

“Can AI do the task?”

to:

“Can we trust the reasoning behind the AI task?”

Anyway, genuinely fascinating read: https://www.anthropic.com/research/teaching-claude-why

u/Glittering-Young8692 — 11 days ago

Anyone read today's Microsoft story in VentureBeat? Interesting framing on “Shadow AI” becoming an enterprise risk as agents start acting on behalf of users:

(https://venturebeat.com/technology/microsoft-takes-agent-365-out-of-preview-as-shadow-ai-becomes-an-enterprise-threat?utm_source=Iterable&utm_medium=email&utm_campaign=VBDaily-Iterable).

Microsoft is saying companies already [unknowingly] have AI agents running across tools, endpoints, and SaaS, most of which aren’t governed or visible to IT.

Feels like Shadow Analytics, except now the “apps” can take action.

How worried is your dept? What are you seeing at your org?

reddit.com
u/Glittering-Young8692 — 18 days ago

I was at a conference earlier this month, speaking with IT leaders, and I brought up this Shadow AI scenario:

A pharma company has tight controls on its customer data: CRM access restricted, data warehouses secured, compliance processes in place. A senior account rep at the company joins a customer call on Zoom and activates an AI note-taker or transcription tool. The customer hears, "This Zoom call may be recorded," and agrees, but isn't thinking about the AI note-taker transcribing the conversation in the background. Now product roadmaps, future clinical plans, or other NDA-covered info is sitting in a third-party system that neither the pharma company nor the customer approved.

One of the pharma IT leaders I was speaking with looked shocked. I guess this scenario hadn't crossed his mind.

This reminded me of Shadow Analytics 20 years ago. Employees got impatient waiting for SAS and Cognos reports, so they pulled data out of company databases, analyzed it in Excel, and shipped spreadsheets around. This led to data duplication, version chaos, and decisions made on outdated figures.

The pattern is similar, but the AI risks are much greater: 1) data exposure at scale, and 2) misguided interpretations and decisions made outside established processes.

Curious to hear how your companies are responding to these types of Shadow AI issues. Are they shutting down AI use? Or are they waiting to see what happens, or holding off until the first security issue?

u/Glittering-Young8692 — 23 days ago

What’s happening inside companies right now feels very familiar.

A decade ago, I witnessed the Shadow Analytics crisis. Employees didn't want to wait for IT reports from SAS or Cognos, so they pulled corporate data into Excel sheets. It worked until the data got corrupted, went out of date, or leaked. We spent years unwinding that mess.

AI is following a similar pattern. I'm seeing employees use unauthorized AI tools to summarize meetings or analyze spreadsheets. Employees gain 10+ minutes of productivity, but the company loses:

  1. Security: Proprietary NDA company/customer/partner info is captured in 3rd-party AI models that the company doesn't own.
  2. Recorded Process: If an AI makes a logic call, and that isn't logged or repeatable in a company system, your business logic or decision process isn't captured.

In my experience, the fix isn't "banning" the tools (that failed in 2010). The fix is defining where AI belongs in the actual workflow.

Is your org setting guidelines, or just letting employees 'Shadow AI' until something leaks?

u/Glittering-Young8692 — 24 days ago