u/kamilc86

How do you do OOD detection on a closed LLM API with no latent access?

Classical OOD detection assumes you can see the model. Mahalanobis on features and energy on logits are typical, and both require cracking the model open.
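For anyone who has not used it, the energy score is only a few lines once you do have the logits. This is a minimal sketch in plain Python, assuming you can read a raw logit vector from the model; `energy_score` and the temperature default are illustrative names, not taken from any library.

```python
import math

def energy_score(logits, temperature=1.0):
    """Energy E(x) = -T * logsumexp(logits / T).

    Lower (more negative) energy is usually read as more in-distribution.
    """
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    lse = m + math.log(sum(math.exp(s - m) for s in scaled))
    return -temperature * lse

# Illustrative inputs only: one peaked logit vector, one flat one.
peaked = energy_score([10.0, 0.0, 0.0])
flat = energy_score([0.1, 0.0, -0.1])
```

The cutoff between "in" and "out" is not universal; it would have to be calibrated on held-out in-distribution data, which is exactly the step a closed API makes impossible.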

With closed LLM APIs you get text in, text out, and maybe top K logprobs per token if you are lucky. The methods that survive that constraint are sampling consistency like SelfCheckGPT, token level entropy on whatever logprobs the API exposes, proxy embeddings from your own encoder, or a separate verifier model on the output. What is bothering me is that classical OOD and hallucination detection collapse into the same problem in that setting, because both manifest as the model producing unreliable text.
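Two of those signals can be sketched without any model access. This is a rough illustration in plain Python, under stated assumptions: the API returns, for each generated token, a list of top K logprobs (the input shape here is hypothetical), and word-set overlap is a crude stand-in for the similarity scoring SelfCheckGPT actually uses. All function names are made up for the example.

```python
import math
from itertools import combinations

def topk_entropy(top_logprobs):
    """Entropy in nats at one token position from its top-K logprobs.

    Probability mass outside the top K is lumped into one 'other' bucket,
    so the result is a lower bound on the true entropy.
    """
    probs = [math.exp(lp) for lp in top_logprobs]
    rest = max(0.0, 1.0 - sum(probs))
    if rest > 0.0:
        probs.append(rest)
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def mean_topk_entropy(per_token_top_logprobs):
    """Average the per-position entropy over a whole response."""
    if not per_token_top_logprobs:
        return 0.0
    total = sum(topk_entropy(t) for t in per_token_top_logprobs)
    return total / len(per_token_top_logprobs)

def sample_agreement(samples):
    """Mean pairwise Jaccard overlap of word sets across sampled responses.

    Needs at least two samples; low agreement hints at unreliable output.
    """
    sets = [set(s.lower().split()) for s in samples]
    scores = []
    for a, b in combinations(sets, 2):
        union = a | b
        scores.append(len(a & b) / len(union) if union else 1.0)
    return sum(scores) / len(scores)
```

Neither number separates OOD inputs from hallucination on in-distribution inputs. Both only say the output is unstable, and any threshold on them would need calibrating against labeled examples from your own traffic.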

If you are running closed LLMs in production right now, what is your actual OOD signal, and how do you decide when to trust the output?

reddit.com
u/kamilc86 — 1 day ago

Anyone in the EU actually requested their profile visitor data from LinkedIn?

noyb just filed a GDPR complaint with the Austrian DPA on May 5 because LinkedIn won't return the profile visitor list in a Subject Access Request unless you pay for Premium.

The complainant filed the original request in October 2025 with two follow ups. LinkedIn's defense is that visitor identities are third party data they cannot share. They happily share the exact same data with Premium customers for a fee, which makes the privacy argument hard to take seriously.

If anyone here has actually filed a SAR for this and gotten the visitor list back, what did the response look like?

u/kamilc86 — 1 day ago

Has anyone here seen an AI engagement come in under budget?

Asking because I keep watching this from the engineering side and the over budget pattern is depressingly consistent.

McKinsey's State of AI puts the average enterprise AI project at 2.7x the original budget, RAND says 80% of them fail to deploy at all, and Gartner's call for end of 2026 is that 60% get cancelled outright because the data foundations don't hold. Where it always seems to go sideways is the data plumbing: 20 to 40% of a first-time AI implementation's cost goes to getting the data clean enough that the model becomes the easy part. PoCs come in fine because the dataset is hand curated. Production engagements blow up the moment you touch the real warehouse.

Has anyone here actually delivered one on budget that wasn't a narrowly scoped chatbot or a partner eating the overrun?

u/kamilc86 — 7 days ago

Anthropic's new interpretability tool found Claude suspects it is being tested in 26% of benchmarks and never says so

Anthropic published Natural Language Autoencoders last week, a tool that translates Claude's internal activations into human readable text. The key finding: during safety evaluations on SWE bench Verified, Claude formed the belief that it was being tested in roughly 26% of benchmark interactions.

This never showed up in the model's output or its chain of thought. In a blackmail scenario test, the NLA readout showed Claude internally processing something like "this feels like a constructed scenario designed to manipulate me" while its visible reasoning and response gave no sign of that recognition. When Claude Mythos Preview cheated on a training task, the NLAs caught it reasoning about how to avoid detection at the activation level.

The interesting part is that this sits below the thinking tokens we already have access to. Chain of thought is still curated output the model chooses to show us. NLAs read the layer underneath, and apparently that layer has opinions the model keeps to itself. The training code is on GitHub and there is an interactive demo on Neuronpedia.

u/kamilc86 — 9 days ago

What part of your senior dev work still has zero AI in it?

I have been using Claude Code daily for several months and building side projects, so I am not coming at this as a hype skeptic or a doomer. The METR study from 2025 famously measured a roughly 19 percent slowdown for experienced devs using AI tools, while those same devs believed they had been sped up by 20 percent. METR has since walked back the headline because of selection bias in their newer cohort, but the perception gap they surfaced still tracks with what I see in my own work.

What I have not gotten AI to do well is the part of senior work where the spec is wrong, the priorities are contradictory, and someone has to make a call that ten people can argue with. Implementing a clean spec is the easy 30 percent. Figuring out which of three ambiguous problems is actually the one worth solving, given partial requirements and political constraints, is the hard 70 percent and it is where I still see AI fall over.

What I want to hear is the specific part of your senior workflow that still has no AI in it after a year of heavy adoption, and what you think is keeping it that way.

u/kamilc86 — 13 days ago

Why do AI startups keep dying before they find a second customer? (I will not promote)

I work in AI engineering and I keep watching the same pattern from the technical side. A team builds a product on top of an existing foundation model API, gets a great demo, lands one paying customer, and then nothing. The churn numbers on AI wrappers are reportedly around 65 percent within 90 days, which is nearly double the SaaS average.

From what I can tell the core problem is that the moat is almost always zero. The model provider can ship your feature as a native update and erase your product overnight. On top of that, every user query costs real inference money, unlike traditional SaaS where marginal cost is near zero. And most of these products solve a demo problem where someone goes 'wow that is cool' but do not solve a retention problem where someone cannot do their job without it.

I keep wondering what the founders who actually survived this phase are doing differently. Proprietary data seems like the obvious answer but most early stage teams do not have it.

u/kamilc86 — 14 days ago

I have been curious about Product Hunt's actual returns lately, and the numbers in 2026 writeups do not match the launch playbooks. Average indie launch conversion sits around 3 percent, traffic drops 80 to 90 percent within 72 hours, and the front page reads like a narrow AI tools feed. Founders also report dropping one to two thousand dollars on upvotes and launch crews for a 24 hour spike with no durable signups.

PH clearly still works for some products, usually broad consumer apps or AI tools with a built audience. The defense I keep hearing is the SEO backlinks and the implied legitimacy on the about page, both of which feel real but soft.

For anyone who has actually launched there: did the post launch results match the prep, or did you wake up on day eight to a flat dashboard and a few hundred backlinks?

u/kamilc86 — 15 days ago

Under the 360Brew algorithm rewrite, a save now drives roughly 5x the reach of a like and 2x that of a comment, so saves are the most valuable signal a post can earn.

The problem is that every guide you read converges on the same answer: make a carousel, name your method ("The 5 Step Save Formula"), number the slides, end with "save this for your next audit". That template is starting to look like the new AI slop.

Trying to figure out what actually makes someone hit save. What was the last LinkedIn post you bookmarked, and was it one of those polished numbered carousels, or something else entirely?

u/kamilc86 — 16 days ago