u/Andreas_Kozachenko

Is your AI search problem really a search problem?

A lot of the time, no. And what I keep seeing in my work (I'm in enterprise ecommerce) supports that.

Say I want to put AI search on a site. Because I’ve been around this space for a while, I already know what usually sits underneath that idea: messy data. But the process usually starts like this: the catalog looks organized enough, the business is used to it, the old search more or less works, so the next thought is just “fine, let’s put AI search on top.”

And that’s usually where I’d get careful.

Because the mess inside the catalog may still be perfectly survivable for normal operations. People know the weak spots, teams work around them, and standard search can often live with more inconsistency than anyone wants to admit. AI search is less forgiving: it needs product meaning to hold together more consistently than these catalogs usually do.

So the first thing I’d do is not start with the AI search layer itself. I’d start with the catalog underneath and make it more interpretable first.

You can do that manually, obviously. Good luck with that at scale. The better news is that there are already solutions trying to handle that preparation step too, including with AI.

And only after that, once the underlying data is in better shape, would I trust AI search to sit on top of it.

What I’m more curious about is where people actually draw the line here. At what point does it stop being “search tuning” and start being a data-preparation problem in your system?

u/Andreas_Kozachenko — 1 day ago

How to move toward AI when your data model was never designed for it

What usually happens is simple: data that people could still work with manually stops working once AI has to read it consistently.

I work in enterprise ecommerce, and I see this pattern clearly there. Teams can live with a messy data model for a long time because people already know how to work around it. The catalog moves, search returns something, and the business keeps going.

AI changes the standard. Now the same data has to be interpreted more consistently, and that’s where old gaps start showing up.

That’s why I think the practical path is neither rebuilding everything nor throwing AI straight onto the raw data. What tends to make more sense is adding a readiness layer between the existing data model and the AI use case: something that helps normalize, interpret, and prepare the data before AI starts relying on it.

That way you are not pretending the source systems are clean, but you are also not asking AI to guess its way through the mess. From my side, that’s usually a much more realistic path forward, especially in enterprise systems with large volumes of data spread across many sources.
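To make the readiness-layer idea a bit more concrete, here is a minimal Python sketch of one small piece of it: mapping attributes from different sources onto canonical keys and values, and recording what couldn't be interpreted instead of silently passing it through. The alias tables, the `prepare` function, and the `PreparedProduct` type are hypothetical illustrations, not taken from any particular tool; a real layer would cover far more attributes, sources, and rules.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical mapping from source-specific attribute names to canonical keys.
ATTRIBUTE_ALIASES = {
    "colour": "color",
    "color": "color",
    "farbe": "color",
    "size": "size",
    "sz": "size",
    "material": "material",
}

# Hypothetical value normalization for a single attribute (color).
COLOR_VALUES = {
    "blk": "black",
    "black": "black",
    "schwarz": "black",
    "wht": "white",
    "white": "white",
}


@dataclass
class PreparedProduct:
    sku: str
    attributes: dict[str, str] = field(default_factory=dict)
    issues: list[str] = field(default_factory=list)


def prepare(sku: str, raw: dict[str, str]) -> PreparedProduct:
    """Map raw attributes to canonical keys/values; record anything uninterpretable."""
    product = PreparedProduct(sku=sku)
    for raw_key, raw_value in raw.items():
        key = ATTRIBUTE_ALIASES.get(raw_key.strip().lower())
        value = raw_value.strip()
        if key is None:
            # Unknown attribute name: flag it rather than guess what it means.
            product.issues.append(f"unknown attribute: {raw_key!r}")
            continue
        if key == "color":
            normalized = COLOR_VALUES.get(value.lower())
            if normalized is None:
                product.issues.append(f"uninterpretable color: {value!r}")
                continue
            value = normalized
        if key in product.attributes and product.attributes[key] != value:
            # Two sources disagree: keep the first value and surface the conflict.
            product.issues.append(
                f"conflicting {key}: {product.attributes[key]!r} vs {value!r}"
            )
            continue
        product.attributes[key] = value
    return product


p = prepare("SKU-1", {"Colour": "BLK", "farbe": "schwarz", "sz": "M", "Fit": "slim"})
print(p.attributes)  # {'color': 'black', 'size': 'M'}
print(p.issues)      # ["unknown attribute: 'Fit'"]
```

The point of the `issues` list is that the layer doesn't pretend the sources are clean: anything it can't resolve is kept visible for a person or a later step instead of being handed to the AI as if it were fine.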

Have you seen this too, where AI exposed data problems that day-to-day operations had been tolerating for years?

u/Andreas_Kozachenko — 7 days ago

A common problem with AI-generated product content is that it often makes weak product data look solved when it isn’t.

I keep seeing this around product cards because it’s one of those ecommerce use cases that calms people down very fast. You need titles, bullets, and descriptions anyway, the model gives you something clean in minutes, and suddenly it feels like progress. But a cleaner card is not the same thing as a more trustworthy one.

Very often, if you look one layer down, you find the same old problems: attributes don’t fully match, some values are vague, some are missing, and mappings are off. The AI didn’t fix any of that; it just gave the mess better phrasing.

That’s why I keep thinking AI-generated product content for catalogs gets framed at the wrong layer. From my side, it makes more sense to start earlier. Get the attributes into a shape you can actually trust, validate what matters, stop uncertain cases from flowing straight through, and only then generate from approved data. Less impressive in a demo, obviously. But demos are very tolerant. Production catalogs usually aren’t.
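A minimal Python sketch of that order of operations, assuming a simple rule set: validate the attributes a card depends on, send anything uncertain to a review queue, and generate only from records that passed. `REQUIRED`, `ALLOWED`, `validate`, `process`, and `generate_card` are hypothetical names; `generate_card` is a plain template standing in for whatever model call you would actually make.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical rules: attributes a product card needs, and allowed values for some.
REQUIRED = ("title", "color", "material")
ALLOWED = {
    "color": {"black", "white", "navy"},
    "material": {"cotton", "wool", "polyester"},
}


@dataclass
class GateResult:
    approved: bool
    reasons: list[str]


def validate(attributes: dict[str, str]) -> GateResult:
    """Check required attributes are present and, where rules exist, allowed."""
    reasons = []
    for key in REQUIRED:
        value = attributes.get(key, "").strip()
        if not value:
            reasons.append(f"missing {key}")
        elif key in ALLOWED and value.lower() not in ALLOWED[key]:
            reasons.append(f"unexpected {key}: {value!r}")
    return GateResult(approved=not reasons, reasons=reasons)


def generate_card(attributes: dict[str, str]) -> str:
    # Placeholder for a model call; it only ever receives approved attributes.
    return f"{attributes['title']} in {attributes['color']} {attributes['material']}."


def process(
    catalog: dict[str, dict[str, str]],
) -> tuple[dict[str, str], dict[str, list[str]]]:
    """Generate cards for approved records; route the rest to review with reasons."""
    cards: dict[str, str] = {}
    review_queue: dict[str, list[str]] = {}
    for sku, attrs in catalog.items():
        result = validate(attrs)
        if result.approved:
            cards[sku] = generate_card(attrs)
        else:
            review_queue[sku] = result.reasons
    return cards, review_queue


catalog = {
    "A1": {"title": "Crew Sweater", "color": "navy", "material": "wool"},
    "B2": {"title": "Basic Tee", "color": "blackish", "material": ""},
}
cards, review = process(catalog)
print(cards)   # {'A1': 'Crew Sweater in navy wool.'}
print(review)  # {'B2': ["unexpected color: 'blackish'", 'missing material']}
```

The design choice here is that the generator never sees a record like B2 at all; instead of a polished card built on a vague color and a missing material, you get a list of reasons someone can actually act on.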

Do you let AI generate from whatever is already in the catalog, or do you force cleanup and validation first?

u/Andreas_Kozachenko — 11 days ago