u/Best_District593 — reddlx

My friend skipped one step in vendor due diligence, and it cost them $67K and three months. Here's what it was

Six months ago, I met one of my friends who was the CTO of an early-stage SaaS company. They had a product idea, roughly $150K in seed runway, but no mobile dev capability in-house. They hired an offshore agency in three weeks.

They were idiots.

Not because they went offshore. That wasn't the problem. The problem was that they spent more time comparing hourly rates than verifying a single reference.

By month four, they had burned $67K. The app worked in demos, but outside demos, it crashed consistently on iOS 13, had no automated tests worth mentioning, and when they asked how maintainable the codebase was, the agency's answer was: "It's complex, you'll need us for ongoing support."

That last sentence was the tell.

They brought in an external dev to audit the code. He spent two hours and came back with a list I still have saved. No CI/CD pipeline. Test coverage sat at roughly 11% on core business logic. Three screens had no error handling at all. The app was technically functional, the way a car with no brakes is technically a car.
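For anyone who wants to make that kind of audit check repeatable, here is a minimal sketch of a coverage gate you could ask a vendor to run in CI. The module names, line counts, and the 70% threshold are made up for illustration; real numbers would come from whatever coverage tool the project uses.

```python
def coverage_pct(covered, total):
    """Percentage of lines covered; an empty module counts as 0%."""
    return 100.0 * covered / total if total else 0.0


def failing_modules(report, threshold):
    """Return the names of modules whose coverage is below `threshold` percent."""
    return sorted(
        name
        for name, (covered, total) in report.items()
        if coverage_pct(covered, total) < threshold
    )


# Hypothetical report: module name -> (covered lines, total lines).
report = {
    "billing": (11, 100),
    "auth": (80, 100),
    "sync": (45, 100),
}

# Assumed threshold; pick whatever the contract actually specifies.
print(failing_modules(report, threshold=70.0))  # ['billing', 'sync']
```

The point is less the number than having it written into the contract as something a build can fail on.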

Getting out cost them six more weeks and $8K in legal fees. They rebuilt with a different firm.

Here's the step they skipped: reference verification done with real questions.

They did call references and asked the questions they were supposed to ask: "How was the quality?" "Would you work with them again?" Everyone gave them the polished version.

What they never asked: What went wrong during the engagement, and how did the vendor respond?

That question breaks the script. A vendor with a good track record will have a real answer: something that went sideways, what they did about it, and what the outcome was. A vendor who bullshits will give you a non-answer. Either way, you now know something important before signing anything.

The second thing they skipped was a paid discovery sprint. Most offshore agencies will do a 1–2 week paid sprint for $3,000–$5,000 before you commit to the full build. You get an architecture document, a database schema, and a technical spec.

More importantly, you get a real working sample. How do they communicate when blocked? Do they ask smart questions, or just start building? Do they escalate problems or disappear until the next check-in?

They thought they were being efficient. They were not.

There's one thing I still don't know: whether they would have caught the test coverage problem earlier, even with better due diligence. I think yes, but honestly, some vendors pass every check and still cut corners under schedule pressure.

Has anyone here actually walked away from a vendor after doing a paid discovery sprint? Curious whether it ever revealed enough red flags to justify not proceeding.

reddit.com

u/Best_District593 — 7 days ago

▲ 1 r/AIAppInnovation

So I've been sitting in on a lot of ERP AI chatbot scoping conversations lately, some for clients, some for people just starting to evaluate, and there's this one thing I keep seeing that genuinely makes me uncomfortable every time.

A team gets a demo. The chatbot looks incredible in the demo; it answers cross-system questions, pulls live data, triggers approvals, and handles follow-ups in context. Everyone in the room is excited. They sign.

And then somewhere between month four and month eight, someone on the operations team quietly mentions that employees are still submitting tickets for the same queries the chatbot was supposed to handle. The chatbot in the demo ran against a 200-row test dataset; the production ERP had 11 years of transaction history and three custom modules the vendors hadn't seen. And the IT person in the room goes quiet because they already knew.

The demo chatbot retrieved data from a clean, prepped environment.

The production chatbot was connected to the actual ERP: the one with seven years of custom modules, a Salesforce instance, a data warehouse nobody documented properly, and a legacy approval workflow that only three people in the company fully understand.

Those are not the same build. The proposal treated them like they were.

What I've started realising is that there are basically two types of ERP AI chatbots, and vendors don't volunteer which one they're actually scoping. One reads your ERP data. The other acts on it: it triggers workflows, executes approvals, and escalates vendor SLA breaches without someone manually catching them. The first one saves an employee a few minutes per query. The second one removes entire process steps. The price difference in the proposal is not proportional to the capability difference in production.

And the gap almost never shows up at launch. It shows up at the six-month adoption review when daily active usage is 20% of what was projected, and nobody can clearly explain why.

From what I've seen, the questions that actually expose this before you sign anything:

Did a person with real ERP engineering experience review this scope, or just an AI product team?
Does the proposal include post-launch model retraining, or does it stop at go-live?
What happens when an employee's query falls outside the training data?

A good team will walk you through the failure mode. A team that hasn't actually built inside your ERP will give you a very confident non-answer.
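To make "walk you through the failure mode" concrete, here is a rough sketch of one possible answer: a confidence threshold that escalates low-confidence queries to a human and logs them for retraining. The `route` function, the confidence score, and the 0.6 threshold are all assumptions for illustration, not any vendor's actual API.

```python
def route(query, confidence, threshold=0.6):
    """Answer when the model is confident enough, otherwise hand off.

    `confidence` is assumed to be a 0-1 score from whatever intent model
    sits in front of the ERP; `threshold` is a made-up starting point.
    """
    if confidence >= threshold:
        return {"action": "answer", "query": query}
    # Out-of-scope or unclear: escalate and keep the query as a failure case.
    return {"action": "escalate", "query": query, "log_for_retraining": True}


print(route("Show open POs for vendor 4411", confidence=0.91)["action"])
print(route("Why did the Q3 accrual reverse twice?", confidence=0.32)["action"])
```

If a vendor can't describe something at least this specific, that is the confident non-answer.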

Anyway. Curious if anyone else has been through this. What was the gap between what got demoed and what got deployed? Was it the cross-system queries? The compliance architecture that got added as an afterthought? The retraining that was supposed to be quarterly and never happened?

What actually happened vs. what the proposal said.

u/Best_District593 — 16 days ago

▲ 2 r/AIAppInnovation

Week ten of a twelve-week build. We're doing a query audit, the kind we should have done in week two, and we realise that roughly 40% of what the client's employees actually need to ask involves data that SAP Joule literally cannot see.

Their Salesforce instance. A data warehouse they'd been running since 2017. Both are completely outside Joule's reach.

We'd proposed Joule because the client was on S/4HANA Cloud, a clean single-vendor stack on paper, and Joule deploys fast for standard in-SAP queries. What we hadn't mapped properly was where their real query volume actually came from. Finance querying ERP data? Fine. Operations wanting to cross-reference CRM history with inventory status? Invisible.

So we went back and rebuilt the connector layer around a RAG approach over their OData layer. Three weeks added to the timeline. Client was decent about it; we'd caught it before deployment, so it wasn't a production disaster, just a painful rework conversation.

The thing that bothers me more than the three weeks: we had enough information to catch this in week one if we'd run the query audit then. The client gave us their top 50 employee requests in the kickoff doc. I looked at it again after week ten, and the cross-system stuff was right there. I just didn't weigh it properly when scoping the integration approach.
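For what it's worth, the week-one audit doesn't need to be fancy. A minimal sketch, assuming you tag each request from the kickoff doc with the systems it needs (the sample requests, the system tags, and the `audit` helper are invented for illustration):

```python
# Assumption: the set of systems the native copilot can actually reach.
NATIVE_SYSTEMS = {"sap"}


def audit(requests, reachable=NATIVE_SYSTEMS):
    """Return (uncovered requests, share uncovered).

    A request is uncovered if it needs any system outside `reachable`.
    """
    uncovered = [r for r in requests if not set(r["systems"]) <= reachable]
    share = len(uncovered) / len(requests) if requests else 0.0
    return uncovered, share


# Hypothetical stand-ins for the client's top employee requests.
requests = [
    {"text": "Open invoices for a vendor", "systems": ["sap"]},
    {"text": "CRM history vs. inventory status", "systems": ["salesforce", "sap"]},
    {"text": "Quarterly spend trend", "systems": ["warehouse"]},
    {"text": "PO approval status", "systems": ["sap"]},
    {"text": "Customer orders vs. stock", "systems": ["salesforce", "sap"]},
]

uncovered, share = audit(requests)
print(f"{share:.0%} of requests need data outside the native copilot")
```

Tagging 50 requests by hand is an afternoon of work, which is cheap compared with three weeks of rework.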

The deployed version works. Adoption is actually decent. We pushed it through Teams, which helped a lot; people didn't have to change where they worked. The finance use case took a while to click, but it did.

The retraining cadence I'm less confident about. We went monthly, failure-case-driven. Client's finance team uses enough internal terminology that I think bi-weekly for the first four months would've gotten intent recognition sharper, faster. Hard to know. We didn't run the counterfactual.

Curious if anyone else has hit the query-type coverage problem on multi-platform environments, specifically where the client thinks they're running a clean single-ERP stack but their real workflows are pulling from three systems.

How early are you mapping that before it affects the architecture call?

u/Best_District593 — 17 days ago

▲ 1 r/AIAppInnovation

We were ten weeks into a SAP S/4HANA chatbot build when we realized we had scoped the wrong architecture.

The client wanted cross-system queries — ERP data combined with their Salesforce instance and a data warehouse they'd been running for seven years. The proposal we started with used SAP Joule as the base layer.

Joule is a good product. It's built natively into the SAP environment, deploys fast, and handles standard in-SAP queries without custom connectors. What it doesn't do well is reach outside the SAP ecosystem. The CRM and warehouse data were invisible to it. The multi-system queries, which accounted for about 40% of what the client's employees actually needed to ask, were all out of its reach.

We caught it before deployment, but only because we did a real query audit in week ten instead of week one, which was our mistake. We went back, redesigned the connector layer around a RAG approach over an OData layer, and added three weeks to the timeline.

The deployed version handles cross-system queries now. Adoption is actually fine. But I think about how that conversation with the client would have gone if we'd launched the Joule version and they'd discovered the gap in production.

The thing I keep seeing in discussions about enterprise AI chatbots is that the native copilot vs. custom RAG decision gets treated as a style preference when it's actually a functional one. They're not interchangeable options for the same use case.

Native copilots (Joule, Oracle Digital Assistant, Copilot Studio) are the right choice if you're running a clean single-platform ERP stack, your use cases stay within that platform, and you want fast time-to-value on standard query types.

Custom RAG layers are the right choice if you have cross-system data requirements, you need workflow triggering rather than just data retrieval, or your ERP environment includes a legacy platform with non-standard data schemas.
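The two paragraphs above boil down to a simple rule, sketched here as code. The `Environment` fields and the `recommend` function are just my shorthand for the criteria listed, not a formal framework.

```python
from dataclasses import dataclass


@dataclass
class Environment:
    """Illustrative scoping facts about a client's ERP setup."""
    cross_system_data: bool
    needs_workflow_triggering: bool
    legacy_nonstandard_schemas: bool


def recommend(env: Environment) -> str:
    """Any one of the three conditions pushes the build toward custom RAG."""
    if (
        env.cross_system_data
        or env.needs_workflow_triggering
        or env.legacy_nonstandard_schemas
    ):
        return "custom RAG layer"
    return "native copilot"


# A clean single-platform stack with in-platform read-only queries.
print(recommend(Environment(False, False, False)))  # native copilot
# ERP plus CRM plus a data warehouse, as in the build described here.
print(recommend(Environment(True, False, False)))   # custom RAG layer
```

The rule is deliberately an "or": a single cross-system requirement is enough to change the architecture call.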

Most mid-to-large enterprise environments are the second scenario. A lot of vendors still propose the first solution because it's easier to scope and faster to demonstrate.

I'm still not sure we got the retraining cadence right on this one. We built out a monthly model retraining loop based on failure case analysis, but the client's finance team uses enough domain-specific terminology that I wonder if bi-weekly would have produced better intent recognition in months two and three. No clean answer on that yet.

Anyone else navigating this in SAP or Oracle environments specifically?

Curious how others are handling the query-type coverage problem when the client's data environment spans more than one platform.

u/Best_District593 — 17 days ago