u/Acinac

Hi everyone, not sure if this is the right place, but I just need to vent and get some outside perspective.

I work at a large conglomerate that spans multiple domains. I'm a data engineer and the de facto lead of a small team: one data analyst, one software engineer, and me. We usually handle POC projects, performance analysis, and process improvement for a consumer-facing product division and the company's manufacturing operations.

Following an org restructure earlier this year, our team was reassigned to support the R&D department of a specialized industrial materials division. At the same time, a company-wide mandate came down requiring each sector to generate a defined amount of AI-driven revenue per year through cost savings, new products, or time savings from AI usage. This landed on our team as "find ways to use AI to help researchers do R&D faster and more efficiently."

I started by doing some preliminary interviews about the current R&D workflow. Each researcher or small team owns a single research domain. They design an experiment, create a work order in Excel (containing a work ID, associated sample IDs, and the tests needed per sample), then send the work order to multiple labs for testing. The problem is that there is almost no data or knowledge management system in place.

The work IDs and sample IDs are created by each researcher with no naming standard, and sample IDs often repeat across experiments. Two of the labs generate their own internal IDs when they receive the work order, fill out their test forms, and send the results back. A third lab requires the researcher to manually create test tasks in a web application with no link back to the original work order. There is no standardization of data schema, naming conventions, or terminology across any of it. Most records are Excel files, but some exist only as emails or chat thread replies.

If you want to trace an experiment from the original work paper (named '22032026_work_paper_exp1'; yes, the file name is the work_id for this researcher...) to the lab 1 results (named '26M0321'), the lab 2 results (named '26C0926'), and the lab 3 results (named '26AS0265436'), you need to open each file, extract the sample IDs, and match them up. It's also possible that a sample wasn't tested by all 3 labs. In that case you have to match on sample ID plus the closest date, because the same sample ID can show up in different experiments (and therefore different work papers).
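For anyone curious what that manual matching amounts to, here's a rough Python sketch of the "same sample ID, closest date" rule. It assumes every file has already been parsed into records with a sample ID and a date (which is the genuinely hard part and isn't shown). The record fields, the 60-day window, and the rule that a test can't predate its work order are all my own hypothetical choices, not anything the labs actually define.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class WorkOrderSample:
    work_id: str    # researcher's own work name (hypothetical field)
    sample_id: str
    sent: date      # date the work order went out (assumed to be recoverable)


@dataclass(frozen=True)
class LabResult:
    lab: str        # e.g. "lab1"
    lab_id: str     # the lab's internal ID, e.g. "26M0321"
    sample_id: str
    tested: date


def link_results(orders, results, max_days=60):
    """Attach each lab result to the work order that shares its sample ID
    and has the closest date on or before the test date.

    Sample IDs repeat across experiments, so the ID alone is ambiguous and
    the date acts as the tie-breaker. Results with no candidate inside the
    (hypothetical) max_days window are returned as unmatched.
    """
    by_sample = {}
    for o in orders:
        by_sample.setdefault(o.sample_id, []).append(o)

    linked = {}     # work_id -> list of LabResult
    unmatched = []
    for r in results:
        candidates = [
            o for o in by_sample.get(r.sample_id, [])
            if 0 <= (r.tested - o.sent).days <= max_days
        ]
        if not candidates:
            unmatched.append(r)
            continue
        best = min(candidates, key=lambda o: (r.tested - o.sent).days)
        linked.setdefault(best.work_id, []).append(r)
    return linked, unmatched


# Usage: the same sample ID "S01" appears in two different experiments.
orders = [
    WorkOrderSample("exp1", "S01", date(2026, 3, 22)),
    WorkOrderSample("exp2", "S01", date(2026, 6, 1)),
]
results = [
    LabResult("lab1", "26M0321", "S01", date(2026, 3, 25)),
    LabResult("lab2", "26C0926", "S01", date(2026, 6, 10)),
    LabResult("lab3", "26AS0265436", "S02", date(2026, 3, 30)),
]
linked, unmatched = link_results(orders, results)
```

Even in this toy version the result depends entirely on the window and the date rule, which is exactly why doing it by hand across hundreds of files is so fragile.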

It is an absolute mess.

To make things worse, about two months before my team got involved, the department had already engaged an external AI company to build prediction and optimization models for their core research workflows. The AI company's first ask was "send us the past year of research data so we can start training the models." That's when everything unravelled. The department couldn't produce a single clean dataset. They scrambled to piece something together manually and ended up with 48 rows of experiment data for one research domain and 147 rows for another, even though our company has been in this domain for a really, really long time. For anyone who doesn't know, you typically need thousands of clean, structured records at minimum to train a model that's worth anything (at least try to get them a few hundred data points, dammit). What they handed over was essentially unusable. The external engagement is now stalled.

That context explains a lot about what happened next. After my preliminary investigation I met with the VP of the R&D department, presented the findings, and proposed a ground-up digital transformation (minimum 3 to 4 months). He stopped me at "3 to 4 months," told me to just find AI tools to ingest the legacy data and build a database from it, and said we could "talk about transformation later." He wanted something done within a month. Then he asked: "Have you ever heard of Claude Cowork? Just use Cowork, it should be really easy." I walked out completely drained.

My direct manager told me to try to accommodate the VP's request. We've only just come under his department, and the political reality is that the AI mandate has created pressure to show something quickly, even though this R&D function has been a core domain of the company for a long time with no data infrastructure to show for it. The external AI engagement presumably isn't cheap either, and right now it's going nowhere.

So here I am two weeks later, sifting through a complete mess of reports, Excel files, and PDFs. I can probably build file-parsing heuristics for one researcher's output, maybe a team's, but doing it for every researcher, knowing it's just a band-aid that solves nothing structurally, feels like an enormous waste of everyone's time, including mine. And even if I somehow pull it off, the data coming out the other end still won't be clean or consistent enough to unblock the external AI company.

Has anyone been in a similar situation? How did you handle the gap between what leadership wants to hear and what actually needs to happen?

PS. Sorry for the long post... I really needed to vent a bit.

PS2. I really did try to persuade them to pursue a ground-up transformation first, and explained why piecing the legacy data together is not a sustainable solution and is a waste of everyone's resources (you can imagine how inefficient this is if the researchers themselves could only scrape together ~200 rows of experiment data over 2 months).

VP told me to 'just use Cowork' to fix years of data chaos in a month. I am losing my mind.