r/datascience

After 5 years in data science, I’m starting to realize most “insights” we deliver are completely ignored. Is this normal?

I’ve been in data science roles (both analytics and ML) for about 5 years now across a couple of companies. Lately I’ve been feeling a bit burned out because I keep seeing the same pattern:

We spend weeks cleaning data, building dashboards, running statistical analysis, or training models… and then the stakeholders either:

  • Say “thanks” and never use it
  • Cherry-pick the numbers that support their existing opinion
  • Or just completely ignore the findings and go with gut feel anyway

The worst part is when leadership asks for a “data-driven decision” but they’ve already decided what they want to do.

Am I alone in this? Or is this just the reality of data science in most companies?

For those of you who’ve been in the field longer, how do you deal with this? Have you found companies where data actually influences decisions at a meaningful level?

Would love to hear honest experiences.

reddit.com
u/ExternalComment1738 — 15 hours ago

Do the Meta/Intuit layoffs actually make the job market harder for those of us already searching?

I get it, the obvious counterargument is that all the laid off DS folks flood the market too, making it more competitive. But I honestly have no idea how many data scientists were actually cut in these recent rounds, so I’m struggling to gauge whether this realistically tanks my job search or if it’s more noise than signal.

More importantly though, what’s the actual move here? What are people doing to stay competitive?

u/Lamp_Shade_Head — 1 day ago

I compared XGBoost, LightGBM, CatBoost, random forest, LASSO, and a small neural network in a momentum stock trading strategy

Last week I posted about an XGBoost based momentum stock trading strategy, and I got two separate comments:

“Why not LightGBM?”
“Why not CatBoost?”

So I did a controlled swap of 6 models inside my existing momentum pipeline and reran the same backtest with:

  • XGBoost
  • LightGBM
  • CatBoost
  • Random Forest
  • LASSO
  • A simple 2‑layer neural net (sklearn’s MLPRegressor)

Setup / constraints

  • Same universe, features, filters, and portfolio construction
  • Only the model changes; all other code is identical
  • Default hyperparameters for each model (on purpose) to see how they behave “out of the box”
  • Logged everything to MLflow so I could compare runs, metrics, and charts cleanly

I’m not claiming this is a definitive “which model is best” answer, just one controlled experiment on one dataset/strategy. But a few patterns showed up that I thought were interesting.

High‑level takeaways:

  • XGBoost and LightGBM were basically neck‑and‑neck on headline returns, but XGBoost had a better risk profile. CatBoost underperformed in a way that I wasn’t expecting.
  • The NN had the highest CAGR, Sortino, and total return. This was another surprise to me. But XGBoost and LightGBM had better drawdowns.
  • LASSO and random forest did not beat the S&P on cumulative returns over the time period; all the other algos did.
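For anyone less familiar with the metrics named above, here is a minimal sketch of how CAGR, Sortino, and max drawdown are commonly computed from an equity curve. The post does not say how its numbers were calculated, so the function names, the 252-periods-per-year convention, and the zero target return are all assumptions, and the sample equity values are made up purely for illustration.

```python
import math


def cagr(equity, periods_per_year=252):
    """Compound annual growth rate from an equity curve (one value per period)."""
    years = (len(equity) - 1) / periods_per_year
    return (equity[-1] / equity[0]) ** (1 / years) - 1


def max_drawdown(equity):
    """Largest peak-to-trough decline, returned as a negative fraction."""
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)
        worst = min(worst, value / peak - 1)
    return worst


def sortino(returns, target=0.0, periods_per_year=252):
    """Annualized Sortino ratio: mean excess return over downside deviation."""
    excess = [r - target for r in returns]
    # Only returns below the target contribute to downside deviation.
    downside_sq = [min(e, 0.0) ** 2 for e in excess]
    downside_dev = math.sqrt(sum(downside_sq) / len(returns))
    if downside_dev == 0:
        return float("inf")  # no losing periods in the sample
    mean_excess = sum(excess) / len(excess)
    return mean_excess / downside_dev * math.sqrt(periods_per_year)


# Illustrative (made-up) equity curve, one value per trading day.
equity = [100.0, 110.0, 99.0, 120.0]
returns = [equity[i] / equity[i - 1] - 1 for i in range(1, len(equity))]
print(cagr(equity), max_drawdown(equity), sortino(returns))
```

A model can lead on CAGR and Sortino while still having a deeper max drawdown, which is consistent with the NN vs. XGBoost/LightGBM pattern described above: Sortino averages over all losing periods, while drawdown only looks at the single worst peak-to-trough stretch.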

The goal here was largely to show how easy it is to switch out algorithms and how different algorithm families perform. Disclaimer: the full article does contain links, but this was truly an analysis that took a long time that I wanted to share with the community. Full article with more results: https://www.datamovesme.com/blog/what-happens-when-you-swap-out-xgboost-a-6model-momentum-showdown

u/Clicketrie — 1 day ago

Agentic Workflows beyond "pull the data"

i've been using the robots to do a lot of my data retrieval and general project planning. i haven't actually used an agent to train/eval a model though. i would like to hear your use cases, if you have any.

how did you frame the work to the agent? how did you give the agent feedback to decide if it was "done"? how did you decide if the model/output was "good"? did you let the agent decide?

maybe i am overthinking it. maybe i just say "train a model on this data to predict XYZ. try as many models as you like and report back the best performing model." then i can just sit there and watch it cook.

share your stories please.

u/astroFizzics — 23 hours ago

How does your team handle the security issues of coding agents on real data?

Been thinking about this a lot lately. We use coding agents daily on real datasets.

Two things I read recently that made me uncomfortable:

  • Prompt injection: the agent reads a website or files from the internet that contain hidden instructions, which it then just executes, and it can end up exfiltrating data to an external server.
  • Slopsquatting: LLMs hallucinate package names that don't exist. Attackers pre-register the most-hallucinated names on PyPI with malware.

These are just the two I can think of, but they make me wonder how other teams manage this. Do you believe these are real risks, or just some security researcher's fantasy?

u/SummerElectrical3642 — 2 days ago

Are there any small, quick things I can do every day to keep my skills sharp?

I’m sure everyone knows about the dilemma of AI at this point. We want to work faster but our skills are atrophying yada yada…as a junior data scientist, I feel like I barely had any skills to begin with. Now with my company forcing us to use AI, I feel like I’m not learning much. Now I’ve been doing leetcode, but I just don’t think it’s that applicable to my real job. I don’t have the bandwidth outside of work to do a project yet, since my company is working us to the bone. What are some quick habits/tools/websites/apps you recommend to keep your skills sharp?

Edit: so many great tips in the comment section, thank you all!!! I will save this post for future reference

u/ExcitingCommission5 — 3 days ago

The most insane interviews/take-homes I've ever gotten

Is this the case with everyone or just me?

Interviews have gotten so much more difficult than they were about 1-2 years ago. The take homes are also very intense.

I just got a take home that would be 10+ hours of work (build a full language model classification pipeline, then put it in an API). I've never seen anything like this, and none of my friends have gotten one like it either.

Is the interviewee expected to use Claude Code/Codex, or have standards just risen so much that every DS is now cracked? It's like they gave a whole team's sprint or more as a take home.

I think Claude can solve this in like 45 minutes, but I would still be sweating here for hours trying to crank this out.

u/LeaguePrototype — 3 days ago

Question for those in DS with an epidemiology, biostatistics or health informatics background

I work in data science in a biotech/pharma company with an epidemiology/biostatistics background - in my previous jobs, I worked with colleagues who had a similar background but had much stronger research skills rather than programming skills in R or Python. This is where I felt I really shined because I loved using both to develop solutions that automated critical processes, data visualization tools and all. My technical skills I felt were my strongest asset in my career.

Both me and my research colleagues eventually switched into biotech - however, I work specifically in a data science team while they work in other roles. In the past 2 years, I've been really confused with my trajectory, especially the feeling that I focused a lot on technical skills that there is a push for AI to automate. Although I have a more balanced approach to AI in that I feel that even if AI can produce technical solutions, it still needs a lot of description and steering to get it to work the way it should - I still have this "what am I doing" feeling. I don't really have in-depth knowledge of the therapeutics I work with even though I try to set time to learn the domain knowledge and network with colleagues who have been working on the projects I've just gotten started on for years. My job over the last few years has felt really confusing as my team struggles with technical debt, lack of ownership and the myriad of other things. Moreover, I don't really see myself getting promoted - I started here with a senior DS role after having nearly a decade of experience and while I try to network extensively with my colleagues and take initiative, I feel like I might be stuck at this level for a while.

I look at my colleagues who were in research roles in previous jobs and they quickly got promoted to director roles in pharma in a span of just a few years. It's making me wonder if becoming a DS with a healthcare background was really worth it - data science in biotech/pharma feels very behind both in terms of organizational maturity and salary compared to tech and even other areas of biotech - but I do find the domain knowledge projects I work on more meaningful to me than the possibility of working at Meta or Amazon, say. It has me wondering if I should (or even can) switch to something else in pharma - but the thing is, I don't even know what to look for or what the titles/skills even actually mean or how my skills would be transferable. I spoke to a colleague in medical affairs and when they explained the job, it felt like I would be jumping into a whole new world and a bit of unknown territory that I'm not sure I'd even like. I'm wondering if anybody else has been in this position and can offer advice - should I stay in DS in biotech and grow my career here or leave data science for a role/function in pharma/biotech with an epidemiology/biostatistics background?

u/thro0away12 — 3 days ago

Ideas on a Forecasting Problem

Hi everyone,

I'm working on a retail/e-commerce forecasting project where we need to predict synthetic demand (actual sales + lost sales due to stockouts) during peak festival times.

We are trying to calculate the lost demand when an item goes Out of Stock (OOS), but the extreme volatility of the short festive window is making standard historical imputation impossible.

The Data We Have:

Periods: Last Year BAU, Last Year Festive, Current Year BAU.

Constraint: The BAU and Festive periods we are looking at are only 7 days long each.

Sales Data: Store + SKU level across all these periods.

OOS Records: Flagged at the Hour + Day + Store + SKU level.

Search Data: Search sessions at the day + hour + store level in which the specific SKU (or its parent L3 category) was present/impressed.

Features available: store, sku, day, hour, store_cluster, category, subcategory, l3_category, city.

The Core Problem:

Because the festive period is only 7 days, every single day and hour has a completely different demand profile. For example, the conversion rate for an item on "Festival Day minus 1 at 8 PM" is drastically different from "Festival Day at 8 PM" or even 2 PM on the same day. Because of this intra-day and day-to-day volatility, we can't just take a simple historical average of the previous day or week to impute demand when an item is OOS.

Our Current Idea:

Since we still capture search sessions when an item is OOS, we want to use search volume as our proxy for raw demand. To convert those searches into "lost units," we need to predict a highly contextual Search-to-Sale Conversion Rate (CVR).

When a Store-SKU is OOS at a specific day/hour, we want to find its "Nearest Neighbors" based on the categorical and temporal features mentioned above, and do a distance-weighted average of their In-Stock search-to-sale CVRs. We then multiply this imputed CVR by the actual search sessions observed during that OOS hour.
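The idea above can be sketched roughly as follows. This is only a minimal illustration, not the poster's implementation: the `Obs` record, the choice of a mismatch count for the categorical features, the circular hour gap, the scaling of each term, and `k` are all assumptions made to keep the example small, and the sample rows are invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Obs:
    """One Store-SKU-day-hour row (hypothetical schema for illustration)."""
    store_cluster: str
    l3_category: str
    day: int          # 0-6, position inside the 7-day window
    hour: int         # 0-23
    searches: int
    units_sold: int = 0


CATEGORICAL = ("store_cluster", "l3_category")


def distance(a, b):
    """Mixed distance: categorical mismatches plus scaled time gaps."""
    cat = sum(getattr(a, f) != getattr(b, f) for f in CATEGORICAL)
    hour_gap = abs(a.hour - b.hour)
    hour_gap = min(hour_gap, 24 - hour_gap) / 12   # circular, in [0, 1]
    day_gap = abs(a.day - b.day) / 6               # linear, in [0, 1]
    return cat + hour_gap + day_gap


def impute_cvr(target, in_stock, k=3, eps=1e-6):
    """Inverse-distance-weighted CVR of the k nearest in-stock rows."""
    candidates = [o for o in in_stock if o.searches > 0]
    nearest = sorted(candidates, key=lambda o: distance(target, o))[:k]
    if not nearest:
        raise ValueError("no in-stock rows with searches to borrow CVR from")
    weights = [1.0 / (distance(target, o) + eps) for o in nearest]
    cvrs = [o.units_sold / o.searches for o in nearest]
    return sum(w * c for w, c in zip(weights, cvrs)) / sum(weights)


def lost_units(target, in_stock, k=3):
    """Imputed CVR times the searches observed during the OOS hour."""
    return impute_cvr(target, in_stock, k) * target.searches


# Invented rows: two in-stock neighbours and one OOS hour to impute.
in_stock = [
    Obs("A", "snacks", day=3, hour=20, searches=100, units_sold=10),
    Obs("A", "snacks", day=3, hour=19, searches=200, units_sold=30),
]
oos_hour = Obs("A", "snacks", day=3, hour=20, searches=50)
print(lost_units(oos_hour, in_stock, k=2))
```

The relative weights of the categorical, hour, and day terms are the part that would need tuning; one way to check them is to mask hours that were actually in stock and see how well the imputed units match the real sales.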

My Questions for the Experts:

What is the best metric to quantify the relationship/distance between these heavily categorical and temporal combinations? (e.g., Target encoding + Euclidean distance? Random Forest proximity matrix?)

How would you handle the cyclical/temporal features (day, hour) alongside the search session volume so the model understands the specific urgency of a festive timeline without suffering from massive data sparsity?
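One common way to encode the temporal features is sketched below, purely as an illustration of the question rather than a recommended answer. Hour of day is mapped to sine/cosine so that 23:00 and 00:00 end up close together, and the festive timeline is represented as a signed countdown in hours instead of a categorical day. The function names and the `festival_day` parameter are hypothetical.

```python
import math


def encode_hour(hour):
    """Map hour 0-23 onto the unit circle so the encoding wraps at midnight."""
    angle = 2 * math.pi * hour / 24
    return math.sin(angle), math.cos(angle)


def hours_to_festival(day, hour, festival_day, festival_hour=0):
    """Signed hours relative to the festival start; negative means before."""
    return (day - festival_day) * 24 + (hour - festival_hour)


def temporal_features(day, hour, festival_day):
    """Numeric features that replace raw (day, hour) categories."""
    hour_sin, hour_cos = encode_hour(hour)
    return {
        "hour_sin": hour_sin,
        "hour_cos": hour_cos,
        "hours_to_festival": hours_to_festival(day, hour, festival_day),
    }


# "Festival Day minus 1 at 8 PM", assuming the festival falls on day index 5.
print(temporal_features(day=4, hour=20, festival_day=5))
```

Because these are continuous values rather than one category per day-hour cell, rows from neighbouring hours can share information, which is one way to soften the sparsity concern raised in the question.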

Is there a completely different architecture (like LightGBM directly predicting lost sales using search volume as a feature) you would recommend over this KNN/distance-based CVR imputation?

Would love to hear how you've tackled similar short-term, high-volatility lost sales problems.

u/Standard-Broccoli130 — 3 days ago

Not considering the benefits of your specific job (comp, PTO, remote, job environment, job security, etc), how much do you enjoy the actual work?

When considering your day to day activities, do you enjoy them? The thought processes, problems/solutions, ultimate goals, etc.

Is a lot of your work intellectually stimulating and satisfying to work on? Or only a portion of it? None of it?

Does it feel like "just another white collar job" or not?

As someone who only has an educational background in this field and not job experience in it, I would like to know your thoughts.

u/Augustevsky — 4 days ago

Weekly Entering & Transitioning - Thread 18 May, 2026 - 25 May, 2026

Welcome to this week's entering & transitioning thread! This thread is for any questions about getting started, studying, or transitioning into the data science field. Topics include:

  • Learning resources (e.g. books, tutorials, videos)
  • Traditional education (e.g. schools, degrees, electives)
  • Alternative education (e.g. online courses, bootcamps)
  • Job search questions (e.g. resumes, applying, career prospects)
  • Elementary questions (e.g. where to start, what next)

While you wait for answers from the community, check out the FAQ and Resources pages on our wiki. You can also search for answers in past weekly threads.

u/AutoModerator — 4 days ago

No feeling quite lower than...

crushing the system design interview just to bomb the live pandas coding interview even though you've been using pandas every day for 10 years.

If anyone wants feedback on what that feels like, hmu.

Anyone know if they sell kegs of Jager? Asking for a friend...

u/MeLikaDoTheChaCha — 6 days ago

For those in corporate roles, how do you all work with the non-technical areas you support?

I've spent the past few years at what feels like a somewhat dysfunctional company. Our Data Science and Engineering teams are very siloed from the rest of the company, including the teams we support and build things for. ICs rarely interact with those requesting the work. Many of my peers and I share the challenge of needing to talk to the people who asked for what we're building, but we're often told no, we can't go talk to them. This is one of our biggest pain points, and it makes it very difficult to know if I'm making the most sensible choices given the goals of the work.

In the few conversations I have been able to join with our non-tech teams, it feels like there's this constant tension. Some of my team's 'vision' for the future feels more like changing another area's business strategy instead of using Data Science to support them with their actual stated strategy. Maybe these two things can work towards the same goals in the future, but from the little I've seen so far, we're rowing in a different direction than the teams we're supposed to be helping, and I'm worried this will harm trust and our ability to influence in the future if there are places we want to suggest different ways of approaching a problem. I'm not in enough of the conversations I need to be in to have this context though.

Is it like this at other companies? I know the economy and job market are pretty rough right now, but as I'm thinking about longer term decisions, I want a company where there's a functional relationship between business and technology and those of us building can actually speak to the people we're building for. Building the best technical solution doesn't matter if it doesn't actually help the people it's for, or have a way to be incorporated into current processes. I'm just not sure how to assess this from the outside or how common this is.

u/SkipGram — 5 days ago

Applied Scientist Interview Prep

What is the applied scientist interview like at Amazon/Uber/any other place that has it?

Do you mostly prep leetcode or causal inference? What should I expect?

I'm a bit lost on how difficult these interviews are and which part of them is the hardest. Personally my stats/ML is pretty good but I struggle with leetcode mediums

u/LeaguePrototype — 6 days ago

I think I need to rethink my career roadmap

I had a meeting today that basically gave me an existential crisis. I spent most of the morning cleaning a mess of a dataset and building out what I thought was a pretty slick visualisation on consumer behaviour. I go into the meeting, present the findings, and instead of asking about methodology as I expected, my manager asked me to show him the actual strategy, which I never thought was part of my role in the first place. Actually, I would prefer no questions at all lol.

Anyway, I am doing the technical work behind the scenes and it seems that it’s kind of invisible for everyone else. In fact, I am getting more requests on giving my input on strategy and consumer psychology lately, so I started doing some research. It’s actually interesting how everything changes, but also quite overwhelming because I really do not like the storytelling part. Usually, I do my bit, present it, and I’m out lol.

What I wanted to share with you here is that while this situation is definitely not to my advantage, I started to do some digging and found some really interesting perspectives on this and what expectations organisations have now with the massive implementation of AI everywhere. I use AI daily and it makes my work sooooo much easier, but using AI is not enough anymore apparently. Here it is: https://www.qualtrics.com/articles/strategy-research/market-research-trends/ The main idea here is that technical skills are the baseline, not the real value added to the organisation...???

Does anyone else feel like the goalposts are moving? I’m genuinely wondering if I should stop grinding LeetCode and start reading business strategy books just to stay relevant. Would love to hear if your roles are actually changing or if I'm just overthinking one bad meeting.

u/prattman333 — 7 days ago

Publication Topics Question

Hi,

I am looking for topics to cover in a potential publication, as I will have a few months of free time. The problem is that I am struggling to settle on a problem statement to focus on and then solve or get insights about. I asked AI what kinds of problems current papers cover, but the response was not satisfying. So I am asking this community: are you currently working on problems, or do you know of other problems worth tackling?

My fields of experience:

  • statistics/probability theory
  • machine/deep learning
  • natural language processing
u/InfamousTrouble7993 — 6 days ago

Looking for advice: Online Master's in Applied Math for ML while working full-time

Hi everyone,

I'm looking for some honest input from people who've been down this road or know the landscape well.

My background:

  • B.Com in Finance & Accounting from Delhi University (2019)
  • During Covid, I made my way into machine learning through self-study at home.
  • Currently a Senior ML Engineer at a large financial data/tech company in Bengaluru
  • Day-to-day work spans NLP/LLM systems, real-time ML pipelines, distributed data infra, and AWS.

What I'm trying to do: I want to seriously deepen my foundations in applied mathematics for ML — think probability, linear algebra, optimization, statistical learning theory, the actual mathematical machinery behind modern ML rather than just the engineering side. I've been doing ML professionally for a few years now and I keep hitting the ceiling where deeper math intuition would make me significantly better at my job (and at research-leaning problems).

My constraints:

  • Can't leave my job. I need a fully online / part-time / WILP-style program.
  • Based in India, so an Indian program is ideal (IISc, IIT online degrees, CMI, ISI, BITS, etc.). I know getting into a top-tier college is very, very hard for someone whose background isn't in engineering, but if there's any way they accept non-technical degree holders, I would like to know more about how one can enrol in such programs.
  • Open to foreign universities too if the program is genuinely online and the time zones work out

What I'd love input on:

  1. Programs you'd actually recommend (and ones to avoid) for applied math / mathematical ML at the master's level, fully online
  2. If anyone has done IIT/IISc online degrees in math/stats/ML coming from a non-technical background while working full-time, how was the experience and workload?

Not looking for career change advice; I'm happy in my role. Just trying to build deeper foundations the right way. Any pointers appreciated.

u/Lamba_ghoda — 9 days ago