r/datascience

Does MSDS still make sense with my experience and pay?

I am set to begin Georgia Tech's OMSA this fall, after deferring this past spring when I started a new role. This is my background:

- Undergrad: economics at T20 school.
- Experience: 4 years. 3.5 years in hybrid DS/DE role (first job out of undergrad) at a non-profit, then six months into current role doing strictly DE at a healthcare org.
- TC: 144k ($125k base + 15% API) in MCOL city.
- Not open to relocation (I work remote but there's too much red tape to move out-of-state), so onsite/hybrid roles in NYC/LA for crazy TCs are out of reach.

At the time that I applied to OMSA, I was struggling to leave my old role while making $82k/year. That is not the case any more, so I am having second thoughts about OMSA. Anecdotally, I also see a lot of OMSA folks on LinkedIn (and the Slack group) struggling to break into data and/or simply remaining in their current roles. I presently work as a senior DE, but I am open to both DS and analyst roles in the future.

Can I still expect a (significant) ROI out of OMSA? I am targeting $160k - $175k TC in a couple years' time with no particular industry in mind.

reddit.com

u/teddythepooh99 — 20 hours ago

▲ 67 r/datascience+1 crossposts

How are people using AI/LLM in their work life?

I work for a US bank, and I've noticed that my job has shifted toward creating agentic workflows (a fancy name for using LLMs to automate tasks). In the last year, I haven't touched a single ML model. I'm curious what other folks' experience has been.

reddit.com

u/adarsh_maurya — 1 day ago

▲ 12 r/datascience

What does career development at your company look like?

We talk a lot about entering the field, but once you're in a role and have been for a while, I'm curious how your companies handle career development and what sorts of things you all do to develop in the role.

reddit.com

u/TaterTot0809 — 1 day ago

▲ 60 r/datascience

Actuarial Science vs Data Science?

Hi everyone, I'm an actuarial science student in Argentina. Here, SOA certifications aren't as important as the degree itself, which is what legally authorizes you to practice as an actuary. I'm about halfway through my degree, but I'm not sure I'm really that interested in the insurance/finance side of things. I've noticed that I'm more passionate about math and statistics in other areas. My question is: has anyone transitioned from actuarial science to data science? What should I learn? Should I change majors and drop out halfway through, or is it better to finish this one and do a master's? At my university (UBA), there's a mathematics degree (with two specializations: pure and applied) and a data science degree (both are quite rigorous and focus on the fundamentals; data science is a mix of applied mathematics and computer science).

Thoughts?

reddit.com

u/Easy-Huckleberry7091 — 4 days ago

▲ 1.2k r/datascience+1 crossposts

Me pacing in front of my screen while my model is training

(Not sure if loss is still going down)

u/Jenna_AI — 5 days ago

▲ 20 r/datascience

Uplift Models Tutorials

Hello everyone. I'm moving to a new job where I might need to implement uplift modelling to track customer revenue. Where can I learn the basics? Gemini gave me a scikit-learn package link. Are there any books or tutorials I can look into? TIA :)

reddit.com

u/NervousVictory1792 — 4 days ago

▲ 13 r/datascience

Unifying configs across coding agents (eg Claude code, Qwen, etc…)

Anyone have a good solution for unifying the config (eg CLAUDE.md, QWEN.md), settings, skills, etc… across their suite of coding agents?

I primarily use Claude Code locally, Genie Code in Databricks workspaces for my model development and MLE work with Databricks compute, and recently added Qwen Code since the company wants us to have a backup in case we hit Anthropic limits and need to continue work. Also on the docket is testing out GLM.

However, unifying all these agents is quite cumbersome. I don’t want to maintain separate files and skills for each agent. Right now I have a single repo that backs up all my .claude folder settings, but I realized that with Qwen I’ll need a separate suite.
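One low-tech option, sketched here under assumptions: keep a single canonical instructions file in the repo and regenerate the per-agent files (CLAUDE.md and QWEN.md, the names mentioned above) from it. The canonical filename `shared_instructions.md` and the helper `sync_agent_configs` are hypothetical, and this only covers the markdown instruction files, not each agent's settings or skills formats.

```python
from pathlib import Path

# Per-agent instruction filenames mentioned in the post; extend as agents are added.
AGENT_FILES = ["CLAUDE.md", "QWEN.md"]


def sync_agent_configs(repo: Path, source_name: str = "shared_instructions.md") -> list[Path]:
    """Copy one canonical instructions file (hypothetical name) to each agent's filename."""
    source = repo / source_name
    body = source.read_text(encoding="utf-8")
    # Banner so nobody edits the generated copies by hand.
    banner = f"<!-- Generated from {source_name}. Edit that file, not this one. -->\n\n"
    written = []
    for name in AGENT_FILES:
        target = repo / name
        target.write_text(banner + body, encoding="utf-8")
        written.append(target)
    return written
```

Calling `sync_agent_configs(Path("."))` from a pre-commit hook or a Makefile target would keep the copies in step; copying rather than symlinking sidesteps platforms where symlinks need extra permissions.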

Thoughts? Has anyone tried the new thing Databricks pushed out called Omnigent?

reddit.com

u/Neat-Porpoise — 5 days ago

▲ 145 r/datascience

What is the most underrated skill every data scientist should develop?

Beyond Python, machine learning, and statistics, which skill has made the biggest difference in solving real-world data science problems and delivering business value?

reddit.com

u/Effective_Ocelot_445 — 7 days ago

▲ 10 r/datascience

Ran 4 open-source geo-experiment estimators on 8,000 synthetic panels with planted ground truth. Their point estimates look interchangeable, but their uncertainty isn't.

Our research team ran a simulation study and found that the four big open-source geo-experiment tools (CausalPy, Meta GeoLift, Google Matched Markets, and CausalImpact) recover almost the same point estimate on the same data, then disagree about whether that estimate is significant. Since the disagreement lives in the uncertainty (not in the point estimate), the tool you pick may determine which error you ship.

In a "live" experiment you can't grade the tool because you don't know the ground truth: the counterfactual is unobservable, so "is this lift real?" has no answer key. That's why we had our research team generate 8,000 synthetic daily-sales panels, each with either a 7.5% multiplicative lift on the treated geo or no effect at all (0% lift). They ran all four tools on the same panels and scored every fit against the planted truth, so there were 32,000 fits in all across four scenarios.
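To make the planted-truth idea concrete, here is a minimal sketch of one synthetic daily-sales panel with a multiplicative lift on a single treated geo. It is not the study's data-generating process (that lives in the linked repo); the seasonality, noise model, geo count and the `make_panel` helper are all assumptions for illustration.

```python
import numpy as np


def make_panel(n_geos=20, pre_days=90, post_days=30, lift=0.075, seed=0):
    """Return (sales, treated_idx, post_start) for one synthetic geo panel.

    Assumed DGP: per-geo baseline level x shared weekly seasonality x lognormal noise,
    with a multiplicative lift applied to the treated geo from post_start onward.
    """
    rng = np.random.default_rng(seed)
    days = pre_days + post_days
    t = np.arange(days)
    weekly = 1.0 + 0.1 * np.sin(2 * np.pi * t / 7)  # shared weekly cycle
    base = rng.uniform(500, 1500, size=n_geos)  # per-geo sales level
    noise = rng.lognormal(mean=0.0, sigma=0.05, size=(n_geos, days))
    sales = base[:, None] * weekly[None, :] * noise
    treated_idx = 0
    # Plant the known effect: multiply the treated geo's post-period by (1 + lift).
    sales[treated_idx, pre_days:] *= 1.0 + lift
    return sales, treated_idx, pre_days


# Same seed with and without the lift: the post-period ratio recovers the planted truth.
treated, idx, start = make_panel(lift=0.075, seed=42)
control, _, _ = make_panel(lift=0.0, seed=42)
planted = treated[idx, start:].sum() / control[idx, start:].sum() - 1
```

Generating half the panels with `lift=0.0` gives the null scenario, and each panel is then fitted by every tool: 8,000 panels × 4 tools = 32,000 fits.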

Across the non-outlier scenarios, every tool recovered the 7.5% lift within a few percentage points, so judged on point estimates alone they look interchangeable. The split is entirely in how they handle uncertainty: coverage (how often the 95% interval actually contains the true effect) and power (how often it detects a real effect at all). On those two axes the tools fall into three camps:

- Meta GeoLift is the most cautious, with coverage of 92–95% and a false positive rate of 3–5%. It failed to reject zero in 89–96% of runs where a true 7.5% lift was present.
- CausalImpact is the opposite, with the most power of the four (false negative rate 34–48%), but coverage of only 70–72%, a false positive rate of 28–30%, and a consistent upward bias of +1.87 to +4.21 percentage points that shifts the whole interval high.
- CausalPy and Google Matched Markets sit between them, with coverage of 76–86% and false positive rates of 14–25%, meaning they’re both under-covered and under-powered at the same time.
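Because every panel has a known answer, coverage, power and false positive rate reduce to simple counts over the fits. A minimal sketch, assuming each fit is summarised by the planted lift and the tool's 95% interval, and that "significant" means the interval excludes zero; the `Fit` record and `score` helper are hypothetical, not part of any of the four tools.

```python
from dataclasses import dataclass


@dataclass
class Fit:
    true_lift: float  # planted truth: e.g. 0.075, or 0.0 for a null panel
    lower: float  # lower bound of the tool's 95% interval
    upper: float  # upper bound of the tool's 95% interval


def score(fits):
    """Coverage over all fits; power/FNR over effect panels; FPR over null panels."""
    covered = [f.lower <= f.true_lift <= f.upper for f in fits]
    # Treat a fit as "significant" when its interval excludes zero.
    significant = [not (f.lower <= 0.0 <= f.upper) for f in fits]
    on_effect = [s for f, s in zip(fits, significant) if f.true_lift != 0.0]
    on_null = [s for f, s in zip(fits, significant) if f.true_lift == 0.0]
    power = sum(on_effect) / len(on_effect) if on_effect else float("nan")
    fpr = sum(on_null) / len(on_null) if on_null else float("nan")
    return {
        "coverage": sum(covered) / len(covered),
        "power": power,
        "fnr": 1.0 - power,  # false negative (miss) rate
        "fpr": fpr,
    }
```

Under this scoring, an interval wide enough to always contain the truth, e.g. [-1, 1] on every fit, earns 100% coverage and 0% power, which is why coverage alone can't tell you whether a tool is useful for detection.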

There are four things from the study I'd take back to a measurement program:

- Read coverage and power together: a tool can keep its 95% coverage promise and still be useless for detection. GeoLift holds about 95% coverage in the short-history scenario while missing the real effect 95.7% of the time.
- Pick the estimator whose error profile matches the cost asymmetry of your decision, not the one with the best-looking single metric.
- Scarce history sharpens each tool's failure mode. Cutting the pre-period from 90 days to 30 didn't degrade the tools uniformly: the decisive ones threw more false positives (above 24%), while the cautious one climbed to a 95.7% miss rate.
- Test-market design beats estimator choice. When the treated geo was 5x the size of the median control, every tool's intervals widened 4–5x and most overestimated the lift by 2–4 percentage points. No estimator compensates for a structurally hard design.
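The cost-asymmetry point can be made concrete with an expected-cost comparison. This sketch plugs in the midpoints of the ranges reported above for Meta GeoLift and CausalImpact (a crude pooling across scenarios); the prior probability that a campaign truly works and the two costs are hypothetical inputs you would set for your own decision.

```python
# Midpoints of the reported ranges (illustrative only; they pool several scenarios).
ERROR_RATES = {
    "Meta GeoLift": {"fpr": 0.04, "fnr": 0.925},
    "CausalImpact": {"fpr": 0.29, "fnr": 0.41},
}


def expected_cost(fpr, fnr, p_effect, cost_fp, cost_fn):
    """Expected cost per test = P(effect)*FNR*cost_fn + P(no effect)*FPR*cost_fp."""
    return p_effect * fnr * cost_fn + (1.0 - p_effect) * fpr * cost_fp


def cheapest_tool(p_effect, cost_fp, cost_fn):
    """Return the tool with the lowest expected cost, plus all costs for inspection."""
    costs = {
        name: expected_cost(r["fpr"], r["fnr"], p_effect, cost_fp, cost_fn)
        for name, r in ERROR_RATES.items()
    }
    return min(costs, key=costs.get), costs


# Symmetric costs favour the higher-power tool...
print(cheapest_tool(p_effect=0.5, cost_fp=1.0, cost_fn=1.0)[0])  # prints "CausalImpact"
# ...while making a false positive 10x as costly as a miss flips the choice.
print(cheapest_tool(p_effect=0.5, cost_fp=10.0, cost_fn=1.0)[0])  # prints "Meta GeoLift"
```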

We made everything reproducible including the data-generating process, seeds, configs, per-iteration results, and a Makefile that runs the whole pipeline. The generator is parameterized, so if you think it should be harder (idiosyncratic geo trends, heavier tails, spillovers between markets) those are exactly the runs I'd like to see.

If you’re interested in the full study + code, you can find both here:

Code: https://github.com/getrecast/geolift-simulation-study
Full report: https://research.getrecast.com/geolift-sim-study

edited: fixed the code link to the public repo

reddit.com

u/michael-recast — 6 days ago

▲ 16 r/datascience

Benchmarking whether open models are agentic enough on your own tooling

huggingface.co

u/rhiever — 4 days ago

▲ 33 r/datascience

Using local coding agents with open-weight models as an alternative to Claude Code and Codex

magazine.sebastianraschka.com

u/rhiever — 8 days ago

▲ 31 r/datascience+1 crossposts

Performative AI solutions tied to job/org success metrics

I’m struggling with the push in my org to produce ANY AI solution for things. The unsaid part is: make it even if it has low utility, or even if your job naturally doesn’t offer many opportunities for an AI solution.

The point is to show you are using it and making “widely” usable solutions where the AI is the main feature (not just when AI helps you code an automation).

My boss is the type who, if a leader says to jump, jumps without even asking how high. At my last company, we were encouraged to say ‘OK, I hear you, but what do you expect the jump to accomplish? Perhaps I can help from there.’ Our brainpower and time were treated as expensive, and you had to consider the true utility of the thing you were being asked for.

I just feel like I’m hearing the most brain-dead directions of my career, and I have to follow them or else I’ll be called out or disciplined in some way (unclear rn). We’ve avoided layoffs so far this year, but I won’t be surprised if this becomes the main deciding factor for team or individual layoffs.

Is anyone else in a similar boat? What are you doing about it?

reddit.com

u/customheart — 9 days ago

▲ 8 r/datascience

Weekly Entering & Transitioning - Thread 29 Jun, 2026 - 06 Jul, 2026

Welcome to this week's entering & transitioning thread! This thread is for any questions about getting started, studying, or transitioning into the data science field. Topics include:

- Learning resources (e.g. books, tutorials, videos)
- Traditional education (e.g. schools, degrees, electives)
- Alternative education (e.g. online courses, bootcamps)
- Job search questions (e.g. resumes, applying, career prospects)
- Elementary questions (e.g. where to start, what next)

While you wait for answers from the community, check out the FAQ and Resources pages on our wiki. You can also search for answers in past weekly threads.

reddit.com

u/AutoModerator — 7 days ago

▲ 29 r/datascience

Dev Log on Steam Recommender (part 2)

Since the Steam sale is live, I wanted to post a dev log on my personal project,
https://nextsteamgame.com/, sharing some outcomes from the web traffic and how I changed the project based on the great feedback I got!

I made a post about a month ago explaining how I built this open-source, explainable search engine around Steam reviews to help people find new video games, not through relevance but through aspect-based similarity.

Check out the old post for a better explanation if you want!
https://www.reddit.com/r/datascience/comments/1t7manb/steam_recommender_using_similarity_pt_2_student/

I wanted to say thank you to all the people of r/datascience and r/MachineLearning that gave me feedback and tried out my tool!

I improved the website's UI/UX to make the vectors clearer and more controllable, and I implemented a thumbs-up/thumbs-down feature on recommendations to see whether users actually like the tool.
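As a rough illustration of what aspect-based similarity with controllable vectors can look like (not the project's actual code): represent each game by scores on a few review aspects, let the user weight those aspects, rank by weighted cosine similarity, and report each aspect's share of the score as the explanation. The aspect names, scores and the `rank_similar` helper are hypothetical.

```python
import numpy as np

ASPECTS = ["story", "difficulty", "art_style", "multiplayer"]  # hypothetical aspects


def rank_similar(query, catalog, weights=None, top_k=3):
    """Rank games by weighted cosine similarity over aspect scores.

    Returns (game, similarity, per-aspect contributions); contributions sum to similarity.
    """
    names = list(catalog)
    M = np.array([catalog[n] for n in names], dtype=float)
    q = np.asarray(query, dtype=float)
    w = np.ones_like(q) if weights is None else np.asarray(weights, dtype=float)
    # Scaling both vectors by sqrt(w) turns plain cosine into weighted cosine similarity.
    qw, Mw = q * np.sqrt(w), M * np.sqrt(w)
    denom = np.linalg.norm(Mw, axis=1) * np.linalg.norm(qw) + 1e-12
    sims = (Mw @ qw) / denom
    results = []
    for i in np.argsort(-sims)[:top_k]:
        contrib = Mw[i] * qw / denom[i]  # each aspect's share of the similarity
        results.append((names[i], float(sims[i]), dict(zip(ASPECTS, contrib.tolist()))))
    return results


catalog = {
    "Game A": [0.9, 0.2, 0.8, 0.0],
    "Game B": [0.1, 0.9, 0.2, 0.9],
    "Game C": [0.7, 0.6, 0.3, 0.1],
}
for name, sim, why in rank_similar([1.0, 0.3, 0.9, 0.0], catalog):
    print(name, round(sim, 3), why)
```

Setting a weight to zero (say, for multiplayer) removes that aspect from both the ranking and the explanation, which is one way a UI could expose the vectors as controls.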

I also wanted to share the after effects of promoting this tool on reddit!

Of the 2,652 searches on the website, 913 (about 34%) resulted in Steam clicks! The games that were discovered were spread roughly uniformly and did not share much of a pattern, which showed me that the engine did its job of helping people find niche games across all genres!

(More images attached to post to see data viz)

I want to disclose that I didn't make this tool to profit in any way, but it does now use PostHog so I can collect diagnostics.

u/Expensive-Ad8916 — 10 days ago

▲ 8 r/datascience

Weekly Entering & Transitioning - Thread 22 Jun, 2026 - 29 Jun, 2026

Welcome to this week's entering & transitioning thread! This thread is for any questions about getting started, studying, or transitioning into the data science field. Topics include:

- Learning resources (e.g. books, tutorials, videos)
- Traditional education (e.g. schools, degrees, electives)
- Alternative education (e.g. online courses, bootcamps)
- Job search questions (e.g. resumes, applying, career prospects)
- Elementary questions (e.g. where to start, what next)

While you wait for answers from the community, check out the FAQ and Resources pages on our wiki. You can also search for answers in past weekly threads.

reddit.com

u/AutoModerator — 14 days ago

▲ 2 r/datascience+2 crossposts

I built a full ML pipeline on a Kaggle dataset and proved it has zero predictive signal — and shipped the null result instead of faking accuracy

A failure mode I see constantly — in portfolios and in vendor models at work —
is reporting a great ROC-AUC without ever asking whether the dataset contains
any signal at all. So I built the opposite: a pipeline designed to falsify its
own results before trusting them.

I took a public BMW sales dataset (50k rows, 2010–2024) and ran the full stack:
econometrics, gradient boosting (XGB/LGBM/CatBoost), a tabular MLP, SHAP. Every
model landed at no-skill — regression R² ≈ 0, classification AUC ≈ 0.51.

Instead of torturing the data, I ran two checks I now apply by default:

- Permutation / label-shuffle test: refit on shuffled labels. If your "real"
score sits inside the shuffled distribution (here p ≈ 0.90), you have nothing.
- Positive control: push a synthetic target with known structure through the
exact same pipeline. It hit R² ≈ 0.86 — proving the pipeline is sound and the
data is the problem, not the code.
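Both checks are easy to bolt onto any pipeline. Below is a minimal NumPy-only sketch with a linear model and a single fixed holdout split standing in for the post's actual models; the helper names, the 70/30 split and the synthetic target are assumptions. (scikit-learn also ships `permutation_test_score` for the first check.)

```python
import numpy as np


def holdout_r2(X, y, seed=0):
    """Fit OLS on a fixed random 70% split and return R^2 on the other 30%."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    cut = int(0.7 * len(y))
    tr, te = idx[:cut], idx[cut:]
    A_tr = np.column_stack([np.ones(len(tr)), X[tr]])  # add an intercept column
    A_te = np.column_stack([np.ones(len(te)), X[te]])
    coef, *_ = np.linalg.lstsq(A_tr, y[tr], rcond=None)
    resid = y[te] - A_te @ coef
    return 1.0 - np.sum(resid**2) / np.sum((y[te] - y[te].mean()) ** 2)


def permutation_p_value(X, y, n_perm=200, seed=0):
    """Label-shuffle test: how often does a model fit on shuffled labels score as well?"""
    rng = np.random.default_rng(seed)
    real = holdout_r2(X, y)
    null = np.array([holdout_r2(X, rng.permutation(y)) for _ in range(n_perm)])
    p = (np.sum(null >= real) + 1) / (n_perm + 1)  # +1 so p is never exactly 0
    return real, p


def positive_control(X, noise_sd=0.5, seed=1):
    """Push a synthetic target with known linear structure through the same pipeline."""
    rng = np.random.default_rng(seed)
    beta = rng.normal(size=X.shape[1])
    y_synthetic = X @ beta + rng.normal(scale=noise_sd, size=len(X))
    return holdout_r2(X, y_synthetic)
```

The pattern described above is a large p from the shuffle test together with a high R² from the positive control: the pipeline can find structure, the data just doesn't contain any.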

I also found the classification target was a deterministic threshold on the
volume column — textbook target leakage that gives a fake 1.00 AUC. Remove it
and AUC collapses to chance.
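A cheap guard against this kind of leakage is to score every feature on its own and flag any whose standalone AUC is essentially perfect. A sketch, assuming a label built as a threshold on a `volume` column; the median threshold and the `mileage` column are hypothetical. AUC is computed via the Mann-Whitney statistic so the example only needs NumPy.

```python
import numpy as np


def auc(scores, labels):
    """ROC-AUC via the Mann-Whitney U statistic (ties count as half)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))


def leakage_screen(features, labels, tol=0.99):
    """Return names of features that separate the classes almost perfectly on their own."""
    flagged = []
    for name, values in features.items():
        a = auc(values, labels)
        if max(a, 1.0 - a) >= tol:  # near-perfect in either direction
            flagged.append(name)
    return flagged


rng = np.random.default_rng(0)
volume = rng.lognormal(mean=8.0, sigma=1.0, size=2000)
label = volume > np.median(volume)  # target defined as a threshold on volume
features = {"volume": volume, "mileage": rng.normal(size=2000)}
print(leakage_screen(features, label))  # prints ['volume']
```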

Since the data can't forecast, the actual deliverable is an explicit what-if
simulator (constant-elasticity demand, literature-grounded priors, Monte-Carlo
intervals) — clearly labelled as a model of assumptions, never a fit to history.
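For readers unfamiliar with the setup, a constant-elasticity demand model says volume scales as (new price / old price) raised to the elasticity, Q = Q0 · (P / P0)^e, and Monte-Carlo intervals come from drawing e from a prior. A minimal sketch; the normal prior with mean -1.5 and SD 0.3 is a placeholder, not the literature-grounded prior the post refers to, and `what_if_volume` is a hypothetical helper.

```python
import numpy as np


def what_if_volume(base_volume, base_price, new_price,
                   elasticity_mean=-1.5, elasticity_sd=0.3,
                   n_draws=10_000, seed=0):
    """Monte-Carlo what-if under constant-elasticity demand: Q = Q0 * (P / P0) ** e.

    The elasticity prior (normal, mean/sd above) is a placeholder assumption.
    Returns the 5th, 50th and 95th percentiles of simulated volume.
    """
    rng = np.random.default_rng(seed)
    elasticity = rng.normal(elasticity_mean, elasticity_sd, size=n_draws)
    volume = base_volume * (new_price / base_price) ** elasticity
    lo, mid, hi = np.percentile(volume, [5, 50, 95])
    return lo, mid, hi


# What happens to 1,000 units/month if price rises 10%?
print(what_if_volume(base_volume=1_000, base_price=50_000, new_price=55_000))
```

Because the interval is driven entirely by the prior, its width reflects uncertainty about the elasticity rather than anything learned from the sales history, which fits the post's framing of the simulator as a model of assumptions.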

The whole thing is reproducible (Docker, CI, tests) with a live demo so you can
click through the leakage proof yourself. Genuinely curious where this breaks:
what would you put on a "does this dataset have any signal?" checklist?

[live demo] · [repo]

https://maxime2476-bmw-sales-analytics.hf.space/

https://github.com/maxime2476/bmw-sales-analytics

reddit.com

u/GoalMaxROI — 13 days ago
