u/Dry_Roof_1382

What is the state of DS nowadays?

I'm pursuing a 4-year BS in DS, and I'm currently working on my college labs' DS/ML team. I feel like DS work in a laboratory like this is different from what it is in industry, and I'm worried about getting a job after graduation.

What do DS/DA/DE jobs out there look like? Do they really pay attention to the details of how we handle problems, like what I see in my labs, or do they care more about production-ready outputs?

reddit.com
u/Dry_Roof_1382 — 5 days ago

Insufficient data but suspiciously good metrics?

Well, my research center is conducting a project on developing batteries. They tasked me with using ML to regress battery capacities on a set of variables. I experimented with my own custom models, but then they told me to first try to replicate the methodology of a research paper.

The thing is that the article itself reports using only 90 samples collected from different labs, and 22 of them contain missing values (?). This is a severe data shortage, but somehow the authors report R^2 = 0.83 and pretty nice RMSEs/MAEs with gradient boosting models.

What do you think about this? I personally suspect that the authors cherry-picked a seed with good metrics to report. Or is it possible that GBMs are so powerful that they can work with only a few dozen samples?
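One rough way to probe the seed suspicion is to see how much a held-out R^2 moves when only the train/test split seed changes at this sample size. The sketch below is not the paper's pipeline: it uses synthetic data (90 points, one feature), a plain least-squares line standing in for a gradient boosting model, and an assumed 80/20 split. All function names and parameters are made up for illustration, and it only needs the Python standard library.

```python
import random
import statistics


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = statistics.fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def fit_line(x, y):
    """Ordinary least squares for one feature; returns (slope, intercept)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def make_data(n=90, noise=1.0, seed=0):
    """Synthetic stand-in data (assumption): y = 0.5 * x + Gaussian noise."""
    rng = random.Random(seed)
    x = [rng.uniform(0, 10) for _ in range(n)]
    y = [0.5 * a + rng.gauss(0, noise) for a in x]
    return x, y


def r2_over_splits(x, y, n_splits=200, test_frac=0.2):
    """Refit on a fresh random split per seed and collect the test R^2."""
    n = len(x)
    n_test = int(n * test_frac)
    scores = []
    for seed in range(n_splits):
        idx = list(range(n))
        random.Random(seed).shuffle(idx)
        test, train = idx[:n_test], idx[n_test:]
        slope, intercept = fit_line([x[i] for i in train], [y[i] for i in train])
        preds = [slope * x[i] + intercept for i in test]
        scores.append(r2_score([y[i] for i in test], preds))
    return scores


x, y = make_data()
scores = r2_over_splits(x, y)
print(
    f"min={min(scores):.2f} "
    f"median={statistics.median(scores):.2f} "
    f"max={max(scores):.2f}"
)
```

If the gap between the minimum and maximum is wide, a single reported split says little on its own, and the median or mean over many splits (or repeated k-fold cross-validation) is the fairer number to compare against the paper's 0.83.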

reddit.com
u/Dry_Roof_1382 — 12 days ago