r/DataScientist

PROJECT REVIEW

Hello everyone! I just completed a big project I have been working on for a month, and I want your opinion on it.

It's a SpaceX Launch Predictor & Cost Optimizer: a full end-to-end ML system that predicts the probability of a SpaceX Falcon 9 booster landing successfully, enriches launch data with real weather conditions, and exposes the results through an interactive Streamlit web application with a business ROI calculator.

It includes a data pipeline, advanced machine learning algorithms (with hyperparameter tuning), explainable AI (SHAP), MLOps (AWS S3, Docker), and business value (an ROI calculator that reports financial results).

Fun fact: for this project I used my own evaluation metric library (it standardizes supervised and unsupervised model diagnostics into a single, consistent API), which is also verified and published on PyPI.

Project Info: https://github.com/Alkiviadisss/SpaceX

u/Senior-Neck499 — 1 day ago

▲ 26 r/DataScientist+1 crossposts

Kindly help me understand why I am not getting selected 😭

Kindly check this and let me know what to add or remove, and please also recommend companies.

u/Murky_Link4725 — 4 days ago

▲ 5 r/DataScientist+3 crossposts

Knowledge distillation for time series forecasting

I was wondering if there is a proven technique that works for knowledge distillation in the context of time series forecasting.

I have been trying alignment in the latent space, using the Frobenius norm of Gram matrices as the alignment loss, but the results are not that impressive so far.
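
For readers unfamiliar with this kind of loss, here is a minimal NumPy sketch of one common way to set it up. It is an assumption about the general shape of the approach, not the exact setup described here: the function name `gram_alignment_loss`, the row normalization, and the averaging over matrix entries are all illustrative choices, and the random arrays stand in for real student and teacher latents.

```python
import numpy as np

def gram_alignment_loss(student_latents, teacher_latents):
    """Mean squared Frobenius distance between two Gram matrices.

    Both inputs have shape (batch, dim). The latent dims may differ,
    because each Gram matrix has shape (batch, batch).
    """
    def normalized_gram(z):
        # Assumption: rows are L2-normalized so the two Gram matrices
        # are on the same scale (entries become cosine similarities).
        z = z / (np.linalg.norm(z, axis=1, keepdims=True) + 1e-12)
        return z @ z.T

    diff = normalized_gram(student_latents) - normalized_gram(teacher_latents)
    # Squared Frobenius norm, divided by the number of entries.
    return float(np.sum(diff ** 2)) / diff.size

rng = np.random.default_rng(0)
teacher = rng.normal(size=(8, 64))  # hypothetical teacher latents
student = rng.normal(size=(8, 16))  # hypothetical, smaller student latents

print(gram_alignment_loss(student, teacher))  # positive for unrelated latents
print(gram_alignment_loss(teacher, teacher))  # identical latents give 0.0
```

Because the loss only compares pairwise similarities within a batch, it does not require the student and teacher to share a latent dimension. In actual training the same computation would be written in an autodiff framework so gradients can flow to the student.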

Any recommendations? Thanks!

u/Pazigoo36 — 3 days ago

▲ 7 r/DataScientist+3 crossposts

Looking to hire a Senior Software Engineer / Data Scientist / AI Engineer

I'm looking to hire a Senior Software Engineer / Data Scientist / AI Engineer. You must be:

- able to speak in English fluently and professionally

- willing to work really

- experienced in backend development, AI/ML, or data science for more than 5 years

This is not a role for interns or juniors.

Please reach out to me with your LinkedIn profile.

WhatsApp: +1-713-913-2115

Thanks

u/Hefty_Tea_5515 — 3 days ago

▲ 347 r/DataScientist+4 crossposts

I finally understood why everyone says linear regression is the foundation of ML.

Today I learned something that I think I rushed through when I first started learning ML.

The equation y = wx + b looks almost too simple, so I never paid much attention to it.

What finally clicked for me is that this isn’t just the equation of a line.

With one feature, you’re fitting a line.

With two features, you’re fitting a plane.

With n features, you’re fitting a hyperplane in (n+1)-dimensional space (n feature axes plus the target).

The equation barely changes: y = w₁x₁ + w₂x₂ + … + wₙxₙ + b

Another thing I didn’t know until today:

“Linear” doesn’t necessarily mean the relationship between x and y is a straight line. It means the model is linear in its parameters (the weights). So you can use features like x² or log(x) and it’s still linear regression.
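
Here is a small sketch of that point, assuming NumPy and made-up synthetic data (the coefficients 2, 3, and 1 are arbitrary choices for illustration). The target is a curved function of x, yet ordinary least squares recovers the weights because the model is linear in them.

```python
import numpy as np

# Synthetic, noise-free data from a curved relationship:
# y = 2*x^2 + 3*log(x) + 1
x = np.linspace(1.0, 5.0, 50)
y = 2.0 * x**2 + 3.0 * np.log(x) + 1.0

# Design matrix: transformed features plus a column of ones for the bias b.
X = np.column_stack([x**2, np.log(x), np.ones_like(x)])

# Ordinary least squares still applies because y is linear in the weights.
weights, *_ = np.linalg.lstsq(X, y, rcond=None)
w1, w2, b = weights
print(w1, w2, b)  # approximately 2, 3, 1
```

The only thing that changed compared with plain y = wx + b is the set of columns in the design matrix; the fitting procedure is identical.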

That also helped me understand why linear models are still widely used in production—they’re simple, interpretable, and every weight has a meaning.

Kind of funny that I spent more time trying to understand transformers than the equation almost every supervised ML model builds on.

For people who’ve been doing ML for a while: I’m working through ML from first principles. What topic should I dive into next?

u/teee0512 — 7 days ago

▲ 13 r/DataScientist+3 crossposts

Which AWS certification should I pursue: Data Engineer Associate or Solutions Architect Associate?

Hi everyone,

I'm a Data Analyst with 1.5 years of experience, and I'll be starting an MSc in Data Science in the UK this September.

I've used a few AWS services before, and one of my master's modules is Large-Scale Data Engineering, so I'm considering the AWS Certified Data Engineer – Associate. My long-term goal is to become a Data Scientist.

Which certification would you recommend: Data Engineer Associate or Solutions Architect Associate? Which would be more valuable for my career?

Also, what are the best resources for preparation, and roughly how long does it take? I'm a fairly fast learner.

Thanks!

u/Higher-Dimension1 — 5 days ago

▲ 43 r/DataScientist+13 crossposts

Machine Learning Concepts [D]

Dear folks, I have created a range of free content on machine learning (a work in progress). I am a data scientist with a postgraduate degree in AI/ML from IIT. To help the machine learning community with important concepts, I have created multiple long-form videos and organized them into topic-wise, digestible playlists.

If you go through the first two playlists:

Introductory Machine Learning Concepts
Probability Foundations: Univariate Models

You might find helpful content there. I have tried to explain with intuitions and derivations, and this is a work in progress. For code implementations, the scikit-learn website has great content as well. In total, the playlists have 60+ topic-wise videos so far, and I think they can help a lot whether you are starting out with the concepts, getting into the mathematics, or preparing for AI/ML/data job interviews.

When I sat for my interviews, I was grilled on my project, but the majority of the questions about it tested foundational concepts and their know-how.

This is free content on YouTube, for the benefit of the learning community.

Link: https://youtube.com/@aayushsugandh4036?si=w5MKORU2fWzLRrAJ

u/Negative_War_65 — 8 days ago

▲ 5 r/DataScientist+1 crossposts

I built a semantic analytics platform on top of fragmented US drug market datasets — would love feedback from fellow data professionals

Hi everyone,

I'm a data architect working primarily in pharmaceutical analytics.

One challenge I've seen repeatedly is that pharmaceutical public data is technically available—but it's spread across dozens of different sources with different identifiers, formats, and update cycles.

As a side project, I started building TheRxPulse to solve that problem.

From a data perspective, the interesting part wasn't building dashboards.

It was:

Entity resolution across datasets
Identity mapping
Semantic modeling
Normalizing inconsistent product names
Linking manufacturers, applications, products, and regulatory events
Creating a unified analytical model
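
To make the name-normalization step concrete, here is a minimal sketch using only the Python standard library. It is not TheRxPulse's implementation: the function `normalize_name`, the cleaning rules, and the sample records are all hypothetical, and real entity resolution would need far more (identifier crosswalks, fuzzy matching, manual review).

```python
import re
from collections import defaultdict

def normalize_name(raw):
    """Lowercase, strip punctuation, collapse whitespace (illustrative rules only)."""
    name = raw.lower()
    name = re.sub(r"[^a-z0-9 ]+", " ", name)
    return re.sub(r"\s+", " ", name).strip()

# Hypothetical records from two sources that spell the same product differently.
records = [
    ("source_a", "ExampleDrug  (Tablet)"),
    ("source_b", "exampledrug tablet"),
    ("source_b", "OtherDrug, Capsule"),
]

# Group raw records under their normalized key.
groups = defaultdict(list)
for source, raw_name in records:
    groups[normalize_name(raw_name)].append((source, raw_name))

for key, members in groups.items():
    print(key, "->", members)
```

Grouping by a normalized key is only a first pass; it gives candidate matches that a stricter rule or a shared identifier can then confirm.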

The front end is simply a way to explore that connected data.

You can see the current version here:

https://therxpulse.com

The design philosophy is to make complex pharmaceutical data easier to query and understand without requiring users to know where every dataset originates.

I'd love feedback from other data professionals on:

Semantic modeling approaches
Data quality strategies
Entity resolution techniques
Visualization ideas
Features you'd expect from an analytics platform like this

Always happy to discuss architecture decisions if anyone is interested.

u/No-Cover-4461 — 7 days ago

▲ 45 r/DataScientist

Multivariate Models of Probability in Machine Learning for Data Scientists

Hello Folks,

Have you ever wondered why we use the sigmoid function so often in machine learning? Beyond giving us a probability, it arises from the exponential family, which subsumes many of the distributions we study in machine learning.

In this lecture, we cover exponential families and directional derivatives (gradients and Hessians), study mixture models, and see how domain knowledge encoded in probabilistic graphical models makes it simpler to model joint probability densities.

Timeline breakdown (in hours and minutes):
0:00-0:17 - Understanding exponential families.
0:17-0:27 - Deriving the sigmoid function for the Bernoulli distribution.
0:27-0:48 - Understanding the log partition function and convex functions, proving why positive definiteness of the Hessian implies convexity, and why convexity is needed.
0:48-1:04 - Directional derivatives (deriving gradients and Hessians).
1:04-1:26 - Maximum entropy derivation of the exponential family.
1:26-1:56 - Mixture models (Gaussian and Bernoulli mixture models).
1:56-2:16 - Probabilistic graphical models.
2:16-2:34 - Markov chains.
2:34-End - Inference and learning, plate notation diagram of Gaussian mixture models.
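
As a quick numerical companion to the Bernoulli item in the timeline, here is a standard-library Python sketch of the textbook result that writing the Bernoulli distribution in exponential-family form makes the sigmoid appear. The lecture's own derivation may be organized differently; the value `mu = 0.3` and the helper names are arbitrary choices for illustration.

```python
import math

def sigmoid(eta):
    return 1.0 / (1.0 + math.exp(-eta))

def log_partition(eta):
    # A(eta) = log(1 + e^eta) for the Bernoulli distribution.
    return math.log1p(math.exp(eta))

mu = 0.3                         # Bernoulli mean, P(x = 1)
eta = math.log(mu / (1.0 - mu))  # natural parameter (the log-odds)

# Exponential-family form: p(x) = exp(x * eta - A(eta)) for x in {0, 1}.
p1 = math.exp(1 * eta - log_partition(eta))
p0 = math.exp(0 * eta - log_partition(eta))

# The mean is recovered by the sigmoid, which also equals dA/d(eta).
h = 1e-6
dA = (log_partition(eta + h) - log_partition(eta - h)) / (2 * h)

print(p1, p0)            # close to 0.3 and 0.7
print(sigmoid(eta), dA)  # both close to 0.3
```

Inverting the log-odds gives the sigmoid, and differentiating the log partition function gives the same quantity, which is why the two views agree.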

If you have watched my earlier lectures from the playlist, they will help. I try to explain as if I am a learner myself, to simplify complex concepts. I write everything on a whiteboard, and these lectures are completely free.

Link: https://youtu.be/T1uTBtJ7aHU?si=rozXSTjtSqPaaYb5

u/Negative_War_65 — 10 days ago

▲ 4 r/DataScientist+1 crossposts

1-minute survey about predictive analytics features in Power BI for my academic project (for everyone)

This is for my MBA final-year project, aimed specifically at investors and traders.
I need 30 responses for my class, and I would love it if you could help me out! <3

Here's the link: https://docs.google.com/forms/d/1Tw4GlwXO47gfQBJMHsim6CsjBi1FtMqi6Oyq3M9rsMc/

u/UpstairsLuck4490 — 10 days ago