r/askdatascience

▲ 0 r/askdatascience

What's one Data Science tool you can't work without?

Mine isn't a fancy ML library.

It's Jupyter Notebook because it helps me experiment, document, visualize, and explain my thinking all in one place.

Curious what tool has made the biggest difference in your workflow.

reddit.com

u/Long-Bridge-6512 — 1 day ago

▲ 12 r/askdatascience

Data Science vs Computer Science. Which one for Bachelors?

For my bachelor's degree, which one should I choose: Data Science or Computer Science?

For context: I am very much interested in data science. I have taken online courses (although beginner level), and I feel like this is the field that I want to pursue.

Due to the massive oversaturation in computer science, and because most CS graduates I have talked to claim that 90% of what they learned isn't of any use, I decided to explore specialized subfields. I stumbled upon data science, did my research, took some free beginner-level online courses, and for many months now I have felt that this is the degree I want to pursue. But recently, many people have told me that I should pursue computer science for my bachelor's and then do an MS in data science, because CS will give me a better base, is a more respected degree, and will teach me a larger set of skills.

Can anyone pursuing either of these degrees, or who's employed in the data science field, tell me which one I should choose? For someone who is interested in data analytics and ML, which one is better for a BS degree?

u/garbuge — 2 days ago

▲ 19 r/askdatascience+11 crossposts

PROJECT REVIEW

Hello everyone! I just completed a big project that I have been working on for a month, and I want your opinion on it.

It's a SpaceX Launch Predictor & Cost Optimizer: a full end-to-end ML system that predicts the probability of a SpaceX Falcon 9 booster landing successfully, enriches launch data with real weather conditions, and exposes the results through an interactive Streamlit web application with a business ROI calculator.

It includes a data pipeline, advanced machine learning algorithms (with hyperparameter tuning), explainable AI (SHAP), MLOps (AWS S3, Docker), and business value (an ROI calculator that reports financial results).

FUN FACT: For this project I used my own evaluation metric library (it standardizes supervised and unsupervised model diagnostics into a single, consistent API), which is also verified and published on PyPI.

Project Info: https://github.com/Alkiviadisss/SpaceX

u/Senior-Neck499 — 1 day ago

▲ 47 r/askdatascience+8 crossposts

Analyzed 12,614 Indian AI/Data Science jobs (till May 16) — Azure is rising, SQL beats ML, and consulting firms are quietly dominating AI hiring

Weekly analysis of AI & Data Science job postings from Indian job boards.

Sample size: 12,614 listings (till May 16, 2026).

---

**Top Skills — Full Breakdown:**

| Skill                   | Mentions |
|-------------------------|----------|
| Python                  | ~2,600   |
| SQL                     | ~2,400   |
| Machine Learning        | ~1,500   |
| Artificial Intelligence | ~1,050   |
| Azure                   | ~1,000   |
| Java                    | ~1,000   |
| AWS                     | ~800     |
| GCP                     | ~600     |
| Spark                   | ~600     |
| Data Analysis           | ~550     |

---

**Key observations:**

**SQL is basically tied with Python now**

The gap is only ~200 mentions. Everyone learns Python first, but companies still need SQL everywhere: pipelines, reporting, analytics layers. If you skipped SQL thinking it's "old", reconsider.

**Azure quietly entered top 5**

~1,000 mentions. AWS was the default for years, but Azure is catching up fast in Indian enterprise hiring, especially in BFSI and consulting. Azure and AWS together account for ~1,800 mentions.
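
The two comparisons above (the Python vs SQL gap and the combined Azure + AWS count) can be reproduced from the approximate figures in the skills table and the stated sample size of 12,614. This is only a back-of-the-envelope sketch: it assumes each listing contributes at most one mention per skill, which the post does not confirm, and the function name `share` is made up for illustration.

```python
# Approximate mention counts copied from the skills table in the post.
mentions = {"Python": 2600, "SQL": 2400, "Azure": 1000, "AWS": 800}
sample_size = 12614  # total listings in the sample


def share(skill: str) -> float:
    """Mentions as a percentage of all listings, rounded to one decimal.

    Assumption: at most one mention per listing per skill.
    """
    return round(100 * mentions[skill] / sample_size, 1)


gap = mentions["Python"] - mentions["SQL"]
cloud_total = mentions["Azure"] + mentions["AWS"]

print(f"Python vs SQL gap: ~{gap} mentions")
print(f"Python: {share('Python')}% of listings, SQL: {share('SQL')}%")
print(f"Azure + AWS: ~{cloud_total} mentions")
```

Note that the combined cloud figure is a sum of mentions, so a listing that names both Azure and AWS would be counted twice.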

**Consulting firms are the real AI employers**

Top 10 companies hiring AI talent:

| Rank | Company        | Jobs |
|------|----------------|------|
| 1    | TCS            | ~360 |
| 2    | Accenture      | ~340 |
| 3    | Leading Client | ~310 |
| 4    | Infosys        | ~150 |
| 5    | EY             | ~145 |
| 6    | Capgemini      | ~130 |
| 7    | Amazon         | ~110 |
| 8    | Databricks     | ~105 |
| 9    | CGI            | ~105 |
| 10   | IBM            | ~100 |

EY and Capgemini in the top 6 is interesting: large consulting firms, including the Big 4, are aggressively building AI/data practices.

Databricks at #8 suggests real demand for data engineering.

"Leading Client" still at #3 = staffing firms hiding actual employers.

**City breakdown (expanded):**

| City      | Jobs   |
|-----------|--------|
| Bengaluru | ~3,000 |
| Hyderabad | ~1,950 |
| Pune      | ~1,200 |
| Chennai   | ~850   |
| Mumbai    | ~850   |
| Gurugram  | ~550   |
| Remote    | ~500   |
| Noida     | ~480   |

Chennai entered the top 4 this time, mostly TCS/Infosys/Accenture campuses expanding AI teams there.

---

**Takeaway from this week:**

The "learn GenAI or die" crowd is louder than the actual job market.

Real JDs: Python → SQL → cloud (Azure/AWS) → ML fundamentals.

That stack gets you through 80% of listings.

Tracking this weekly at getjobpulse.in if anyone wants the dashboard.

Anyone seeing Azure demand spike in their interviews too?

u/NeitherMembership679 — 3 days ago

▲ 3 r/askdatascience

What’s the right way to clean data?

I’m totally new to DS, and I’m working on my first project.

Should I use data via an API and clean it at the beginning of my script or should I download it as a CSV and then clean it?

Also what’s the best approach for cleaning a dataset?

Just for reference, I’m using the NYC Building Energy and Water Data Disclosure for LL84 2023 to Present database via an API.
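
One common pattern is to do both: pull from the API once, save the raw response untouched, and then clean from the saved copy so every run starts from the same input. Below is a minimal sketch of that idea using only the standard library. The function names (`load_raw`, `clean`) and the column names (`property_id`, `site_eui`) are hypothetical placeholders, not the actual LL84 schema, and `fetch` stands in for whatever API call you use.

```python
import csv
import io
from pathlib import Path
from typing import Callable


def load_raw(cache_path: Path, fetch: Callable[[], str]) -> str:
    """Return raw CSV text, calling `fetch` only if no cached copy exists."""
    if not cache_path.exists():
        cache_path.write_text(fetch(), encoding="utf-8")
    return cache_path.read_text(encoding="utf-8")


def clean(raw_csv: str) -> list[dict]:
    """Basic cleaning: trim whitespace, drop exact duplicates, parse numbers."""
    rows, seen = [], set()
    for record in csv.DictReader(io.StringIO(raw_csv)):
        # Strip stray whitespace from headers and values.
        row = {k.strip(): (v or "").strip()
               for k, v in record.items() if k is not None}
        key = tuple(row.items())
        if key in seen:  # skip exact duplicate rows
            continue
        seen.add(key)
        # Hypothetical numeric column: missing or non-numeric text becomes None.
        try:
            row["site_eui"] = float(row["site_eui"])
        except (KeyError, ValueError):
            row["site_eui"] = None
        rows.append(row)
    return rows
```

Keeping the raw file separate from the cleaning step means you can change the cleaning rules and rerun them without hitting the API again, and you can always check a cleaned value against what the source actually returned.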

u/No-Ice-8975 — 2 days ago

▲ 3 r/askdatascience

What's the biggest misconception about working in data science?

I've noticed that many people think data scientists spend all day building AI models.

For those already in the industry, what's the biggest misconception people have about your job?

u/Long-Bridge-6512 — 3 days ago

▲ 7 r/askdatascience+2 crossposts

How can I match a bunch of elements to canonical products that are unknown? (Entity Resolution)

The problem is simple, but the solution is not. ChatGPT doesn't really give an answer.

What I want is to group the "apple" items together and the "strawberry" items together in a big corpus of data.

The items are also noisy and vary a lot, since there is shiny apple, blue apple, etc.

Another problem is that I don't have an exact name called "apple"; I want the program to find the canonical entities by itself, without any input. It is not a zero-shot thing.

What should I do?
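
As a possible baseline for this kind of unsupervised grouping, here is a deliberately simple heuristic sketch: treat the token that appears in the most items as each item's candidate canonical name, on the assumption that head nouns ("apple") recur across the corpus more often than modifiers ("shiny", "blue"). This is not a full entity-resolution method, and the function names are made up for illustration.

```python
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep runs of letters as tokens."""
    return re.findall(r"[a-z]+", text.lower())


def group_by_head_token(items: list[str]) -> dict[str, list[str]]:
    """Group items under the token they share with the most other items."""
    # Document frequency: in how many items does each token appear?
    doc_freq = Counter(tok for item in items for tok in set(tokenize(item)))
    groups = defaultdict(list)
    for item in items:
        tokens = tokenize(item)
        if not tokens:
            continue
        # Heuristic: the most widely shared token is the canonical candidate;
        # ties go to the later token (head nouns tend to come last in English).
        head = max(enumerate(tokens), key=lambda p: (doc_freq[p[1]], p[0]))[1]
        groups[head].append(item)
    return dict(groups)


items = ["shiny apple", "blue apple", "Apple",
         "fresh strawberry", "strawberry", "shiny strawberry"]
print(group_by_head_token(items))
```

This breaks down on plurals, misspellings, and synonyms ("apples" would form its own group), so on real data it is more of a sanity check before trying normalization, fuzzy matching, or embedding-based clustering.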

u/Interesting_North293 — 3 days ago

▲ 6 r/askdatascience+4 crossposts

Do I even need a dGPU for CS/Data Science as a freshman?

Starting CS this fall, 300km from home. Got a 5700X + 4060 desktop but can’t bring it. Home only 1-2x a month.

Debating between a budget no-GPU laptop vs one with a dGPU. Main question — for hackathons and local AI agents, do I actually need CUDA or can I just use Colab/cloud GPUs and call it a day?

Was the dGPU on your laptop worth it, or did you end up on Colab anyway?

u/ChapsLair1215 — 6 days ago

▲ 6 r/askdatascience+1 crossposts

Check Out MY Data Science Portfolio - be honest

Here is my project portfolio. I would like to know whether I need to improve my DS projects or whether I am headed in the right direction as I look for an entry-level job.

www.johnkirima.com

Thanks.

u/data_scientist_lover — 5 days ago

▲ 20 r/askdatascience

If you're learning data science, focus on solving problems—not collecting certificates.

One thing I've noticed is that many beginners keep enrolling in course after course but rarely build projects.

My biggest improvement came when I started working on real datasets instead of only watching tutorials.

Even simple projects like sales prediction, customer segmentation, or sentiment analysis taught me more than hours of theory.

Employers and interviewers often ask about how you approached a problem, not how many certificates you earned.

My advice:

Practice Python every day.

Learn SQL well.

Build projects consistently.

Explain your work clearly.

Consistency beats perfection.

What project taught you the most during your learning journey?

u/naga3607 — 5 days ago

▲ 5 r/askdatascience+2 crossposts

Hi all, I am newly certified in Data Science and have 2 questions (so far) a

I've installed gemma4: e4b locally and would also like to choose a Qwen model. Any suggestions, given that my hardware is limited to 8 GB of unified RAM on a 2020 MacBook Pro M1?
I am also looking to create a few projects to showcase my skills. I'm open to suggestions that can be pushed to my Git repo.

Thank you in advance and I love learning. I know it will be long, but I am taking it piece by piece and want to continue until I can upgrade my hardware.

u/Superfly022 — 7 days ago

▲ 1 r/askdatascience

Roadmap

Hello folks,

As I begin my B.Tech journey in Data Science, I am looking for guidance on how to navigate the next four years effectively. Could you please provide a roadmap for a fresher in this field?

It would be very helpful if the roadmap included specific examples of skills to learn, tools to master, and types of projects I should work on at each stage of my studies.

Thank you for your time and for any advice you can share.

u/gaining_insights — 6 days ago

▲ 1 r/askdatascience

Best way to learn pandas

Need advice from seniors.

Hey, as we know, the tech landscape has changed a lot due to the AI boom.

If y'all were given a chance to learn pandas again, how would you do it, keeping all these factors in mind?

I am following Corey Schafer's playlist. Yes, I'll try my best to do a lot of practice o

What advice would y'all give me, as I have just finished my freshman year of a bachelor's in mathematics and data science?

I would be very thankful to you 🫂

u/NaiveManagement6817 — 6 days ago

▲ 5 r/askdatascience

please help me out

I want to become a data analyst, then keep going deeper and become a data scientist.

I want to start preparing for interviews, as my final year starts in September. Is Python a good language for the data analyst field?

For data structures interview preparation, can I prepare the topics in Python, or should I do it in C++ or Java?

u/DirectorSlow8577 — 7 days ago

▲ 17 r/askdatascience

My biggest mistake while learning Data Science

When I started learning Data Science, I spent months watching tutorials and collecting courses. I felt productive, but I wasn't building anything.

Everything changed when I started working on real datasets. Cleaning messy data taught me more than any course ever could. Building projects exposed gaps in my knowledge and forced me to learn practical solutions.

If you're starting out, spend less time collecting resources and more time solving problems.

What project helped you learn the most?

u/Long-Bridge-6512 — 8 days ago

▲ 4 r/askdatascience+1 crossposts

I have recently enrolled in a Data Science program and am currently learning Python. My goal is to build a career in AI/ML or Data Science. I'm trying to understand what employers actually expect from entry-level candidates. Is it possible to get into AI/ML without having strong software engineering

u/Disastrous_Fun4900 — 7 days ago

▲ 28 r/askdatascience

The biggest surprise in my Data Science journey

When I started learning Data Science, I thought machine learning models were everything.

Now I spend more time understanding business problems and cleaning data than building models.

Sometimes a simple dashboard answers the business question better than a complex model.

I wish someone had told me this when I started.

For experienced data scientists here:

What's one thing beginners focus on too much?

u/Long-Bridge-6512 — 9 days ago

▲ 122 r/askdatascience+6 crossposts

Built a SQL mystery game - can you query the killer?

Solve murders. Master SQL. One query at a time.

Agatha Christie cases. Real suspects. Live SQLite database. You write the queries, you catch the killer.

SELECT s.name, a.location
FROM suspects s
JOIN alibis a ON s.suspect_id = a.suspect_id
WHERE a.time_from <= '23:00'
AND a.location != 'Cabin'
ORDER BY s.name;

That's the kind of query standing between you and the murderer.
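
To try a query like this outside the game, here is a minimal sketch using Python's built-in `sqlite3` module. Only the query comes from the post; the table definitions, suspect names, and alibi rows are made up for illustration, since the game's real schema is not shown.

```python
import sqlite3

# In-memory database with a hypothetical schema matching the query's columns.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE suspects (suspect_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE alibis (suspect_id INTEGER, location TEXT, time_from TEXT);
    INSERT INTO suspects VALUES (1, 'Ada'), (2, 'Basil'), (3, 'Clara');
    INSERT INTO alibis VALUES
        (1, 'Dining Car', '22:30'),
        (2, 'Cabin',      '22:00'),
        (3, 'Lounge',     '23:30');
""")

query = """
    SELECT s.name, a.location
    FROM suspects s
    JOIN alibis a ON s.suspect_id = a.suspect_id
    WHERE a.time_from <= '23:00'
    AND a.location != 'Cabin'
    ORDER BY s.name;
"""

rows = conn.execute(query).fetchall()
for name, location in rows:
    print(name, location)
conn.close()
```

The time filter compares text, which works here because zero-padded `HH:MM` strings sort in the same order as the times they represent.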

No signup. Runs in the browser. → querythemurder.com

Feedback: querythemurder@gmail.com

u/Sensitive-Try-9603 — 12 days ago

▲ 16 r/askdatascience

How do you visualize higher dimensional data?

I am working on a project in which I have to visualize a dataset with many dimensions.

Without any techniques, I am stuck with 2D.

I need practical advice on which techniques and libraries to use for visualization.
The dataset also has too many dimensions to handle with just one technique (like a heatmap where R, G, and B each stand for another dimension).

Also, I am using Python and R.
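
One common starting point is dimensionality reduction: project the data down to two components and scatter-plot those. The sketch below is a dependency-free illustration of PCA using power iteration, with a made-up function name (`pca_2d`). It assumes purely numeric columns and at least two directions of nonzero variance. In practice a library implementation such as scikit-learn's `PCA` in Python or `prcomp` in R would be the more usual choice.

```python
import math
import random


def pca_2d(rows, iters=200, seed=0):
    """Project rows (equal-length numeric lists) onto their top 2 principal components."""
    n, d = len(rows), len(rows[0])
    # Center each column on its mean.
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    X = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(x[a] * x[b] for x in X) / (n - 1) for b in range(d)]
           for a in range(d)]
    rng = random.Random(seed)
    components = []
    for _ in range(2):
        v = [rng.gauss(0, 1) for _ in range(d)]
        for _ in range(iters):
            # Remove directions already found so we converge to the next one.
            for c in components:
                dot = sum(vi * ci for vi, ci in zip(v, c))
                v = [vi - dot * ci for vi, ci in zip(v, c)]
            # Power iteration step: multiply by the covariance and renormalize.
            w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
            norm = math.sqrt(sum(wi * wi for wi in w))
            if norm == 0:  # no variance left in this direction
                break
            v = [wi / norm for wi in w]
        components.append(v)
    # Coordinates of each centered row along the two components.
    return [[sum(xi * ci for xi, ci in zip(x, c)) for c in components]
            for x in X]
```

The returned pairs can be passed to any 2D scatter plot. The sign of each component is arbitrary, so the plot may come out mirrored between implementations. PCA only captures linear structure, so nonlinear methods such as t-SNE or UMAP are worth trying if the 2D picture looks uninformative.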

u/Fresh-Lie5160 — 10 days ago

▲ 7 r/askdatascience+1 crossposts

Recently transitioned into a Data Scientist role. Planning to prepare for overseas opportunities in the next year. Looking for advice and study partners.

Hi everyone,

I recently transitioned into a Data Scientist role, and I'm planning to stay in this role for about a year while building my skills. My goal is to start applying for Data Scientist positions abroad after that.

I want to make the most of this one year and prepare properly. For those who've already made this transition or landed international roles, what should I focus on?

Some things I'm thinking about are:

DSA & coding interviews (LeetCode)

Machine Learning fundamentals

Deep Learning

SQL

System Design for ML

GenAI/LLMs

MLOps

Building strong end-to-end projects

Am I missing anything? What would you prioritize if you had one year to prepare?

Also, if anyone else is on a similar journey and wants an accountability partner or study group, feel free to comment or DM me. It would be great to prepare together and keep each other motivated.

Thanks in advance!

u/KeyDelivery6751 — 9 days ago