▲ 50 r/dataanalysis

I deduplicated 53,000 missing-persons reports from Venezuela’s earthquake

I thought the community might find this interesting - I used entity resolution software (disclosure: from my company) to deduplicate the missing persons data from Venezuela and compare it to the list of patients in hospitals.

https://medium.com/tilo-tech/i-deduplicated-53-000-missing-persons-reports-from-venezuelas-earthquake-74f05c37521b

reddit.com

u/major_grooves — 3 days ago

▲ 16 r/dataanalysis

What are the most useful metrics to track when analyzing personal finance data over time?

I recently pulled all my personal finance data into one place: monthly spending by category, savings rate, investment returns, debt paydown progress. Started in Excel but I'm thinking about moving to Python or a simple dashboard eventually.

The problem is I keep second-guessing which metrics actually tell a useful story versus which ones just look interesting but don't drive any real decisions. I track net worth monthly, for example, but I'm not sure that granularity adds value or just creates noise when markets swing around.

For those of you who've done personal finance analysis projects, which metrics did you actually check regularly and act on? And what did you expect to be useful but turned out to be kind of pointless in practice?

Also curious whether you built anything visual or just worked off raw tables. My instinct is that a simple savings rate trend line is genuinely more useful than a fancy dashboard, but maybe I'm wrong.

Would love to hear how others have approached this, especially around choosing the right level of detail without overcomplicating things. There seems to be a real gap between data that's interesting and data that actually changes behavior.
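
For anyone wanting to try the savings rate trend line mentioned above, here is a minimal sketch, assuming the savings rate is defined as (income - spending) / income and the trend is a trailing 3-month average. The monthly figures and the helper names `savings_rate` and `rolling_mean` are made up for illustration, not taken from the post.

```python
from statistics import mean


def savings_rate(income: float, spending: float) -> float:
    """Share of income not spent; returns 0.0 when there is no income."""
    if income <= 0:
        return 0.0
    return (income - spending) / income


def rolling_mean(values: list[float], window: int) -> list[float]:
    """Trailing moving average; early points average whatever is available."""
    smoothed = []
    for i in range(len(values)):
        start = max(0, i - window + 1)
        smoothed.append(mean(values[start:i + 1]))
    return smoothed


# Hypothetical monthly (income, spending) pairs, for illustration only.
months = [(4000, 3200), (4000, 3000), (4200, 3570), (4000, 2800)]

rates = [savings_rate(income, spending) for income, spending in months]
trend = rolling_mean(rates, window=3)

for month, (rate, smooth) in enumerate(zip(rates, trend), start=1):
    print(f"month {month}: rate={rate:.1%} trend={smooth:.1%}")
```

Smoothing over a few months is one way to damp the month-to-month noise the post worries about; the window length is a judgment call.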

u/BowlBackground6505 — 3 days ago

▲ 58 r/dataanalysis

What's one data analysis project that taught you more than any online course ever did?

I am looking for ideas from people working in data analytics. Was there a personal, work, or portfolio project that significantly improved your analytical thinking or technical skills? What made that project so valuable compared to learning from courses alone?

u/Effective_Ocelot_445 — 4 days ago

▲ 1 r/dataanalysis+2 crossposts

Most efficient way for AI to read sports data in Excel?

I have every MLB game, grouped by 3-game series and ordered by date (see picture). Each season contains around 2,400 rows of data, all neatly ordered as in the picture.

I want to use AI (ChatGPT) to analyse the games, but I am still an AI novice.

First of all, is the data neat enough for AI to view and analyse?

Should I use ChatGPT Plus, for efficiency?

Any advice will be appreciated, thank you.

u/Unknown30056 — 4 days ago

▲ 14 r/dataanalysis

Analysts who use AI to build their own tools - what do you actually make?

Curious how far people are taking this. Beyond using AI to write queries or clean data, is anyone building actual tools with it? Things like:

interactive dashboards or KPI trackers
report generators
small internal apps for the team to use

Or does most of it stay inside Power BI / Tableau / a notebook and never really become a standalone thing?

And if you have built something standalone - what happened next? Did it get shared with the team, or did it just stay on your machine as a one-off?

Genuinely interested in where the line is these days between "AI helped me analyze" and "AI helped me build a thing other people use."

u/Heeelllga — 5 days ago

▲ 1 r/dataanalysis

As a fresher, how can I build domain knowledge and learn to solve business problems?

I'm preparing for a data analyst role and have been learning SQL, Power BI, and Python. One area I'm struggling with is domain knowledge and business thinking.

How did you learn about different business domains (e-commerce, banking, healthcare, etc.)? What resources or approach helped you the most?

Also, when you're given a business problem, how do you approach it? How do you break it down, decide which metrics to analyze, and identify the root cause?

I'd really appreciate any advice or resources that helped you when you were starting out.

u/DataAspirant169 — 4 days ago

▲ 6 r/dataanalysis

[OC] Analysis of which Club teams are winning the World Cup

Fan project tracking club players' contributions at the World Cup...

How's your club doing? https://wc26clubff.dan-gur.com/

u/cyphron227 — 3 days ago

▲ 97 r/dataanalysis

What's one data analysis skill you wish you had learned much earlier in your career?

I've noticed that many online courses focus heavily on tools like Excel, SQL, Python, and Power BI, but real-world work often requires skills that aren't emphasized enough. Looking back, what's one data analysis skill, mindset, or habit that made the biggest difference in your career? I'm especially interested in lessons that beginners usually overlook.

u/Effective_Ocelot_445 — 6 days ago

▲ 0 r/dataanalysis

I’ve uploaded my TikTok Comments Analysis project to GitHub

Check the first comment

u/Illustrious_Media_69 — 4 days ago

▲ 33 r/dataanalysis

If anyone is studying data analysis/science

I'm currently learning Python and have created a study group for like-minded people. Let me know if you want to join.

u/Commercial-Paper749 — 7 days ago

▲ 2 r/dataanalysis

Matching Accounts via Name similarity across two different data sets

Hi all,

What’s the best way to match account names across two datasets when there are no common IDs and the naming conventions differ?

I was planning to use Python with fuzzy matching (using a similarity threshold), but are there any better tools or AI-based solutions you’d recommend for this kind of entity matching/data reconciliation?

Thanks!
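
A minimal sketch of the threshold-based fuzzy matching described above, using only the standard library's `difflib`. The suffix list, the 0.85 threshold, the sample account names, and the helper names (`normalize`, `similarity`, `match_accounts`) are all assumptions for illustration; real data will likely need a tuned threshold and a manual review of borderline pairs.

```python
import re
from difflib import SequenceMatcher

# Assumed noise words to drop before comparing; adjust for your own data.
SUFFIXES = {"inc", "llc", "ltd", "corp", "co", "gmbh"}


def normalize(name: str) -> str:
    """Lowercase, strip punctuation, and drop common legal suffixes."""
    tokens = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    return " ".join(t for t in tokens if t not in SUFFIXES)


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two normalized names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def match_accounts(left, right, threshold=0.85):
    """Map each name in `left` to its best match in `right`, or None
    when no candidate reaches the threshold."""
    matches = {}
    for name in left:
        best = max(right, key=lambda cand: similarity(name, cand), default=None)
        if best is not None and similarity(name, best) >= threshold:
            matches[name] = best
        else:
            matches[name] = None
    return matches


# Hypothetical account names from two systems.
result = match_accounts(
    ["Acme Inc.", "Globex Corporation"],
    ["ACME, Inc", "Initech LLC"],
)
print(result)
```

This compares every name against every candidate, which is fine for a few thousand rows but grows quadratically; larger datasets usually need some form of blocking (for example, only comparing names that share a first letter or postcode) before scoring.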

u/nath__b — 4 days ago

▲ 0 r/dataanalysis

Looking to subscribe to an AI model for preparing a dataset

Hello guys, for my research I need to analyze and organize large amounts of data from research papers. I am looking for the AI model best suited to this job, i.e. putting the data into Excel or a spreadsheet, neatly organized the way we want it.

I tried ChatGPT premium and it seems very good, but I'm just wondering if there are any other models that are better for this.

Thank you

u/Spiritual-Welder-535 — 6 days ago

▲ 0 r/dataanalysis

When Power Query takes hours: How I built a zero-setup local SQL tool to query giant 4-8GB CSVs

Hey everyone,

I work as a data analyst for a client with incredibly locked-down security. If you’ve ever worked in this kind of corporate environment, you know the drill: no access to cloud data warehouses, no advanced developer tools, nothing. My entire world is basically restricted to standard Excel and Power BI.

Recently, I hit a massive wall. I had to clean and analyze flat CSV files ranging anywhere from 4GB to 8GB. Trying to open these in Excel is a joke, and waiting for Power Query to crunch through the transformations was taking forever and completely freezing my machine.

Now, I’m not a professional developer by any means, but I was so frustrated with the tool limitations that I decided to see if I could build a lightweight, custom Enterprise SQL Workbench to handle the heavy lifting while keeping everything completely local to respect data integrity and security rules.

The backend is entirely Python-based, but I set it up so that my non-technical colleagues can use it without writing a single line of code. It pairs Streamlit for a clean browser interface with DuckDB for crazy fast, in-memory processing, and the Calamine engine to handle heavy Excel parsing.

What it actually does:

Zero cloud or database setup: Everything runs locally inside an isolated memory sandbox. No servers to configure, and zero data leaves your machine.
Handles massive files instantly: Because DuckDB processes data in columns (vectorized), it slashes through 4–8GB datasets and runs complex analytical queries in less than a second.
Flexible Multi-File Loading: It lets you mount multiple datasets sequentially into your active session. You can either use Direct File Paths (great for instantly mounting huge files without making copies) or just drag and drop via standard Browser Uploads.
Clean Query Editor: It integrates streamlit-ace so you get a proper dark-mode SQL editor right in your browser with syntax highlighting, line numbers, and a sidebar to explore your active table schemas.
Direct-to-Disk Exporting: If a query pulls a massive result set that would crash a browser tab, it uses DuckDB streams to dump the entire output straight back onto your local hard drive as a .csv or .parquet file.
Multi-Sheet Excel Support: It automatically splits and maps multi-sheet workbooks into individual, clean database tables.

The "One-Click" Magic for Colleagues

Since my teammates aren't developers either and don't use GitHub, I bundled the entire setup into a single .bat script launcher.

Now, all they have to do is double-click a desktop icon. The batch script quietly spins up an isolated virtual environment in the background, pulls the latest UI code directly from my GitHub, checks the dependencies, and launches the interface right in their default web browser. The coolest part? If I optimize the code on GitHub, their desktop launcher automatically grabs the update the next time they open it.

Give it a spin and let me know what you think!

I’ve made the repo public so anyone dealing with corporate data constraints can use it. Please feel free to grab the batch file, throw some of your heaviest datasets at it, and test it out for yourself!

Since I'm still learning the development side of things, I would love to hear your thoughts and suggestions:

How does the processing speed feel compared to your usual Excel/Power Query workflows?
Are there any specific SQL features or shortcuts you think I should add next?
Any tips for further optimizing local memory when pushing past 8GB?

Check out the code or grab the script template here: 👉 GitHub Repository: https://github.com/Nikhil-Maske/sql-workbench

Let me know your feedback or if you run into any quirks while testing it!

u/Minimum_Present7282 — 7 days ago

▲ 9 r/dataanalysis

Help about data search tool

Hello, I hope someone can help me…
I have a sheet with more than 200 product SKUs and their names. I work in a warehouse and need to check every product.
Is there any way to make an app, or some other way, where I only type the product name and it gives me the product SKU to record in the warehouse system?

I need it to be on my phone.

u/Sad-Advertising2352 — 8 days ago

▲ 78 r/dataanalysis

What data analysis skill had the biggest impact on your career growth?

Was it SQL, Excel, statistics, data visualization, business understanding, or communication skills? Curious to hear what made the biggest difference in real-world work.

u/Effective_Ocelot_445 — 11 days ago

▲ 13 r/dataanalysis

Is anyone here a data analyst working in the domain of credit, credit risk and banking analytics?

I have some questions about how to build domain knowledge. Are there any materials, books, or courses that I could use?

I come from an engineering background, and my limited credit and banking knowledge hinders my ability to come up with better insights.

u/Aggravating_Bed5990 — 10 days ago

▲ 122 r/dataanalysis+6 crossposts

Built a SQL mystery game - can you query the killer?

Solve murders. Master SQL. One query at a time.

Agatha Christie cases. Real suspects. Live SQLite database. You write the queries, you catch the killer.

SELECT s.name, a.location
FROM suspects s
JOIN alibis a ON s.suspect_id = a.suspect_id
WHERE a.time_from <= '23:00'
AND a.location != 'Cabin'
ORDER BY s.name;

That's the kind of query standing between you and the murderer.
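
To try a query of this shape locally, here is a minimal sketch using Python's built-in `sqlite3` module with an in-memory database. The table layout is inferred from the query above, and the three suspects and their alibis are invented for illustration; the game's real schema and data may differ.

```python
import sqlite3

# In-memory database with a guessed schema and made-up rows.
con = sqlite3.connect(":memory:")
con.executescript(
    """
    CREATE TABLE suspects (suspect_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE alibis (suspect_id INTEGER, location TEXT, time_from TEXT);

    INSERT INTO suspects VALUES (1, 'Ada'), (2, 'Basil'), (3, 'Clara');
    INSERT INTO alibis VALUES
        (1, 'Dining Car', '22:30'),
        (2, 'Cabin',      '22:00'),
        (3, 'Lounge',     '23:30');
    """
)

# Times are stored as zero-padded HH:MM text, so string comparison
# orders them correctly.
rows = con.execute(
    """
    SELECT s.name, a.location
    FROM suspects s
    JOIN alibis a ON s.suspect_id = a.suspect_id
    WHERE a.time_from <= '23:00'
    AND a.location != 'Cabin'
    ORDER BY s.name;
    """
).fetchall()

print(rows)
con.close()
```

With this sample data only one suspect passes both filters: Basil is excluded by location and Clara by time.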

No signup. Runs in the browser. → querythemurder.com

Feedback: querythemurder@gmail.com

u/Sensitive-Try-9603 — 12 days ago

▲ 19 r/dataanalysis

Are online data "gurus" actually helping people land jobs or are they mostly just content creators?

There are hundreds of teachers, coaches and mentors across YouTube, LinkedIn etc., but it feels like their real income comes from content creation or course sales, not from any real data work. I am genuinely curious: has anyone actually landed a data role in the last 5 years by following one of these roadmaps, especially without a tech degree and coming from a completely unrelated field?

Right now the whole thing looks like a machine designed to keep people learning forever. It seems like a large share of learners worldwide are essentially the target audience for these online advisors. Would genuinely love to be proven wrong. If you have seen real examples or experienced this yourself, I’d be interested to hear.

u/noble_andre — 12 days ago

▲ 57 r/dataanalysis

What data analysis skill became much more important after you started working professionally?

I am curious which skills turned out to matter the most in real-world projects compared to what is typically taught in courses or bootcamps.

u/Effective_Ocelot_445 — 13 days ago

▲ 12 r/dataanalysis

Where to store my 500k-row SQLite database?

I have a CSV file which will be turned into a SQLite database (480k rows). Content: 5 years of real estate transaction statistics. I'll update the database twice a year, overwriting it with fresh data (I always keep 5 years).
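
For reference, the CSV-to-SQLite step with a full overwrite can be sketched with Python's standard library alone. This is an illustration under assumptions, not a recommendation for any hosting service: the function name `load_csv_into_sqlite`, the table name `transactions`, and the file paths are hypothetical, the CSV is assumed to have a header row with simple column names, and every value is stored as text.

```python
import csv
import sqlite3


def load_csv_into_sqlite(csv_path, db_path, table="transactions"):
    """Replace `table` in the SQLite file with the rows of the CSV.

    Returns the number of rows now in the table.
    """
    with open(csv_path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader)  # first row is assumed to hold column names
        columns = ", ".join(f'"{c}"' for c in header)
        placeholders = ", ".join("?" for _ in header)

        con = sqlite3.connect(db_path)
        try:
            # Drop and recreate the table so each refresh is a full overwrite.
            con.execute(f'DROP TABLE IF EXISTS "{table}"')
            con.execute(f'CREATE TABLE "{table}" ({columns})')
            with con:  # commits the inserts on success, rolls back on error
                con.executemany(
                    f'INSERT INTO "{table}" VALUES ({placeholders})', reader
                )
            return con.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
        finally:
            con.close()
```

A twice-yearly refresh would then be a single call such as `load_csv_into_sqlite("transactions.csv", "stats.db")`, where both file names are placeholders. Because the rows are streamed from the reader, the whole CSV never has to sit in memory at once.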

I'll build a one-page dashboard that prettifies all that data with various graphs.

This is a "freemium" feature for very niche users so READ ops count will be limited.

With that context in mind, which simple, easy-to-use cloud database solution would you recommend? I'm a no-coder, and have learned over the past 6 years how databases, backends, and frontends work; I just can't write pure code. That's why simple / easy is important.

Thanks for reading.

u/fredkzk — 12 days ago