r/dataanalysis

Just started first “data” gig. Why’s Excel so fun to get into?

I started in customer service at my company, but the Client Services director recently promoted me to help with locating trends and with keeping data together for calls on upcoming “outbound call projects.” He said that, given our feedback sessions on the Salesforce and website upgrades, the way I approached certain issues, and the solutions I proposed, they felt right giving me this opportunity to learn something new and help out behind the scenes. It’s a great opportunity, and I believe I’m gonna be a great benefit. I had only used Excel for school work, so nothing crazy, but as soon as I learned what formulas are, how to make charts look right, and how to add calculations to show results, it’s been so fun and interesting learning how to make the most of Excel and how other people have. Applying AI to it makes it so much more fun and, of course, easier. I’ve used AI to teach me formulas and what each component in a formula means. I’ve learned to read existing formulas, but I’ve mostly had AI write my formulas to leave less room for user error. I give it what I think up and we go from there. It feels like I’m gonna do great in this job, and I look forward to learning more.

reddit.com
u/Critical-Tennis1897 — 24 hours ago

CUSTOMER CHURN ANALYSIS

Built an End-to-End Customer Churn Analysis Dashboard focused on identifying customer retention patterns and churn-driving factors.

Key highlights:
• Analyzed 6.4K+ customer records
• Identified a 27% churn rate
• Performed customer segmentation across demographics, tenure, contract type, payment methods, internet services, and geography
• Built interactive KPI dashboards and churn insights visualizations
• Implemented churn prediction workflow using Machine Learning

Tech Stack:
• PostgreSQL
• Python
• Power BI
• Machine Learning

This project helped me strengthen my understanding of:
✅ ETL & data preprocessing
✅ Analytical querying
✅ Business KPI analysis
✅ Dashboard storytelling
✅ Predictive analytics workflows

Looking forward to building more advanced analytics and ML-driven projects 🚀

#PowerBI #Python #PostgreSQL #MachineLearning #DataAnalytics #DataScience #BusinessIntelligence #Analytics #ChurnAnalysis

u/Worldly-Welder2033 — 1 day ago
▲ 2 r/dataanalysis+3 crossposts

I’ve been trying to make cleaner, more readable graphs lately and realized most default tools don’t look that great out of the box.

Excel works, but it often ends up looking… basic.

Some tools look better, but take way more effort to learn.

So I’m curious what people actually use in practice:

  • what you consistently go back to
  • what gives you good results without too much friction
  • what you’d recommend to someone who cares about how charts actually look
  • Bonus if you’ve switched tools and noticed a big difference.
u/Open-Ease685 — 1 day ago

What’s the most important skill to improve as a beginner in data analysis?

I’m learning data analysis and am curious which skills professionals feel make the biggest difference early on.

u/Effective_Ocelot_445 — 2 days ago
▲ 24 r/dataanalysis+14 crossposts

I wanted to check Epstein files, without spending too much time on them. And spent too much time on them

So yeah. AI tool to talk to Epstein and his files

youtu.be

ETL

Good day everyone, I wanted to find out how important ETL is in data analysis. I'm contemplating buying an Azure Data Engineering course in order to learn ETL and Databricks. Is this overkill?


Tableau requirement from scratch

Hey I got tagged to a project at my organisation for a RETAIL client. They need someone to make sense of their data, find patterns, forecast and explain their data to them so they can try new pricing and discounts depending on the geographical location and price profiles.

I've worked in the past as part of the team where most things were already set up and I just got requirements from a BA and created the workbooks.

This client doesn't have that and I'm the only one here who's gonna be creating Tableau reports.

Can anyone suggest how to start and do this from scratch?

What key points should I consider?

How should I approach the cloud vs server approach?

How do I join and make sense of the data they have? Right now all they have is data in some Snowflake server, and I have to be the person who uses SQL to fetch it.

Any suggestions would be really appreciated.

u/ubermensch221 — 1 day ago
▲ 10 r/dataanalysis+1 crossposts

Built a Power BI project analyzing Karnataka MLA election data — looking for feedback and real-world project collaboration

Based on the feedback I got from Reddit, I have completed my project.

Recently completed an end-to-end Power BI + SQL analysis project on Karnataka’s 2023 Assembly Elections.

The project explores:

  • Wealth distribution among MLAs
  • Criminal case patterns
  • Gender representation
  • Education profiles
  • Age-group trends

Tech stack used:
Power BI, SQL, Excel, Power Query, DAX

What I focused on most:

  • Data cleaning and transformation
  • Exploratory analysis
  • Dashboard storytelling
  • SQL-based analytical queries
  • Documentation and reporting

Some findings:

  • 54.9% of MLAs have criminal cases
  • Top 10 MLAs account for ~46% of total declared wealth
  • Women’s representation remains significantly low
  • Both rich and poor MLAs have criminal cases; no correlation between wealth and criminal cases

GitHub Repo:
https://github.com/sameerarhan/karnataka_mla_2023_analysis

Would genuinely appreciate feedback on:

  • dashboard storytelling
  • analytical depth
  • UI/UX improvements
  • how to make projects more industry-oriented to get job ready

Also open to collaborating on real-world analytics projects to improve practical exposure and learn from others in the field.

u/FerretLow4499 — 1 day ago

How do you define when Silver-layer data is truly ready for analysis in production environments?

In real-world analytics / BI environments, how do you decide when Silver-layer data is ready for downstream analysis?

I understand the standard cleaning steps (null handling, deduplication, type casting, formatting, standardization, etc.), but I’m trying to understand what “production-grade” Silver data actually looks like in practice.

More specifically:

* What data quality checks do you enforce in Silver vs what you intentionally leave for Gold?
* Do you rely on explicit rules (tests, thresholds, data contracts, SLAs), or is it mostly driven by business context and downstream use cases?
* In financial datasets, what are the minimum validations you would never skip before exposing data to analysts or BI consumers?

I’m trying to avoid two extremes:

* over-engineering Silver until it effectively becomes Gold
* under-validating data and pushing unreliable datasets downstream

I’d really appreciate real-world examples or mental models from production environments, especially around how you draw the line between “clean enough” and truly analysis-ready data.
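To make the "explicit rules (tests, thresholds)" option concrete, here is a minimal sketch of a gate that could run before a Silver table is published. Everything in it is an assumption for illustration: the `Txn` fields, the three checks, and the 1% null threshold are hypothetical, not an established standard.

```rust
use std::collections::HashSet;

/// Hypothetical Silver-layer row for a financial dataset (field names are assumptions).
#[derive(Debug, Clone)]
struct Txn {
    id: u64,
    amount_cents: Option<i64>,
    currency: String,
}

/// Counts of rule violations; the caller decides which ones block a release.
#[derive(Debug, PartialEq)]
struct Report {
    duplicate_ids: usize,
    null_amounts: usize,
    bad_currency: usize,
}

fn check(rows: &[Txn]) -> Report {
    let mut seen = HashSet::new();
    let mut report = Report { duplicate_ids: 0, null_amounts: 0, bad_currency: 0 };
    for row in rows {
        // Key uniqueness: `insert` returns false when the id was already seen.
        if !seen.insert(row.id) {
            report.duplicate_ids += 1;
        }
        if row.amount_cents.is_none() {
            report.null_amounts += 1;
        }
        // Structural check only: exactly three ASCII uppercase letters.
        let well_formed =
            row.currency.len() == 3 && row.currency.chars().all(|c| c.is_ascii_uppercase());
        if !well_formed {
            report.bad_currency += 1;
        }
    }
    report
}

/// Passes when there are no duplicate ids or malformed currencies and the
/// null rate is at or below `max_null_rate` (an assumed, tunable threshold).
fn is_ready(rows: &[Txn], max_null_rate: f64) -> bool {
    let r = check(rows);
    let null_rate = if rows.is_empty() {
        0.0
    } else {
        r.null_amounts as f64 / rows.len() as f64
    };
    r.duplicate_ids == 0 && r.bad_currency == 0 && null_rate <= max_null_rate
}

fn main() {
    let rows = vec![
        Txn { id: 1, amount_cents: Some(1250), currency: "USD".to_string() },
        Txn { id: 2, amount_cents: None, currency: "USD".to_string() },
        Txn { id: 2, amount_cents: Some(900), currency: "usd".to_string() },
    ];
    println!("{:?}", check(&rows));
    println!("ready: {}", is_ready(&rows, 0.01));
}
```

The split here is structural checks (uniqueness, nullability, format) in Silver, with business meaning left to Gold; that division is one possible convention, not the only one.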

u/Santiagohs-23 — 3 days ago
▲ 25 r/dataanalysis+1 crossposts

Designing a plotting Dataset for Rust: Balancing Polars support with zero-dependency weight

When building a visualization library in Rust, a classic architectural dilemma emerges: hard-coding Polars as the backend instantly makes the library heavy, slow to compile, and riddled with large dependencies—making it a no-go for lightweight applications. However, sticking purely to native Rust vectors alienates the data science community who live in Polars DataFrames.

For Charton (a Rust visualization crate), the goal was to bridge this gap: keep the core plotting Dataset dependency-free, but provide a seamless, opt-in bridge for Polars users.

Instead of embedding Polars into the core, Charton works natively with clean Rust types but offers a load_polars_df!() macro. This allows Polars users to instantly ingest their data frames with zero friction, while keeping the core library dead-lightweight.

Here is how the API handles data ingestion in practice:

use charton::prelude::*;
use polars::prelude::*;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // 1. Create a Polars DataFrame with diverse, high-performance types
    let df = df!(
        "id" => &[1, 2, 3, 4, 5],
        "status" => &["High", "Low", "High", "Medium", "Low"],
        "value" => &[Some(1.2), None, Some(5.6), Some(7.8), None],
        "date" => Series::new("date".into(), &[19858i32, 19859, 19860, 19861, 19862]).cast(&DataType::Date)?, 
        "datetime" => Series::new("datetime".into(), &[1715760000000i64, 1715763600000, 1715767200000, 1715770800000, 1715774400000])
            .cast(&DataType::Datetime(TimeUnit::Milliseconds, None))?,
        "duration" => Series::new("duration".into(), &[3_600_000i64, 7_200_000, 1_800_000, 10_800_000, 5_400_000])
            .cast(&DataType::Duration(TimeUnit::Milliseconds))?,
    )?;

    // 2. Convert to Charton dataset seamlessly via macro (Polars remains optional at compile time)
    let ds = load_polars_df!(df)?;
    
    // 3. Dataset is now ready for encoding-axis binding and layout transformations
    println!("{:?}", ds);

    Ok(())
}

Charton ensures strict metadata alignment during conversion. The following table illustrates how Polars logical types map to Charton physical storage:

| Polars Logical Type | Charton Physical Type | Notes |
| --- | --- | --- |
| Int8, Int16, Int32, Int64 | i8, i16, i32, i64 | Direct physical mapping. |
| UInt32, UInt64 | u32, u64 | Direct physical mapping. |
| Float32, Float64 | f32, f64 | NaN values are treated as Nulls. |
| Boolean | bool | Mapped to nullable boolean vector. |
| Utf8 / String | String | Stored as nullable string vectors. |
| Categorical(_, _), Enum(_, _) | Categorical | Preserves dictionary encoding + validity. |
| Date | Date | Stored as i32 days since Unix epoch. |
| Time | Time | Stored as i64 nanoseconds since midnight. |
| Datetime(unit, _) | Datetime | Normalized to i64 nanoseconds since Unix epoch. |
| Duration(unit) | Duration | Normalized to i64 nanoseconds. |
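The "normalized to i64 nanoseconds" rows can be pictured with a small dependency-free sketch. This is not Charton's or Polars' actual code: the `Unit` enum and both functions are hypothetical names, and returning `None` on overflow is an assumed design choice.

```rust
/// Hypothetical stand-in for a time-unit tag; not the Polars or Charton type.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Unit {
    Milliseconds,
    Microseconds,
    Nanoseconds,
}

/// Convert a raw i64 timestamp or duration in `unit` to nanoseconds.
/// Returns None on overflow instead of wrapping silently.
fn to_nanos(value: i64, unit: Unit) -> Option<i64> {
    let factor: i64 = match unit {
        Unit::Milliseconds => 1_000_000,
        Unit::Microseconds => 1_000,
        Unit::Nanoseconds => 1,
    };
    value.checked_mul(factor)
}

/// Normalize a nullable column. Nulls stay null; in this sketch an
/// overflowing value also becomes null, which a real library may handle differently.
fn normalize(col: &[Option<i64>], unit: Unit) -> Vec<Option<i64>> {
    col.iter().map(|v| v.and_then(|x| to_nanos(x, unit))).collect()
}

fn main() {
    // Millisecond values taken from the first "datetime" and "duration" entries above.
    let datetimes = [Some(1_715_760_000_000i64), None];
    println!("{:?}", normalize(&datetimes, Unit::Milliseconds));
    println!("{:?}", to_nanos(3_600_000, Unit::Milliseconds));
}
```

Storing one fixed unit means downstream axis and layout code never has to branch on the source unit, at the cost of a narrower representable range than millisecond storage would allow.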

Curious to hear how other library authors tackle the "heavy data frame dependency vs. lightweight core" problem in Rust. I hope this helps anyone facing the same dilemma.

u/Deep-Network1590 — 3 days ago

What 42,715 messages over 9 years look like when turned into motion

Been experimenting with a new messaging-data visualization for Mimoto, my self-built tool for analyzing messaging history.

This version uses Metal to render particle animations from iMessage chat data.

Each particle represents a message. Particle size is based on a weighted “chat points” system rather than raw message count, while particle speed is influenced by response time (the animation here is sped up).

The goal was to visualize how conversation dynamics and energy balance between two people evolve over time.

The weighting model factors in things like:

  • message type (text, image, video, voice note, URL) 
  • fast replies 
  • long-gap reach-outs 
  • conversation initiations 
  • double messages 
  • laughs, compliments, apologies, questions, and other language signals

 

Still trying to figure out what this type of visualization should actually be called, so ideas are welcome.

u/baxi87 — 3 days ago
▲ 1 r/dataanalysis+1 crossposts

Have Millions of pieces of Data, wondering what next steps are

I have been collecting raw data from Kalshi markets and now have about a week's worth. I want to do some ML or make a bot, and was wondering if anyone had advice on next steps, as I am new to the space.

u/trev3434 — 3 days ago

Looking for workbook/textbook/readings

I'd like to work in data analytics but want to make sure my foundation is solid. Would love some book recommendations, preferably with practice questions, but it's okay if not, as long as it's a really good book.

u/Local_Elderberry6167 — 3 days ago

Recommendations for data cleaning

Hi

I just finished my final uni project on analytics.

I used Python for cleaning.

Multiple datasets were involved (some with 1.8+ million rows).

I have done my analysis, reviews, and recommendations.

The only thing I regret is that I didn't clean the data properly, because all of it was very messy and was given in "raw txt" format by my professor.

Whatever I did with cleaning, some mistakes still remained.

So all I want to ask is:

Can you suggest some YouTube tutorials and books to help me improve at data cleaning?

And which software other than Python should I learn for cleaning data?

u/Dense-Ad8422 — 3 days ago
▲ 52 r/dataanalysis+7 crossposts

Transforming NASA's asteroid data into [MIDI] in real-time

Through the use of NASA’s API and TouchDesigner, I’ve managed to capture near-Earth space object data [asteroids and fireballs] and use it to trigger MIDI signals in Ableton Live. These signals feed a stock sampler, which happens to be loaded with a couple of vocal takes from one of my favorite artists.

To give the experiment a little more musicality, I decided to iterate through the data of the last six objects that passed close to Earth between the selected dates.

Data goes as follows:

total-radiated-energy → C1 to B1
impact-energy → C1 to B1, C2 to B2 and C3 to B3
latitude → C2 to B2
longitude → C1 to B1 and C3 to B3
altitude → beat repeat’s interval, grid and gate
velocity → beat repeat’s offset and variations
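As a rough illustration of a mapping like "latitude → C2 to B2", the sketch below linearly scales a reading into the twelve semitones of one octave. It is not the TouchDesigner network from the project: the function names are hypothetical, the input range is an assumption, and the note numbers assume the convention in which MIDI note 60 is labeled C3 (so C1 = 36).

```rust
/// MIDI number of the C in `octave`, assuming note 60 is labeled C3 (C1 = 36).
/// Tools that label note 60 as C4 would need a different offset.
/// Intended for small octave numbers (roughly 0 to 8).
fn c_of_octave(octave: u8) -> u8 {
    12 * (octave + 2)
}

/// Linearly map `value` from [min, max] onto the 12 notes C..B of `octave`.
/// Out-of-range readings are clamped to the ends of the octave.
fn to_note(value: f64, min: f64, max: f64, octave: u8) -> u8 {
    let t = if max > min {
        ((value - min) / (max - min)).clamp(0.0, 1.0)
    } else {
        0.0
    };
    // Semitone index 0 is C and 11 is B; t == 1.0 is capped at B.
    let step = ((t * 12.0) as u8).min(11);
    c_of_octave(octave) + step
}

fn main() {
    // Latitude in degrees, assumed to span -90..=90, mapped to C2..B2.
    for lat in [-90.0, 0.0, 45.0, 90.0] {
        println!("latitude {lat} -> MIDI note {}", to_note(lat, -90.0, 90.0, 2));
    }
}
```

Mappings that span several octaves, such as the impact-energy row, could reuse `to_note` with a different `octave` per target range.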

More experiments are on my YouTube channel; project files are available through the Tools Store.

u/TasTepeler — 4 days ago
▲ 4 r/dataanalysis+1 crossposts

Looking for advice for a system

I am looking for free software (free and open source would be great), maybe a DMS or a CMS, or possibly something completely different. I'm not sure what type of software system I should be looking for and researching.
I am looking for a software system to digitize our paper system. Specifically, our promissory notes.
I would like to be able to scan the paper PN into a jpg (scanning directly into the system isn't necessary). Then I would like to create multiple fields to enter the data from the PN, such as: PN Number, Ticket Number, Date, Type of PN, Name, ID Number, Address, Phone Number, Total Charge, Amount Paid, Amount Due, Total Due With Admin Fee, Vehicle License Plate, and Vehicle Description. When a PN is generated we charge a 35.00 admin fee. Sometimes we waive that fee. I would like the system to be able to handle subtracting the 35.00 for the record. Then when the customer pays off the PN, I need a way to bring that total to 0.00 and show the PN as paid off.
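The fee and payoff rules described above could look roughly like the sketch below. It is one interpretation, not a recommendation of a specific product: the `Note` struct and its method names are hypothetical, only the money fields are modeled, and amounts are kept in cents to avoid rounding problems.

```rust
/// The 35.00 admin fee, stored in cents to avoid floating-point rounding.
const ADMIN_FEE_CENTS: i64 = 3_500;

/// Hypothetical promissory-note record; only the money fields are modeled.
#[derive(Debug)]
struct Note {
    total_charge_cents: i64,
    amount_paid_cents: i64,
    fee_waived: bool,
}

impl Note {
    /// Admin fee that applies to this note: zero when waived.
    fn fee(&self) -> i64 {
        if self.fee_waived { 0 } else { ADMIN_FEE_CENTS }
    }

    /// Total due with admin fee: charge plus fee, minus payments so far.
    fn balance(&self) -> i64 {
        self.total_charge_cents + self.fee() - self.amount_paid_cents
    }

    /// A note counts as paid off once nothing is left to pay.
    fn is_paid_off(&self) -> bool {
        self.balance() <= 0
    }

    /// Record a final payment that brings the balance to exactly 0.
    fn pay_off(&mut self) {
        self.amount_paid_cents += self.balance().max(0);
    }
}

fn main() {
    let mut pn = Note { total_charge_cents: 12_000, amount_paid_cents: 2_000, fee_waived: false };
    println!("balance with fee (cents): {}", pn.balance());

    // Waiving the fee removes the 35.00 from what is owed.
    pn.fee_waived = true;
    println!("balance after waiver (cents): {}", pn.balance());

    pn.pay_off();
    println!("paid off: {}, balance: {}", pn.is_paid_off(), pn.balance());
}
```

Keeping the waiver as a flag rather than editing the total means the record still shows that a fee existed and was waived.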
I would also like the system to be able to show PNs that are due and active, and PNs that have been sent to collections, and to keep records of the PNs that have been paid and the dates they were paid. A preferred function would be a way to separate records by year: when 2026 ends, we could export all the data to a file of some sort that can be searched and read later if needed, and the current "batch" would then be 2027. Hopefully that makes sense. If the system keeps them all sequential but has a way to sort by year, that would be acceptable.
There should also be a way to search anything that has been typed. I also want the ability to attach media (images and videos) to each PN record. I would also need to be able to create and print reports, for example: all PNs for a month; all PNs due and active for a timeframe; all PNs that have been paid, or paid in a timeframe; all ticket numbers between Tickets 00000001 - 10000000; all PNs with the same name. That type of data manipulation.
Also, I would need users with a login/password and the ability to assign different levels of access, like Read-only, Editor, and Admin. It also needs to be stored on a shared drive. A web server is not available, so the system has to open from a shared drive on the network.
I know I have written a lot. If anyone could recommend the type of system I should be looking at (i.e. CMS, DMS, or whatever else), or knows of any good free (ideally free and open-source) software of that type, I would appreciate being pointed in the right direction.
Thank you to anyone who takes the time to read through this and help me out.

u/VagueScorpio — 4 days ago

Does your work feel at all meaningful, and what industry are you in?

I'm in a data analyst job where my boss cancels all our projects partway through and I am miserable.

u/bloodbent — 6 days ago