r/dataanalysis
Just started first “data” gig. Why’s Excel so fun to get into?
I started in customer service at my company, but was recently promoted by the Client Services director to help with locating trends and keeping data together for calls in upcoming “outbound call projects.” He mentioned that, based on our feedback sessions about the Salesforce and website upgrades, the way I approached certain issues, and the solutions I proposed, they felt right giving me this opportunity to learn something new and be of assistance behind the scenes. It’s a great opportunity, and I believe I’m gonna be a great benefit. I’d only used Excel for school work, so nothing crazy, but as soon as I learned what formulas are, how to make charts look right, and how to add calculations/formulas to show results, it’s been so fun and interesting learning how to make the most of Excel and how other people have done it. Applying AI to it makes it so much more fun and, of course, easier. I’ve used AI to teach me formulas and what each component in a formula means. I’ve learned to read existing formulas, but have mostly had AI make my formulas to leave less room for user error. I give it what I think up and we go from there. Feels like I’m gonna do great in this job and I look forward to learning more.
CUSTOMER CHURN ANALYSIS
Built an End-to-End Customer Churn Analysis Dashboard focused on identifying customer retention patterns and churn-driving factors.
Key highlights:
• Analyzed 6.4K+ customer records
• Identified a 27% churn rate
• Performed customer segmentation across demographics, tenure, contract type, payment methods, internet services, and geography
• Built interactive KPI dashboards and churn insights visualizations
• Implemented churn prediction workflow using Machine Learning
Tech Stack:
• PostgreSQL
• Python
• Power BI
• Machine Learning
This project helped me strengthen my understanding of:
✅ ETL & data preprocessing
✅ Analytical querying
✅ Business KPI analysis
✅ Dashboard storytelling
✅ Predictive analytics workflows
Looking forward to building more advanced analytics and ML-driven projects 🚀
#PowerBI #Python #PostgreSQL #MachineLearning #DataAnalytics #DataScience #BusinessIntelligence #Analytics #ChurnAnalysis
I’ve been trying to make cleaner, more readable graphs lately and realized most default tools don’t look that great out of the box.
Excel works, but it often ends up looking… basic.
Some tools look better, but take way more effort to learn.
So I’m curious what people actually use in practice:
- what you consistently go back to
- what gives you good results without too much friction
- what you’d recommend to someone who cares about how charts actually look
- Bonus if you’ve switched tools and noticed a big difference.
What’s the most important skill to improve as a beginner in data analysis?
I’m learning data analysis and am curious which skills professionals feel make the biggest difference early on.
I wanted to check Epstein files, without spending too much time on them. And spent too much time on them
So yeah. AI tool to talk to Epstein and his files
ETL
Good day everyone, I wanted to find out how important ETL is in data analysis. I'm contemplating buying an Azure Data Engineering course in order to learn ETL and Databricks. Is this overkill?
Tableau requirement from scratch
Hey I got tagged to a project at my organisation for a RETAIL client. They need someone to make sense of their data, find patterns, forecast and explain their data to them so they can try new pricing and discounts depending on the geographical location and price profiles.
I've worked in the past as part of the team where most things were already set up and I just got requirements from a BA and created the workbooks.
This client doesn't have that, and I'm the only one here who's gonna be creating Tableau reports.
Can anyone suggest how to start and do this from scratch?
What key points should I consider?
How should I approach the cloud vs server decision?
How do I figure out and join the data they have? Right now all they have is data in a Snowflake server, and I have to be the person who uses SQL to fetch it.
Any suggestions would be really appreciated.
Built a Power BI project analyzing Karnataka MLA election data — looking for feedback and real-world project collaboration
Based on the feedback I got from Reddit, I have completed my project.
Recently completed an end-to-end Power BI + SQL analysis project on Karnataka’s 2023 Assembly Elections.
The project explores:
- Wealth distribution among MLAs
- Criminal case patterns
- Gender representation
- Education profiles
- Age-group trends
Tech stack used:
Power BI, SQL, Excel, Power Query, DAX
What I focused on most:
- Data cleaning and transformation
- Exploratory analysis
- Dashboard storytelling
- SQL-based analytical queries
- Documentation and reporting
Some findings:
- 54.9% of MLAs have criminal cases
- Top 10 MLAs account for ~46% of total declared wealth
- Women representation remains significantly low
- Both rich and poor MLAs have criminal cases: no correlation between wealth and crime
GitHub Repo:
https://github.com/sameerarhan/karnataka_mla_2023_analysis
Would genuinely appreciate feedback on:
- dashboard storytelling
- analytical depth
- UI/UX improvements
- how to make projects more industry-oriented to get job ready
Also open to collaborating on real-world analytics projects to improve practical exposure and learn from others in the field.
https://google-review-pilot.vercel.app/
Bewertiq: Bielefeld data analysis
How do you define when Silver-layer data is truly ready for analysis in production environments?
In real-world analytics / BI environments, how do you decide when Silver-layer data is ready for downstream analysis?
I understand the standard cleaning steps (null handling, deduplication, type casting, formatting, standardization, etc.), but I’m trying to understand what “production-grade” Silver data actually looks like in practice.
More specifically:
* What data quality checks do you enforce in Silver vs what you intentionally leave for Gold?
* Do you rely on explicit rules (tests, thresholds, data contracts, SLAs), or is it mostly driven by business context and downstream use cases?
* In financial datasets, what are the minimum validations you would never skip before exposing data to analysts or BI consumers?
I’m trying to avoid two extremes:
* over-engineering Silver until it effectively becomes Gold
* under-validating data and pushing unreliable datasets downstream
I’d really appreciate real-world examples or mental models from production environments, especially around how you draw the line between “clean enough” and truly analysis-ready data.
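One way to picture the "explicit rules" option is a small set of thresholded checks that a Silver table must pass before it is exposed downstream. The sketch below is only an illustration, not a reference implementation: the `Txn` record, its fields, and the rule set in `is_ready` are all hypothetical, and a real pipeline would more likely express these as tests in whatever tooling it already uses.

```rust
use std::collections::HashSet;

/// Hypothetical Silver-layer row for a financial dataset.
#[derive(Debug, Clone)]
pub struct Txn {
    pub id: u64,
    pub account: Option<String>,
    pub amount_cents: Option<i64>,
}

/// Raw counts only, so thresholds can be applied separately.
#[derive(Debug, PartialEq)]
pub struct QualityReport {
    pub rows: usize,
    pub duplicate_ids: usize,
    pub null_accounts: usize,
    pub null_amounts: usize,
}

/// Count duplicate primary keys and nulls in required columns.
pub fn profile(rows: &[Txn]) -> QualityReport {
    let mut seen = HashSet::new();
    let mut duplicate_ids = 0;
    let mut null_accounts = 0;
    let mut null_amounts = 0;
    for r in rows {
        // `insert` returns false when the id was already present.
        if !seen.insert(r.id) {
            duplicate_ids += 1;
        }
        if r.account.is_none() {
            null_accounts += 1;
        }
        if r.amount_cents.is_none() {
            null_amounts += 1;
        }
    }
    QualityReport { rows: rows.len(), duplicate_ids, null_accounts, null_amounts }
}

/// Assumed rule set: no duplicate keys, no null amounts, and at most
/// `max_null_account_ratio` of rows missing an account.
pub fn is_ready(report: &QualityReport, max_null_account_ratio: f64) -> bool {
    if report.rows == 0 {
        return false; // an empty table is treated as not ready
    }
    let ratio = report.null_accounts as f64 / report.rows as f64;
    report.duplicate_ids == 0 && report.null_amounts == 0 && ratio <= max_null_account_ratio
}
```

Calling `is_ready(&profile(&rows), 0.01)` would then gate publication on zero duplicates, zero null amounts, and at most 1% missing accounts. Keeping the counting (`profile`) separate from the thresholds (`is_ready`) is one way to let the same profile feed both a hard gate and a softer monitoring dashboard.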
Designing a plotting Dataset for Rust: Balancing Polars support with zero-dependency weight
When building a visualization library in Rust, a classic architectural dilemma emerges: hard-coding Polars as the backend instantly makes the library heavy, slow to compile, and riddled with large dependencies—making it a no-go for lightweight applications. However, sticking purely to native Rust vectors alienates the data science community who live in Polars DataFrames.
For Charton (a Rust visualization crate), the goal was to bridge this gap: keep the core plotting Dataset dependency-free, but provide a seamless, opt-in bridge for Polars users.
Instead of embedding Polars into the core, Charton works natively with clean Rust types but offers a load_polars_df!() macro. This allows Polars users to instantly ingest their data frames with zero friction, while keeping the core library dead-lightweight.
Here is how the API handles data ingestion in practice:
    use charton::prelude::*;
    use polars::prelude::*;

    fn main() -> Result<(), Box<dyn std::error::Error>> {
        // 1. Create a Polars DataFrame with diverse, high-performance types
        let df = df!(
            "id" => &[1, 2, 3, 4, 5],
            "status" => &["High", "Low", "High", "Medium", "Low"],
            "value" => &[Some(1.2), None, Some(5.6), Some(7.8), None],
            "date" => Series::new("date".into(), &[19858i32, 19859, 19860, 19861, 19862]).cast(&DataType::Date)?,
            "datetime" => Series::new("datetime".into(), &[1715760000000i64, 1715763600000, 1715767200000, 1715770800000, 1715774400000])
                .cast(&DataType::Datetime(TimeUnit::Milliseconds, None))?,
            "duration" => Series::new("duration".into(), &[3_600_000i64, 7_200_000, 1_800_000, 10_800_000, 5_400_000])
                .cast(&DataType::Duration(TimeUnit::Milliseconds))?,
        )?;

        // 2. Convert to Charton dataset seamlessly via macro (Polars remains optional at compile time)
        let ds = load_polars_df!(df)?;

        // 3. Dataset is now ready for encoding-axis binding and layout transformations
        println!("{:?}", ds);
        Ok(())
    }
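The "lightweight core, opt-in bridge" idea can be sketched without Polars at all. The following is a minimal illustration of the general pattern and not Charton's actual internals: `Column`, `Dataset`, and `IntoDataset` are hypothetical names, and in a real crate the DataFrame implementation of the trait would presumably sit behind an optional Cargo feature so the core never names that dependency.

```rust
/// Hypothetical dependency-free column storage; `None` marks a null.
#[derive(Debug, Clone, PartialEq)]
pub enum Column {
    Int(Vec<Option<i64>>),
    Float(Vec<Option<f64>>),
    Text(Vec<Option<String>>),
}

impl Column {
    pub fn len(&self) -> usize {
        match self {
            Column::Int(v) => v.len(),
            Column::Float(v) => v.len(),
            Column::Text(v) => v.len(),
        }
    }
}

/// Hypothetical core dataset: named columns of equal length.
#[derive(Debug, Default, PartialEq)]
pub struct Dataset {
    columns: Vec<(String, Column)>,
}

impl Dataset {
    pub fn new() -> Self {
        Self::default()
    }

    /// Add a column, rejecting duplicate names and length mismatches.
    pub fn with_column(mut self, name: &str, col: Column) -> Result<Self, String> {
        if self.columns.iter().any(|(n, _)| n == name) {
            return Err(format!("duplicate column: {name}"));
        }
        if let Some((_, first)) = self.columns.first() {
            if first.len() != col.len() {
                return Err(format!(
                    "column {name} has length {}, expected {}",
                    col.len(),
                    first.len()
                ));
            }
        }
        self.columns.push((name.to_string(), col));
        Ok(self)
    }

    pub fn height(&self) -> usize {
        self.columns.first().map_or(0, |(_, c)| c.len())
    }

    pub fn width(&self) -> usize {
        self.columns.len()
    }
}

/// Conversion trait exposed by the core. An optional feature could implement
/// it for a DataFrame type without the core depending on that crate.
pub trait IntoDataset {
    fn into_dataset(self) -> Result<Dataset, String>;
}

/// Plain Rust vectors work out of the box.
impl IntoDataset for Vec<(&str, Vec<f64>)> {
    fn into_dataset(self) -> Result<Dataset, String> {
        let mut ds = Dataset::new();
        for (name, values) in self {
            let col = Column::Float(values.into_iter().map(Some).collect());
            ds = ds.with_column(name, col)?;
        }
        Ok(ds)
    }
}
```

With this shape, `vec![("x", vec![1.0, 2.0])].into_dataset()` builds a one-column dataset from native vectors, and a bridge for another data source only needs to provide one more `IntoDataset` implementation.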
Charton ensures strict metadata alignment during conversion. The following table illustrates how Polars logical types map to Charton physical storage:
| Polars Logical Type | Charton Physical Type | Notes |
|---|---|---|
| Int8, Int16, Int32, Int64 | i8, i16, i32, i64 | Direct physical mapping. |
| UInt32, UInt64 | u32, u64 | Direct physical mapping. |
| Float32, Float64 | f32, f64 | NaN values are treated as Nulls. |
| Boolean | bool | Mapped to nullable boolean vector. |
| Utf8 / String | String | Stored as nullable string vectors. |
| Categorical(_, _), Enum(_, _) | Categorical | Preserves dictionary encoding + validity. |
| Date | Date | Stored as i32 days since Unix epoch. |
| Time | Time | Stored as i64 nanoseconds since midnight. |
| Datetime(unit, _) | Datetime | Normalized to i64 nanoseconds since Unix epoch. |
| Duration(unit) | Duration | Normalized to i64 nanoseconds. |
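To make the two normalization rules in the table concrete, here is a small sketch of how a timestamp in a coarser unit could be rescaled to nanoseconds and how NaN could be folded into a null. It only illustrates the rules as stated: `TimeUnit`, `to_nanos`, and `nan_to_null` are hypothetical names defined here, not Polars or Charton APIs, and returning `None` on overflow is an assumption.

```rust
/// Hypothetical stand-in for a time unit tag.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum TimeUnit {
    Milliseconds,
    Microseconds,
    Nanoseconds,
}

/// Rescale a raw value to nanoseconds; `None` signals i64 overflow.
pub fn to_nanos(value: i64, unit: TimeUnit) -> Option<i64> {
    let factor: i64 = match unit {
        TimeUnit::Milliseconds => 1_000_000,
        TimeUnit::Microseconds => 1_000,
        TimeUnit::Nanoseconds => 1,
    };
    value.checked_mul(factor)
}

/// Treat NaN as a missing value, keeping existing nulls as they are.
pub fn nan_to_null(values: &[Option<f64>]) -> Vec<Option<f64>> {
    values
        .iter()
        .map(|v| v.filter(|x| !x.is_nan()))
        .collect()
}
```

For example, `to_nanos(3_600_000, TimeUnit::Milliseconds)` yields `Some(3_600_000_000_000)`, which is one hour expressed in nanoseconds.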
Curious to hear how other library authors tackle the "heavy data frame dependency vs. lightweight core" problem in Rust. Hope this helps anyone facing the same dilemma.
What 42,715 messages over 9 years look like when turned into motion
Been experimenting with a new messaging-data visualization for Mimoto, my self-built tool for analyzing messaging history.
This version uses Metal to render particle animations from iMessage chat data.
Each particle represents a message. Particle size is based on a weighted “chat points” system rather than raw message count, while particle speed is influenced by response time (the animation here is sped up).
The goal was to visualize how conversation dynamics and energy balance between two people evolve over time.
The weighting model factors in things like:
- message type (text, image, video, voice note, URL)
- fast replies
- long-gap reach-outs
- conversation initiations
- double messages
- laughs, compliments, apologies, questions, and other language signals
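As a rough picture of how such a weighting could be combined into a single score per message, here is a minimal sketch. Every weight, threshold, and name in it is hypothetical and chosen only for illustration; the post does not state Mimoto's actual values or formula, and the language signals are left out for brevity.

```rust
/// Message kinds named in the list above.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Kind {
    Text,
    Image,
    Video,
    VoiceNote,
    Url,
}

/// One message with a few of the signals from the list.
#[derive(Debug, Clone, Copy)]
pub struct Message {
    pub kind: Kind,
    /// Seconds since the other person's last message; `None` when this
    /// message starts a conversation.
    pub reply_secs: Option<u64>,
    pub is_double_message: bool,
}

/// Hypothetical base weight per message type.
fn base_points(kind: Kind) -> u32 {
    match kind {
        Kind::Text => 10,
        Kind::Url => 10,
        Kind::Image => 15,
        Kind::VoiceNote => 20,
        Kind::Video => 25,
    }
}

/// Combine the base weight with bonuses. All thresholds are assumptions.
pub fn chat_points(m: &Message) -> u32 {
    let mut points = base_points(m.kind);
    match m.reply_secs {
        None => points += 10,                   // conversation initiation
        Some(s) if s <= 60 => points += 5,      // fast reply
        Some(s) if s >= 86_400 => points += 8,  // long-gap reach-out
        Some(_) => {}
    }
    if m.is_double_message {
        points += 2;
    }
    points
}

/// Share of total points held by the first participant, in 0.0..=1.0.
/// Returns `None` when neither side has any points.
pub fn balance(a: &[Message], b: &[Message]) -> Option<f64> {
    let pa: u32 = a.iter().map(chat_points).sum();
    let pb: u32 = b.iter().map(chat_points).sum();
    let total = pa + pb;
    if total == 0 {
        None
    } else {
        Some(pa as f64 / total as f64)
    }
}
```

In a visualization like the one described, `chat_points` would drive particle size, while `balance` computed over a sliding time window is one possible way to show how the "energy balance" between two people shifts.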
Still trying to figure out what this type of visualization should actually be called, so ideas are welcome.
Have Millions of pieces of Data, wondering what next steps are
I have been collecting raw data from Kalshi markets and now have about a week's worth. I want to do some ML or make a bot, and was wondering if anyone has advice for next steps, as I am new to the space.
DataPallas - One modern (self hosted) data platform replacing Looker, Tableau, and Crystal Reports
github.com
Looking for workbook/textbook/readings
I'd like to work in data analytics but want to make sure my foundation is solid. Would love some book recommendations, preferably ones with practice questions, but it's okay if not, as long as it's a really good book.
Recommendations for data cleaning
Hi,
I just finished my final uni project on analytics.
I used Python for cleaning.
Multiple datasets were involved (some with 1.8+ million rows).
I have done my analysis, reviews, and recommendations.
The only thing I regret is that I didn't clean the data properly, because the data was very messy and was given in "raw txt" format by my professor.
Whatever I did for cleaning, some mistakes still remained.
So all I want to ask is:
Can you suggest some YouTube tutorials and books to help me improve at data cleaning?
And which other software should I learn besides Python for cleaning data?
Transforming NASA's asteroid data into [MIDI] in real-time
Through the use of NASA’s API and TouchDesigner, I’ve managed to capture data on near-Earth space objects [asteroids and fireballs] and use it to trigger MIDI signals in Ableton Live. These signals feed a stock sampler, which happens to be loaded with a couple of vocal takes from one of my favorite artists.
To give the experiment a little more musicality, I decided to iterate through the data of the last six objects that passed close to Earth between the selected dates.
Data goes as follows:
total-radiated-energy → C1 to B1
impact-energy → C1 to B1, C2 to B2 and C3 to B3
latitude → C2 to B2
longitude → C1 to B1 and C3 to B3
altitude → beat repeat’s interval, grid and gate
velocity → beat repeat’s offset and variations
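The note ranges above (for example C1 to B1) each span one octave, so the mapping can be thought of as scaling a data value onto 12 semitones. The sketch below shows that idea under stated assumptions: the function names are hypothetical, the linear scaling is an assumption rather than a description of the actual TouchDesigner network, and because the MIDI number for C1 depends on the octave-naming convention in use, it is passed in as `base_note` instead of being hard-coded.

```rust
/// Scale `value` from [min, max] onto one octave of MIDI notes starting at
/// `base_note` (12 semitones, C through B). Out-of-range input is clamped.
/// Returns `None` for an invalid range, NaN input, or an octave that would
/// run past MIDI note 127.
pub fn to_midi_note(value: f64, min: f64, max: f64, base_note: u8) -> Option<u8> {
    if !(max > min) || value.is_nan() || base_note > 116 {
        return None;
    }
    let t = ((value - min) / (max - min)).clamp(0.0, 1.0);
    // 12 equal bins; t == 1.0 is folded into the top bin (the B).
    let step = ((t * 12.0).floor() as u8).min(11);
    Some(base_note + step)
}

/// Map latitude in degrees (-90..=90) to the octave starting at `base_note`.
pub fn latitude_to_note(lat_deg: f64, base_note: u8) -> Option<u8> {
    to_midi_note(lat_deg, -90.0, 90.0, base_note)
}
```

A fireball at the equator would land in the middle of the octave: `latitude_to_note(0.0, base)` returns `base + 6`. Mappings that span three octaves, as impact-energy does above, could call the same function with three different base notes.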
More experiments are on my YouTube channel, and project files are available through the Tools Store.
Looking for advice for a system
I am looking for a free system (free and open-source would be great), such as a DMS or CMS, or possibly something completely different. I'm not sure what type of software system I should be looking for and researching.
I am looking for a software system to digitize our paper system. Specifically, our promissory notes.
I would like to be able to scan the paper P.N. into a jpg (directly into the system isn't necessary). Then, be able to create multiple fields to enter the data from the PN. Fields like: PN Number, Ticket Number, Date, Type of PN, Name, ID Number, Address, Phone Number, Total Charge, Amount Paid, Amount Due, Total Due With Admin Fee, Vehicle License Plate, and Vehicle Description. When a PN is generated we charge a 35.00 admin fee. Sometimes we waive that fee. I would like the system to be able to handle subtracting the 35.00 for the record. Then when the customer pays off the PN, I need a way to bring that total to 0.00 and show the PN as paid off.
I would also like the system to be able to show PNs that are due and active, and PNs that have been sent to collections, and to keep records of the PNs that have been paid and the dates they were paid. Also, a preferred function would be a way to separate by years. When 2026 ends, we can export all the data to a file of some sort that can be searched and read if needed later. And the current "batch" would be 2027, hopefully that makes sense. If the system keeps them all sequential but contains a way to sort by year, that would be acceptable.
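Whatever system ends up being used, the money fields reduce to a small amount of arithmetic, sketched below to make the requirement concrete. This is an illustration under assumptions, not a recommendation of a product: the struct and method names are hypothetical, "Amount Due" is assumed to mean Total Charge minus Amount Paid, and amounts are held in cents to avoid rounding problems.

```rust
/// Admin fee charged when a promissory note (PN) is generated, in cents.
pub const ADMIN_FEE_CENTS: i64 = 3_500;

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Status {
    Active,
    PaidOff,
}

/// Hypothetical PN record holding only the money-related fields.
#[derive(Debug, Clone)]
pub struct PromissoryNote {
    pub total_charge_cents: i64,
    /// Paid before the PN was issued (assumed meaning of "Amount Paid").
    pub amount_paid_cents: i64,
    pub fee_waived: bool,
    /// Payments received against the PN after it was issued.
    pub pn_payments_cents: i64,
}

impl PromissoryNote {
    /// Amount Due = Total Charge - Amount Paid (assumption).
    pub fn amount_due_cents(&self) -> i64 {
        self.total_charge_cents - self.amount_paid_cents
    }

    /// Total Due With Admin Fee; the fee is left out when waived.
    pub fn total_due_with_fee_cents(&self) -> i64 {
        let fee = if self.fee_waived { 0 } else { ADMIN_FEE_CENTS };
        self.amount_due_cents() + fee
    }

    /// What is still owed after payments against the PN.
    pub fn balance_cents(&self) -> i64 {
        self.total_due_with_fee_cents() - self.pn_payments_cents
    }

    /// Record a full payoff, bringing the balance to zero.
    pub fn pay_off(&mut self) {
        self.pn_payments_cents = self.total_due_with_fee_cents();
    }

    pub fn status(&self) -> Status {
        if self.balance_cents() <= 0 {
            Status::PaidOff
        } else {
            Status::Active
        }
    }
}
```

Storing `fee_waived` as a flag rather than editing the total by hand keeps the waiver visible in the record, which matters for the reports described later.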
There should also be a way to search anything that has been typed. I also want the ability to attach media (images and videos) to each PN record. I would also need to be able to create and print reports. Reports would be all PNs for a month. All PN due and active for a timeframe. All PNs that have been paid, or paid in a timeframe. A report for all Ticket numbers between Tickets 00000001 - 10000000. All PNs with the same name. All that type of data manipulation.
Also, I would need to be able to have users with a login/password, and the ability to assign different levels of access, like Read-only, Editor, and Admin. Also, it needs to be able to be stored on a shared drive. A webserver is not available, so the system needs to open from a shared drive on the network.
I know I have written a lot. If anyone could recommend the type of system I should be looking at (i.e. CMS, DMS, or whatever else), or knows of any good free, or free and open-source, software of that type, I would appreciate being pointed in the right direction.
Thank you for anyone that takes the time to read through this and help me out.
Does your work feel at all meaningful, and what industry are you in?
I'm in a data analyst job where my boss cancels all our projects partway through and I am miserable.