r/FunMachineLearning

Regression vs classification: the one distinction that unlocks half of ML

Take a picture of a dog.

🐶 Question 1: "How old is this dog?"

8 months
2.5 years
10 years

The answer is a number. If the model predicts 7 years instead of 8, it's technically wrong, but it's still close. ➡️ That's Regression.

🐕 Question 2: "What breed is this dog?"

Labrador
Poodle
Husky

Now the answer is a label, not a number. Under the hood the model might be 95% confident, but the final output is one specific category. ➡️ That's Classification.
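A minimal plain-Python sketch of how the two kinds of answer get judged differently. The numbers (a 7-year guess for an 8-year-old dog, a 95% Labrador probability) are made up for illustration, and the function names are hypothetical.

```python
# Regression: the output is a number, so being wrong comes in degrees.
def absolute_error(predicted_years: float, true_years: float) -> float:
    return abs(predicted_years - true_years)


# Classification: the model may hold a probability per class internally,
# but the final output is the single most likely label.
def predict_label(class_probs: dict) -> str:
    return max(class_probs, key=class_probs.get)


def is_correct(predicted_label: str, true_label: str) -> bool:
    return predicted_label == true_label


# Hypothetical model outputs for one dog photo.
age_error = absolute_error(predicted_years=7.0, true_years=8.0)
breed_probs = {"Labrador": 0.95, "Poodle": 0.03, "Husky": 0.02}
breed = predict_label(breed_probs)

print(age_error)                    # 1.0 -> off by a year, but close
print(breed)                        # Labrador
print(is_correct(breed, "Poodle"))  # False -> a wrong label is simply wrong
```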

Once this clicked, I started seeing the split everywhere.

✅ Predict a house price → Regression

✅ Predict if an email is spam → Classification

✅ Predict tomorrow's temperature → Regression

✅ Detect fraud → Classification

The most interesting part? You can frame the exact same business problem either way.

Will a customer cancel? → Classification
How many days until they cancel? → Regression

Same raw data. Different question. Different model.
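A minimal sketch of "same raw data, different question": the hypothetical customer records below yield a yes/no label for the classification framing and a day count for the regression framing. The field names and the 30-day horizon are assumptions for illustration.

```python
from typing import Optional

# Hypothetical records: days_until_cancel is None if the customer
# had not cancelled by the end of the observation window.
customers = [
    {"id": "a", "monthly_logins": 2, "days_until_cancel": 12},
    {"id": "b", "monthly_logins": 30, "days_until_cancel": None},
    {"id": "c", "monthly_logins": 5, "days_until_cancel": 45},
]

HORIZON_DAYS = 30  # assumption: "will cancel" means cancelling within 30 days


def classification_target(c: dict) -> int:
    """Will the customer cancel within the horizon? 1 = yes, 0 = no."""
    d = c["days_until_cancel"]
    return int(d is not None and d <= HORIZON_DAYS)


def regression_target(c: dict) -> Optional[int]:
    """How many days until they cancel? None if they never did."""
    return c["days_until_cancel"]


X = [[c["monthly_logins"]] for c in customers]          # same features...
y_cls = [classification_target(c) for c in customers]   # ...a label target
y_reg = [regression_target(c) for c in customers]       # ...a numeric target

print(y_cls)  # [1, 0, 0]
print(y_reg)  # [12, None, 45]
```

Customers who never cancelled have no day count; in practice those rows are usually dropped from the regression target or handled with survival-analysis methods, which is one reason the two framings are not fully interchangeable.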


u/Big-Throat-2813 — 2 hours ago

▲ 6 r/FunMachineLearning+1 crossposts

Batch GD vs SGD vs Mini-Batch GD Explained with a Real-Life Netflix Example

For a deeper dive:
https://www.learnmlacademy.com/learn/gradient-descent
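The linked Netflix example isn't reproduced here, but the core difference between the three variants fits in one function: they apply the same gradient update and differ only in how many examples feed each update. A minimal plain-Python sketch, assuming a one-feature linear model trained with mean squared error on made-up noise-free data; the function name and hyperparameters are illustrative.

```python
import random


def gradient_descent(xs, ys, batch_size, lr=0.05, epochs=1000, seed=0):
    """Fit y ~ w*x + b by minimizing mean squared error.

    batch_size == len(xs) -> batch GD (one update per epoch)
    batch_size == 1       -> stochastic GD (one update per example)
    1 < batch_size < n    -> mini-batch GD
    """
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    idx = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(idx)  # new example order each epoch (no effect for full batch)
        for start in range(0, len(idx), batch_size):
            batch = idx[start:start + batch_size]
            # Gradient of the batch's mean squared error with respect to w and b.
            grad_w = sum(2 * (w * xs[i] + b - ys[i]) * xs[i] for i in batch) / len(batch)
            grad_b = sum(2 * (w * xs[i] + b - ys[i]) for i in batch) / len(batch)
            w -= lr * grad_w
            b -= lr * grad_b
    return w, b


# Hypothetical noise-free data from y = 3x + 1.
xs = [i / 10 for i in range(20)]
ys = [3 * x + 1 for x in xs]

for name, bs in [("batch", len(xs)), ("SGD", 1), ("mini-batch", 4)]:
    w, b = gradient_descent(xs, ys, batch_size=bs)
    print(f"{name:>10}: w={w:.3f}, b={b:.3f}")
```

With noise-free data all three should end up near w = 3, b = 1; what changes is the cost of each update (batch GD touches every example per step, SGD only one) and how noisy each step is.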

u/Big-Throat-2813 — 2 days ago

▲ 362 r/FunMachineLearning+21 crossposts

I built a game where your only goal is to gaslight an AI intern into committing fraud

All I hear all day long is how AI is taking over everything we do, so I made a game to break it.

Basically, in the game you chat with an AI intern named PIP, and your only job as the player is to gaslight the bot into revealing passwords and company secrets, executing instructions in emails, and much more across 16 different levels.

This is a browser-based game, so it requires no setup and is absolutely free.

Try it out and let me know how far you get or drop your most unhinged prompt in the comments.

It's called "Break The Prompt" and here's the link: https://www.breaktheprompt.xyz/

u/_rhythmbreaker — 4 days ago

▲ 21 r/FunMachineLearning+11 crossposts

PROJECT REVIEW

Hello everyone! I just completed a BIG project I've been working on for a month, and I'd like your opinion on it.

It's a SpaceX Launch Predictor & Cost Optimizer: a full end-to-end ML system that predicts the probability of a SpaceX Falcon 9 booster landing successfully, enriches launch data with real weather conditions, and exposes the results through an interactive Streamlit web application with a business ROI calculator.

It includes a data pipeline, advanced machine learning algorithms (with hyperparameter tuning), explainable AI (SHAP), MLOps (AWS S3, Docker), and business value (an ROI calculator that turns results into financial figures).

FUN FACT: For this project I used my own evaluation metric library, which standardizes supervised and unsupervised model diagnostics into a single, consistent API and is verified and published on PyPI.

Project Info: https://github.com/Alkiviadisss/SpaceX


u/Senior-Neck499 — 3 days ago

▲ 7 r/FunMachineLearning+3 crossposts

I built an interactive machine learning platform to help understand algorithms visually (38 algorithms, open source)

Hi everyone,

For the last few weeks, I've been building a project called Confluence.

I originally started it because I struggled to build intuition while learning machine learning. I found myself constantly switching between notebooks, documentation, videos, and different visualization tools, and none of them really brought everything together.

The goal wasn't to replace scikit-learn or Jupyter notebooks. Instead, I wanted a place where I could experiment and immediately see what changing a hyperparameter actually does.

At the moment, the project includes:

38 machine learning algorithms
25 datasets (real-world + synthetic)
Interactive decision boundary visualizations
Training animations
Prediction explanations
Side-by-side algorithm comparison
Algorithm encyclopedia
Python code generation for experiments

Everything runs on a FastAPI backend using real scikit-learn models rather than browser-only simulations.
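Decision-boundary plots like the ones listed are usually built the same way: evaluate a fitted model at every point of a regular grid and colour each point by its predicted class. A minimal plain-Python sketch of that mesh-grid idea, with a hypothetical nearest-centroid classifier standing in for a real scikit-learn model; this is not Confluence's code, and the training points are made up.

```python
# Hypothetical 2-D training points, each with a class label 0 or 1.
training = [((1.0, 1.0), 0), ((1.5, 2.0), 0), ((4.0, 4.5), 1), ((5.0, 4.0), 1)]


def fit_centroids(data):
    """'Train' a nearest-centroid classifier: average the points of each class."""
    sums = {}
    for (px, py), label in data:
        sx, sy, n = sums.get(label, (0.0, 0.0, 0))
        sums[label] = (sx + px, sy + py, n + 1)
    return {label: (sx / n, sy / n) for label, (sx, sy, n) in sums.items()}


def predict(x, y, centroids):
    """Return the label whose centroid is closest to (x, y)."""
    return min(centroids, key=lambda lbl: (x - centroids[lbl][0]) ** 2 + (y - centroids[lbl][1]) ** 2)


def decision_grid(classify, x_range, y_range, steps):
    """Evaluate classify(x, y) on a steps x steps grid, top row first."""
    (x0, x1), (y0, y1) = x_range, y_range
    grid = []
    for j in range(steps):
        y = y1 - (y1 - y0) * j / (steps - 1)
        grid.append([classify(x0 + (x1 - x0) * i / (steps - 1), y) for i in range(steps)])
    return grid


centroids = fit_centroids(training)
grid = decision_grid(lambda x, y: predict(x, y, centroids), (0.0, 6.0), (0.0, 6.0), steps=13)
for row in grid:
    print("".join("." if label == 0 else "#" for label in row))  # '.' = class 0, '#' = class 1
```

With scikit-learn, the inner loop would typically be replaced by one `predict` call on all grid points at once.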

I'd really appreciate feedback from people who work with ML regularly.

What features would make a tool like this genuinely useful for learning or teaching machine learning?

Website:
https://confluence.website

GitHub:
https://github.com/mahirmlk/Confluence

u/nightmareofai — 8 days ago

▲ 37 r/FunMachineLearning+3 crossposts

What Should I Study After Andrew Ng's Machine Learning Specialization

If I finish Andrew Ng's Machine Learning Specialization and the Mathematics for Machine Learning course, should I still read "An Introduction to Statistical Learning" and/or "Hands-On Machine Learning with Scikit-Learn and PyTorch"?

Or would those courses be enough before moving on to more advanced topics and projects?


u/Mas333oud — 12 days ago

▲ 72 r/FunMachineLearning+12 crossposts

The sample mean as a projection onto the span of the ones vector

I’ve been thinking about the sample mean from a linear algebra perspective.

If y is a data vector and 1 is the vector of all ones, then the average is the scalar coefficient you get when projecting y onto span(1).

So the projection has the form:

y-hat = y-bar · 1

where y-bar is the usual sample average.

I like this because it makes the average feel like the simplest possible least-squares problem: find the constant vector closest to the data vector.
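The least-squares reading can be checked directly: minimising the squared distance between y and a constant vector c · 1 gives exactly the sample mean, and the residual left over is orthogonal to 1.

```latex
\begin{aligned}
\hat{c} &= \arg\min_{c \in \mathbb{R}} \lVert \mathbf{y} - c\,\mathbf{1} \rVert^2,
\qquad
\frac{d}{dc} \sum_{i=1}^{n} (y_i - c)^2 = -2 \sum_{i=1}^{n} (y_i - c) = 0
\;\Longrightarrow\;
\hat{c} = \frac{\mathbf{1}^\top \mathbf{y}}{\mathbf{1}^\top \mathbf{1}} = \frac{1}{n} \sum_{i=1}^{n} y_i = \bar{y}, \\
\hat{\mathbf{y}} &= \bar{y}\,\mathbf{1} = \mathbf{1}\,(\mathbf{1}^\top \mathbf{1})^{-1} \mathbf{1}^\top \mathbf{y},
\qquad
\mathbf{1}^\top (\mathbf{y} - \hat{\mathbf{y}}) = \sum_{i=1}^{n} (y_i - \bar{y}) = 0 .
\end{aligned}
```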

It also connects naturally to ordinary least squares regression, where y gets projected onto the column space of X instead of just the one-dimensional space spanned by 1.
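A small numeric check of both projections, in plain Python with made-up data: projecting onto span(1) puts the mean in every entry, and projecting onto the column space of X = [1, x] (via the 2×2 normal equations) leaves residuals orthogonal to both columns. The helper names are illustrative.

```python
def project_onto_ones(y):
    """Projection of y onto span(1): every entry equals the sample mean."""
    n = len(y)
    c = sum(y) / n  # (1^T y) / (1^T 1)
    return [c] * n


def project_onto_intercept_and_x(x, y):
    """Project y onto the column space of X = [1, x] via the 2x2 normal equations."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    det = n * sxx - sx * sx            # determinant of X^T X
    b0 = (sxx * sy - sx * sxy) / det   # intercept
    b1 = (n * sxy - sx * sy) / det     # slope
    return [b0 + b1 * xi for xi in x]


y = [2.0, 4.0, 9.0]
x = [1.0, 2.0, 3.0]

y_hat_mean = project_onto_ones(y)
y_hat_ols = project_onto_intercept_and_x(x, y)

r_mean = [yi - yh for yi, yh in zip(y, y_hat_mean)]
r_ols = [yi - yh for yi, yh in zip(y, y_hat_ols)]

print(y_hat_mean)   # [5.0, 5.0, 5.0]
print(sum(r_mean))  # 0.0 -> residual orthogonal to 1
print(y_hat_ols)    # [1.5, 5.0, 8.5]
print(sum(r_ols), sum(r * xi for r, xi in zip(r_ols, x)))  # 0.0 0.0 -> orthogonal to both columns
```

Because the column 1 sits inside the column space of X, OLS residuals with an intercept always sum to zero, which is the sample-mean fact carried over.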

Does this seem like a good way to introduce projections/least squares, or would you teach it differently?


u/CubionAcademy — 13 days ago

▲ 8 r/FunMachineLearning+1 crossposts

I built EliminationSearchCV: a GridSearchCV alternative that cut search time by up to 152x with almost no accuracy loss

GridSearchCV has a fundamental problem: it never learns from early results.

A bad learning_rate=0.5 gets re-evaluated in every downstream combination. Adding one new 4-value parameter can quadruple your total training time. It treats the 1st fit and the 1000th fit as equally uninformed.

I built EliminationSearchCV to fix this.

GitHub: https://github.com/thisal-d/elimination-search-cv

PyPI: https://pypi.org/project/elimination-search-cv

How it works:

Instead of running the full Cartesian product upfront, it works in rounds:

Round 1: test each parameter value in isolation
Eliminate the worst performers per parameter
Round 2: test the surviving pairs
Eliminate again globally
Repeat until one winner remains
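A minimal sketch of round 1 as described above: score each parameter value on its own and keep only the best values per parameter. This is a hypothetical re-implementation for illustration, not the library's code. Treating elimination_rate as the fraction of values dropped (always keeping at least one) and holding the other parameters at a baseline are assumptions, and toy_score is a made-up stand-in for cross-validated accuracy.

```python
import math
from typing import Any, Callable, Dict, List


def eliminate_round_one(
    param_grid: Dict[str, List[Any]],
    baseline: Dict[str, Any],
    score: Callable[[Dict[str, Any]], float],
    elimination_rate: float = 0.8,
) -> Dict[str, List[Any]]:
    """Score each value of each parameter in isolation; keep the best per parameter.

    Hypothetical assumptions: other parameters stay at their baseline values,
    and elimination_rate is the fraction of values dropped (keeping at least one).
    """
    survivors = {}
    for name, values in param_grid.items():
        # Vary only this parameter; everything else stays at the baseline.
        scored = [(score({**baseline, name: v}), v) for v in values]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        keep = max(1, int(len(values) * (1 - elimination_rate)))
        survivors[name] = [v for _, v in scored[:keep]]
    return survivors


def toy_score(p: Dict[str, Any]) -> float:
    # Made-up objective standing in for cross-validated accuracy.
    s = 0.9 - 0.01 * abs(math.log10(p["C"]))
    s += 0.01 if p["penalty"] == "l1" else 0.0
    s += 0.005 if p["solver"] == "liblinear" else 0.0
    s -= 0.001 if p["max_iter"] == 2000 else 0.0
    return s


param_grid = {
    "C": [0.001, 0.01, 0.1, 1, 10, 100],
    "penalty": ["l1", "l2"],
    "solver": ["liblinear", "saga"],
    "max_iter": [1000, 2000],
}
baseline = {"C": 1, "penalty": "l2", "solver": "liblinear", "max_iter": 1000}

calls = []


def counted_score(p: Dict[str, Any]) -> float:
    calls.append(p)  # record every evaluation
    return toy_score(p)


survivors = eliminate_round_one(param_grid, baseline, counted_score)
print(survivors)   # {'C': [1], 'penalty': ['l1'], 'solver': ['liblinear'], 'max_iter': [1000]}
print(len(calls))  # 12 evaluations: 6 + 2 + 2 + 2
```

The toy objective was chosen so the survivors match the example below; with real cross-validation scores the survivors depend on the data.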

Concrete example — tuning LogisticRegression with 4 parameters:

param_grid = {
    'C':        [0.001, 0.01, 0.1, 1, 10, 100],  # 6 values
    'penalty':  ['l1', 'l2'],                      # 2 values
    'solver':   ['liblinear', 'saga'],             # 2 values
    'max_iter': [1000, 2000],                      # 2 values
}
# GridSearchCV: 6×2×2×2 = 48 combos × 5 folds = 240 fits
# EliminationSearchCV: 12 + 6 + 4 + 1 = 23 combos × 5 folds = 115 fits

| Round | Combos tested | Result |
|---|---|---|
| 1 (single params) | 12 | C:[1], penalty:['l1'], solver:['liblinear'], max_iter:[1000] |
| 2 (pairs) | 6 | unchanged (already 1 value each) |
| 3 (triples) | 4 | unchanged |
| 4 (full) | 1 | final result |
| Total | 23 combos (115 fits at cv=5) | vs 48 combos (240 fits) for GridSearchCV |
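The counts in this example can be re-derived from the grid in a few lines of Python. The arithmetic makes one thing visible: 23 is a number of combinations, so at cv=5 it corresponds to 115 fits, and the like-for-like comparison is 23 vs 48 combinations (115 vs 240 fits). The assumption that each later round tests one combination per subset of parameters is read off the table above.

```python
from math import comb, prod

param_grid = {
    "C": [0.001, 0.01, 0.1, 1, 10, 100],
    "penalty": ["l1", "l2"],
    "solver": ["liblinear", "saga"],
    "max_iter": [1000, 2000],
}
cv = 5

# GridSearchCV evaluates the full Cartesian product on every fold.
grid_combos = prod(len(v) for v in param_grid.values())  # 6*2*2*2
grid_fits = grid_combos * cv

# Round 1 scores each value on its own.
k = len(param_grid)
round_one = sum(len(v) for v in param_grid.values())  # 6+2+2+2
# Per the table, each parameter has one survivor after round 1, so round r
# (r = 2..k) tests one combo per r-sized subset of the k parameters.
later_rounds = sum(comb(k, r) for r in range(2, k + 1))  # 6+4+1
elim_combos = round_one + later_rounds
elim_fits = elim_combos * cv

print(grid_combos, grid_fits)  # 48 240
print(elim_combos, elim_fits)  # 23 115
```

The saving is still real here (115 vs 240 fits), just not the factor that comparing 23 against 240 would suggest.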

One extra thing: invalid combos (e.g. penalty='l1' with solver='lbfgs', which sklearn rejects) are caught, scored 0.0, and eliminated naturally, with no crashes and no special handling needed.
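A minimal sketch of the "score invalid combos as 0.0" behaviour described above: wrap the evaluation in a try/except so a rejected combination simply loses instead of crashing the search. safe_score and fake_evaluate are hypothetical helpers, not the library's API; fake_evaluate imitates scikit-learn refusing penalty='l1' with solver='lbfgs' by raising ValueError.

```python
from typing import Any, Callable, Dict


def safe_score(evaluate: Callable[[Dict[str, Any]], float], params: Dict[str, Any]) -> float:
    """Return evaluate(params), or 0.0 if the combination is rejected."""
    try:
        return evaluate(params)
    except (ValueError, TypeError):
        # An invalid combination scores 0.0, so later elimination drops it.
        return 0.0


def fake_evaluate(params: Dict[str, Any]) -> float:
    # Stand-in for a cross-validation run; mimics scikit-learn rejecting l1 + lbfgs.
    if params["penalty"] == "l1" and params["solver"] == "lbfgs":
        raise ValueError("lbfgs supports only 'l2' or None penalties")
    return 0.91


candidates = [
    {"penalty": "l2", "solver": "lbfgs"},
    {"penalty": "l1", "solver": "lbfgs"},
    {"penalty": "l1", "solver": "liblinear"},
]
scores = [safe_score(fake_evaluate, p) for p in candidates]
print(scores)  # [0.91, 0.0, 0.91]
```

For comparison, scikit-learn's own GridSearchCV exposes an `error_score` parameter for this situation; by default it records failed fits as `np.nan` (with a warning) rather than stopping the search.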

Benchmark results (cv=2, elimination_rate=0.8, 10k samples, averaged over 3 datasets):

| Model | Speedup | Accuracy diff vs GridSearchCV |
|---|---|---|
| DecisionTree | 152x | -0.0008 |
| RandomForest | 36x | -0.0002 |
| KNeighbors | 11x | -0.0004 |
| GradientBoosting | 35x | -0.0194 |
| LogisticRegression | 4x | -0.0004 |

> Note: light grids (small search spaces) are actually slower with this approach, since the elimination overhead isn't worth it there. This shines on large grids.

Drop-in replacement for GridSearchCV:

# Before
search = GridSearchCV(model, param_grid, cv=5)

# After — same interface, just swap the class
search = EliminationSearchCV(model, param_grid, cv=5, elimination_rate=0.8)

search.fit(X_train, y_train)
print(search.best_params_)   # same as GridSearchCV
print(search.best_score_)    # same as GridSearchCV
search.best_estimator_.predict(X_test)  # already refitted, ready to go

pip install elimination-search-cv


This is v0.0.1: early stage and experimental. The algorithm's behaviour varies by dataset. I'd genuinely love to hear if it breaks on your use case; that feedback is more useful to me than praise right now.

Happy to answer questions!

u/Fabulous-Tip-8007 — 11 days ago