r/Sabermetrics

Looking for baseball enthusiasts and data analysts interested in amateur sports data challenges

Influenced by the ideas behind Moneyball and the analytical work of people like Tom Tango, I believe US amateur baseball has real potential for data-driven analysis.

The data is obviously much smaller and more uneven than MLB data, but that does not make it worthless.

I have been working on this for about three years. I currently have about 14,000 individual plays, which is nothing compared to MLB. Still, it is striking how often reality and the calculations match and confirm each other, not only in lineup optimization but also in wRC+, wOBA, and the overall values.
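For readers newer to the metrics mentioned above, wOBA credits each way of reaching base by its approximate run value. A minimal sketch, assuming illustrative linear weights roughly in line with recent MLB seasons (real weights are recalculated every season, and an amateur league would ideally derive its own from its play-by-play data):

```python
from dataclasses import dataclass

# Illustrative linear weights (assumption); not official values for any season.
WEIGHTS = {"ubb": 0.69, "hbp": 0.72, "1b": 0.88, "2b": 1.25, "3b": 1.58, "hr": 2.03}


@dataclass
class BattingLine:
    ab: int
    bb: int
    ibb: int
    hbp: int
    sf: int
    singles: int
    doubles: int
    triples: int
    hr: int


def woba(line: BattingLine, w: dict = WEIGHTS) -> float:
    """Weighted on-base average: run-weighted times on base per relevant PA."""
    ubb = line.bb - line.ibb  # intentional walks are excluded
    numerator = (
        w["ubb"] * ubb
        + w["hbp"] * line.hbp
        + w["1b"] * line.singles
        + w["2b"] * line.doubles
        + w["3b"] * line.triples
        + w["hr"] * line.hr
    )
    denominator = line.ab + ubb + line.sf + line.hbp
    return numerator / denominator if denominator else 0.0


example = BattingLine(ab=100, bb=10, ibb=0, hbp=0, sf=0, singles=20, doubles=5, triples=0, hr=3)
print(round(woba(example), 3))
```

wRC+ then builds on wOBA by converting it to runs and comparing against the league, which is where a small, uneven amateur sample makes the league baseline the hardest part to pin down.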

I would be glad to continue the exchange with anyone who is interested in amateur baseball data challenges, whether from a baseball or data-analysis perspective.

u/Spee11RA — 4 hours ago

▲ 30 r/Sabermetrics+2 crossposts

I built a searchable Summer League stats database for draft fans

With Summer League starting up soon, I wanted to share something I built for other draft/Summer League junkies.

I run nbadraft.app, and I recently added a Summer League stats section because I’ve always felt like detailed Summer League coverage is a weird hole on the internet. Box scores exist in scattered places, but I’ve never found a good way to search across players, games, seasons, teams, and advanced stats like you can easily do for other basketball events.

Main Summer League page:
https://nbadraft.app/stats/summer-league

Explorer/filter page for player, game, and season searches:
https://nbadraft.app/stats/summer-league/explorer?subject=players

Example game page with team/player shot charts:
https://nbadraft.app/stats/summer-league/2022/games/1147

The goal is to make it easier to answer questions like: who was the most statistically prolific player in LVSL last year, how does a prospect’s performance today compare historically, and how does a player’s Summer League production stack up against other lottery picks at the same position?

For example, you can filter last year’s LVSL by position, minutes, usage, shooting efficiency, or compare a lottery guard’s Summer League output against past lottery guards.
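A filter like the one described above could be sketched as follows. The record fields, thresholds, and player names are hypothetical (this is not how nbadraft.app is built); shooting efficiency here is true shooting percentage, using the standard PTS / (2 × (FGA + 0.44 × FTA)) formula.

```python
from dataclasses import dataclass


@dataclass
class SummerLeagueLine:
    name: str
    position: str
    minutes: float
    points: int
    fga: int
    fta: int


def true_shooting(line: SummerLeagueLine) -> float:
    """TS% = PTS / (2 * (FGA + 0.44 * FTA)); returns 0.0 with no attempts."""
    attempts = line.fga + 0.44 * line.fta
    return line.points / (2 * attempts) if attempts else 0.0


def filter_players(lines, position, min_minutes, min_ts):
    """Keep players at a position who cleared both a minutes and a TS% threshold."""
    return [
        ln
        for ln in lines
        if ln.position == position
        and ln.minutes >= min_minutes
        and true_shooting(ln) >= min_ts
    ]


# Hypothetical sample data for illustration only.
sample = [
    SummerLeagueLine("Guard A", "G", 120.0, 80, 60, 20),
    SummerLeagueLine("Guard B", "G", 45.0, 30, 25, 6),
    SummerLeagueLine("Wing C", "F", 110.0, 70, 50, 15),
]
print([ln.name for ln in filter_players(sample, "G", 100, 0.55)])
```

The minutes floor matters in Summer League, where a handful of games can make a low-minute player's efficiency look extreme.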

I’ll be updating it daily once games start. Would genuinely appreciate feedback, feature requests, bug reports, or anything that would make it more useful for people here.

u/jonathanbechtel — 3 days ago

▲ 73 r/Sabermetrics+7 crossposts

Interesting MLB changes proposed. What is the ripple effect on college baseball and player development?

Source: https://www.mlbplayers.com/press-releases/mlbpa-makes-transaction-proposals-to-benefit-all-players-and-build-upon-industry-momentum

u/Flat-Eggplant-9890 — 4 days ago

▲ 15 r/Sabermetrics

Foster Griffin is an arm to keep an eye on throughout the second half of the MLB Season

TL;DR: Foster Griffin is quietly poised for a second-half breakout for the Nationals, skyrocketing to the #3 ranked qualified MLB starter over the last 30 days (up from #82). The underlying metric shift? He didn't alter his pitch shapes—he optimized his arsenal usage by cutting back on his fastball and throwing more curveballs. Data breakdown below.

Context & Performance: Foster Griffin, a 30-year-old lefty for the Washington Nationals, has had a great year so far, but let's unpack why he might be an X-Factor for the Nationals in a potential second-half playoff push, or for fantasy managers looking to pick up some more firepower in their starting rotation.

Foster Griffin has a 100.4 Composite Score for the year 2026 on Breakfast Baseball, placing him in the top 50 (#48) for qualified starting pitchers on the year. He averages:

Stuff+ and Predictive Stuff+: 101.5
Command+: 108.3
Performance Plus: 106.9

(All of these numbers indicate he is only slightly above average, but over his last 10 starts Griffin has made the case for being a second-half breakout star.)

The Arsenal Usage Adjustment:

Starting with his Stuff+, Griffin has improved dramatically since his outing on June 5th, 2026, when he allowed 1 run over 5 innings.
Since that start, Griffin has elevated his Stuff+ to sit around the 108–110 mark, which is about 8 points higher than his season average.
The improvement comes not from a change in pitch shapes but from a change in arsenal usage. Over his last 3 games, Griffin has:
- 📉 Reduced his fastball usage from 18% to 15%.
- 📈 Boosted his curveball usage from 10% to 14%.

The Result: Ever since making this arsenal usage adjustment, Griffin has become the #3 ranked qualified MLB starter over the last 30 days, compared with ranking #82 outside that span.
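One way an overall Stuff+ figure can rise without any change in pitch shapes is if it is a usage-weighted average of per-pitch grades. That aggregation is an assumption here (Breakfast Baseball's exact method isn't described), and the per-pitch grades are made up; only the fastball and curveball percentages come from the post, with a hypothetical "rest" bucket absorbing the remaining usage.

```python
def arsenal_grade(usage: dict, grades: dict) -> float:
    """Usage-weighted average of per-pitch grades (usage shares must sum to 1)."""
    total = sum(usage.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError(f"usage shares sum to {total}, expected 1.0")
    return sum(share * grades[pitch] for pitch, share in usage.items())


# Hypothetical per-pitch grades: identical before and after, because the
# pitch shapes themselves did not change.
grades = {"fastball": 90.0, "curveball": 125.0, "rest": 100.0}

# Fastball and curveball shares from the post; "rest" is an assumed bucket
# covering every other pitch so that each mix sums to 1.
before = {"fastball": 0.18, "curveball": 0.10, "rest": 0.72}
after = {"fastball": 0.15, "curveball": 0.14, "rest": 0.71}

print(round(arsenal_grade(before, grades), 1), round(arsenal_grade(after, grades), 1))
```

How much the overall number moves depends entirely on the gap between the per-pitch grades; the sketch only shows the direction of the effect.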

Do you guys think this level of performance is sustainable? Let me know below; I'd love to have a conversation about it!

If you like these breakdowns and want more information like this, download Breakfast Baseball, an app that I made! (Coming to the App Store on July 14th)

u/Cool_Bad_2258 — 4 days ago

▲ 12 r/Sabermetrics+1 crossposts

I built an interactive card for MLB standings

I’ve never been satisfied with how standings are typically displayed, because they don’t clearly show how big the gaps between teams within a division are, and you can’t easily tell how the teams in one division are doing compared with teams in another. I created Team Tracker sensors for each team and plotted them by current win %. Each division is its own column, but every team is plotted on the same scale, which addresses both concerns.

I also added a progress bar showing the percentage of the season that’s been completed, and tapping a team shows its record and win %. Teams currently playing glow white, and tapping one of them shows its opponent and the current status of the game. Very satisfied with the result. Disclaimer: I used ChatGPT to code this project.
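The core of that layout is placing every team on one shared win-percentage axis and computing how much of the season is done. A rough sketch of that calculation, assuming a 162-game schedule and made-up records (it is independent of the Team Tracker sensors used in the post):

```python
from dataclasses import dataclass

GAMES_PER_TEAM = 162  # MLB regular-season length


@dataclass
class Team:
    name: str
    division: str
    wins: int
    losses: int

    @property
    def win_pct(self) -> float:
        games = self.wins + self.losses
        return self.wins / games if games else 0.0


def season_progress(teams) -> float:
    """Share of all scheduled team-games already played (0.0 to 1.0)."""
    played = sum(t.wins + t.losses for t in teams)
    return played / (GAMES_PER_TEAM * len(teams))


def by_division(teams):
    """Group teams into division columns, each sorted by win % on a shared scale."""
    columns = {}
    for t in teams:
        columns.setdefault(t.division, []).append(t)
    for col in columns.values():
        col.sort(key=lambda t: t.win_pct, reverse=True)
    return columns


# Hypothetical records for illustration.
teams = [
    Team("Team A", "East", 50, 31),
    Team("Team B", "East", 40, 41),
    Team("Team C", "West", 45, 36),
]
print(f"{season_progress(teams):.0%} of the season played")
```

Because win % is used directly as the vertical position, the distance between two dots is the gap in win %, whether the teams share a division or not.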

u/bootliar — 5 days ago

▲ 8 r/Sabermetrics

New live logging workflow demo – looking for your feedback

Hi everyone,

I've put together a short demo showcasing several live logging workflows, including:

Automatic Play-by-Play generation
Automatic Box Score updates
Challenge reversals
Correcting previously logged events without disrupting the event chain
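One common way to correct an earlier event without breaking everything logged after it is to keep the log append-only and rebuild the play-by-play and box score by replaying it, with corrections (including challenge reversals) stored as events that supersede an earlier event by ID. This is a generic event-sourcing sketch with hypothetical event fields and result codes, not the workflow shown in the demo:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Event:
    event_id: int
    batter: str
    result: str  # hypothetical result codes, e.g. "1B", "2B", "K", "BB"
    replaces: Optional[int] = None  # set when this event corrects an earlier one


def effective_events(log):
    """Swap each corrected event for its correction, keeping the original order.

    If several corrections target the same event, the last one logged wins.
    Corrections of corrections are not handled in this sketch.
    """
    corrections = {e.replaces: e for e in log if e.replaces is not None}
    return [corrections.get(e.event_id, e) for e in log if e.replaces is None]


def box_score(log):
    """Rebuild per-batter result counts from the corrected event stream."""
    box = {}
    for e in effective_events(log):
        counts = box.setdefault(e.batter, {})
        counts[e.result] = counts.get(e.result, 0) + 1
    return box


log = [
    Event(1, "Smith", "1B"),
    Event(2, "Jones", "K"),
    Event(3, "Smith", "2B", replaces=1),  # scoring change: single ruled a double
]
print(box_score(log))
```

The later event (Jones's strikeout) is untouched by the correction, because derived views are always recomputed from the full log rather than patched in place.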

The game footage shown in this video is used exclusively for testing and demonstration purposes. All rights to the original game footage remain with their respective owners.

I'd really appreciate your thoughts on the workflow. Is there anything you would handle differently or any features you'd like to see?

u/LegitimateAdvice1841 — 5 days ago

▲ 14 r/Sabermetrics+2 crossposts

Trying to build a football equivalent of baseball's WAR and struggling to find data sources.

Hey everyone. Background on me: I'm a sports economics professor currently doing a master's in statistics. Football is my main sport, but I've always had a soft spot for baseball, and I've always found the idea of bringing some of baseball's analytical frameworks into football very interesting.

The thing that fascinates me most about baseball analytics is how they've managed to quantify individual player value in a rigorous, reproducible way. Specifically, I'm talking about WAR (Wins Above Replacement).

For those unfamiliar: WAR is a single number that tries to capture how many wins a player contributes to their team compared to a freely available "replacement-level" player. What makes it robust is that it's built from several independent components: batting runs, baserunning, fielding, positional adjustment, and a replacement level baseline, all converted into a common currency of wins. The key insight is that you can decompose a player's total value into distinct, interpretable dimensions.
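As a worked example of that decomposition, the FanGraphs position-player formula sums the run components and divides by runs per win. The component values below are invented, and 10 runs per win is a common rule of thumb rather than a fixed constant (it is recalculated each season):

```python
from dataclasses import dataclass


@dataclass
class RunComponents:
    batting: float
    baserunning: float
    fielding: float
    positional: float
    replacement: float
    league: float = 0.0  # FanGraphs also includes a small league adjustment


def war(c: RunComponents, runs_per_win: float = 10.0) -> float:
    """Convert run components into wins above replacement."""
    total_runs = (
        c.batting + c.baserunning + c.fielding + c.positional + c.replacement + c.league
    )
    return total_runs / runs_per_win


# Hypothetical shortstop season; every component is expressed in runs.
shortstop = RunComponents(batting=15.0, baserunning=2.0, fielding=5.0, positional=7.0, replacement=20.0)
print(round(war(shortstop), 1))
```

The useful property for a football analogue is that every component is in the same unit (runs) before a single conversion to wins, which is what makes the parts additive.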

I want to build something analogous for football. My current thinking is to structure a player's value around three components: an offensive contribution (goal involvement, shot quality, finishing efficiency), a defensive contribution (ball recovery, duels, pressing effectiveness), and a creative/construction one (progressive actions, chance creation, build-up involvement). The weight of each component would vary by position. A striker's value would be dominated by the offensive side, a centre-back's by the defensive one, and midfielders would get a more balanced weighting across all three.
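The position-dependent weighting described above could be prototyped as a weighted sum of three component scores. The position codes, weights, and scores below are placeholders to show the mechanics, not a recommendation:

```python
# Placeholder weights per position: (offensive, defensive, creative).
# Each row sums to 1 so the result stays on the same scale as the components.
POSITION_WEIGHTS = {
    "ST": (0.6, 0.1, 0.3),  # striker: offence-dominated
    "CB": (0.1, 0.7, 0.2),  # centre-back: defence-dominated
    "CM": (1 / 3, 1 / 3, 1 / 3),  # central midfielder: balanced
}


def player_value(position: str, offensive: float, defensive: float, creative: float) -> float:
    """Weighted combination of component scores (e.g. z-scores within position)."""
    w_off, w_def, w_cre = POSITION_WEIGHTS[position]
    return w_off * offensive + w_def * defensive + w_cre * creative


print(round(player_value("ST", offensive=1.0, defensive=0.0, creative=0.5), 2))
```

Unlike WAR, this yields a relative score rather than wins; getting to a "wins" unit would still require mapping each component to goals (or expected goals) and goals to points.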

Now here's where I'm stuck: data. I've been going down a rabbit hole trying to find a source that gives me granular per-90 player stats. Things like progressive carries, defensive duels won %, pressures, xG, xA, touches in the box along with minutes played, all in one place and ideally exportable.
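Whatever the source, per-90 figures are totals rescaled by minutes, so they can be recomputed from raw totals if a site only exports those. A minimal sketch; the 450-minute floor (five full matches) is an arbitrary assumption to stop tiny samples from producing extreme rates:

```python
from typing import Optional


def per_90(total: float, minutes: float, min_minutes: float = 450.0) -> Optional[float]:
    """Rate per 90 minutes; None when the player hasn't played enough to trust it."""
    if minutes < min_minutes:
        return None
    return total / minutes * 90.0


print(per_90(6, 900))  # e.g. 6 progressive carries in 900 minutes
print(per_90(3, 200))  # below the minutes floor
```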

FBref used to be the obvious answer, but as most of you probably know, they lost their Opta licence in January 2026 and everything beyond basic stats is gone. I've looked at DataMB, ScoutingStats, Understat, WhoScored, and a few Kaggle datasets, and each one covers part of what I need but not all of it. The consistent problem is missing advanced metrics, or no CSV export on the free tier.

I'm not opposed to paying for something reasonable — something in the €10–30/month range that gives clean, exportable player-level data for the top European leagues would be ideal. Just not in the market for a £3,000/year Wyscout licence.

Two things I'd love your input on. First, data sources, paid or free, that you've actually used and trust for this kind of project. Specifically something with minutes played, advanced per-90 metrics, and CSV export for the Premier League or top 5 leagues. Second, your honest opinion on the project itself. Does a positional WAR framework make sense in football given how interdependent everything is? What would you do differently?

Thanks in advance. Happy to share more of the methodology if there's interest.

Also — not a bot, I promise. Sorry if this reads a bit stiff, English isn't my first language.

u/Round_Acanthaceae223 — 7 days ago

▲ 0 r/Sabermetrics

Built a public, graded MLB projection model (hits/TB/HR/K) — tracking every pick's accuracy openly, AMA

Been building a statistical projection model for MLB hitting/pitching stats over the last several weeks (hits, total bases, home runs, strikeouts), adjusted for park factors, weather, platoon splits (vs-hand splits), and opposing pitcher quality, with empirical-Bayes shrinkage for small-sample players.
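For readers unfamiliar with the empirical-Bayes shrinkage mentioned above, one common version is beta-binomial: a player's observed rate is pulled toward the league rate, with the pull weakening as plate appearances accumulate. A sketch in which the prior mean is estimated from league data (the "empirical" part) and the prior strength, expressed as pseudo plate appearances, is a placeholder; the model's actual prior isn't described:

```python
def league_rate(successes, trials):
    """Pooled league rate, used as the prior mean."""
    return sum(successes) / sum(trials)


def shrink(successes: int, trials: int, prior_mean: float, prior_strength: float = 200.0) -> float:
    """Posterior mean under a Beta(prior_mean * k, (1 - prior_mean) * k) prior, k = prior_strength."""
    alpha = prior_mean * prior_strength
    beta = (1.0 - prior_mean) * prior_strength
    return (successes + alpha) / (trials + alpha + beta)


prior = league_rate([30, 20], [100, 100])  # hypothetical league totals
print(round(shrink(10, 20, prior), 3))    # 20 PA: stays near the league rate
print(round(shrink(300, 600, prior), 3))  # 600 PA: the player's own rate dominates
```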

The part I think is actually interesting from a methodology standpoint: every projection gets logged and graded against the real outcome afterward, with nothing removed in hindsight. 1,534 graded so far:

- HR projections: 89.5% hit rate
- Total bases: 68.7%
- Hits: 66.7%
- Strikeouts: still rough; only 12 graded, and I'm being upfront that it's weak
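Since the post flags the strikeout sample as small, a binomial confidence interval makes that concrete. A sketch using the Wilson score interval; the pick counts below are hypothetical, since the post gives rates but not per-category counts:

```python
import math


def wilson_interval(hits: int, n: int, z: float = 1.96) -> tuple:
    """Approximate 95% Wilson score interval for a hit rate (z = 1.96)."""
    if n == 0:
        return (0.0, 1.0)
    p = hits / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return (center - half, center + half)


# Hypothetical: 8 of 12 strikeout picks correct vs. 400 of 600 hits picks.
small = wilson_interval(8, 12)
large = wilson_interval(400, 600)
print(f"12 picks:  {small[0]:.2f} to {small[1]:.2f}")
print(f"600 picks: {large[0]:.2f} to {large[1]:.2f}")
```

The same observed rate spans a much wider range at 12 picks than at 600, which is the quantitative version of "still rough." A hit rate also depends on how hard the chosen lines were, so it is worth reporting alongside calibration.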

Happy to get into the methodology, what's underperforming, or critiques of the approach. I'm genuinely looking for sabermetrics-minded feedback, not just promoting it. The site's at propyard.net/track-record if anyone wants to see the raw graded history.

u/PropYardApp — 7 days ago

▲ 12 r/Sabermetrics

Will dead zone pitch shapes eventually be good?

I don't know whether this question counts as sabermetrics, so I'm sorry if it doesn't.

My question is, will pitch shapes that are currently dead zone eventually be good?

From what I’ve heard, a dead zone pitch shape is what you don’t want as a pitcher: a pitch whose induced movement and stuff are completely average and not unique in any way. A pitch shape outside the dead zone can make a pitch play better (for example, a fastball with tons of vert), and the inverse is also true.

Obviously teams don’t want their pitchers to have dead-zone-ish pitches, since those are what hitters are most used to and what hitters mash. Teams experiment with grips and other cues to get pitches out of the dead zone. The thing is, what if teams find grips and cues that get so many pitchers out of the dead zone that a new dead zone forms? Would what is currently a dead zone pitch shape become super successful in that hypothetical scenario?

Basically, my question is: do dead zone pitches fail because of some characteristic that gives the hitter more time or a better angle, or does the current dead zone fail simply because hitters are more used to it?
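One way to make "dead zone" concrete is to measure how far a pitch's movement sits from the average shape of that pitch type, in standard deviations. Because the reference point is the current population of pitches, the zone would move if that population moved, which is the scenario the question describes. A sketch with made-up induced vertical break (IVB) and horizontal break (HB) values in inches; real pitch models also use velocity, release point, extension, and more:

```python
import math
from statistics import mean, pstdev


def shape_distance(pitch, population):
    """Standardized distance of an (IVB, HB) pair from the population's average shape."""
    ivbs = [p[0] for p in population]
    hbs = [p[1] for p in population]
    z_ivb = (pitch[0] - mean(ivbs)) / pstdev(ivbs)
    z_hb = (pitch[1] - mean(hbs)) / pstdev(hbs)
    return math.hypot(z_ivb, z_hb)  # small values = close to the average shape


# Hypothetical four-seam fastball shapes (IVB, HB) in inches.
league_fastballs = [(15.0, 8.0), (16.0, 7.0), (17.0, 9.0), (14.0, 8.5), (16.5, 7.5)]
print(round(shape_distance((15.7, 8.0), league_fastballs), 2))  # near average
print(round(shape_distance((20.0, 6.0), league_fastballs), 2))  # lots of "vert"
```

This measures only how unusual a shape is relative to today's pitchers; it says nothing about whether average shapes are also physically easier to hit, which is the other half of the question.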

u/Tacorover — 7 days ago

▲ 199 r/Sabermetrics+2 crossposts

After ABS and replay review, Bobby Cox's ejection record may be the most untouchable record in sports

youtube.com

u/inception47 — 11 days ago

▲ 12 r/Sabermetrics

PhD in stat modelling field. Where to start with baseball?

Basically the title. I have a PhD in a statistical modelling/quant field. I mostly use Stata/R, so I assume learning Python more in-depth is important. But on the substance side of things, are there any good starting places for a big baseball fan with this background?

u/Severe-Clerk-1477 — 13 days ago

▲ 5 r/Sabermetrics+1 crossposts

Building an MLB Home Run Prediction Model (260k+ Historical Records) – Looking for Feedback

I've been teaching myself sports analytics and machine learning by building an MLB home run prediction model from scratch in Python and MySQL.

Current version:

~260,000 historical batter-game records
XGBoost classifier
Daily automated pipeline
Predicts probability of a player hitting a home run in today's games

Current features include:

Hitter Features

HR last 3, 5, 10, 15, and 30 games
Hits last 3, 5, 10, 15, and 30 games
AVG, OBP, SLG, OPS rolling windows
HR rates over multiple windows

Pitcher Features

HR allowed
HR/9
ERA
WHIP
K/9

Using rolling windows:

Last 3
Last 5
Last 10
Last 15
Last 30
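A detail that matters for rolling features like these is that each game's window should include only earlier games; otherwise the model sees the outcome it is trying to predict. A pure-Python sketch of leakage-free rolling HR counts for one batter (the post doesn't say how its pipeline computes them, and the data here is invented):

```python
from collections import deque


def prior_rolling_sums(values, window):
    """For each game, the sum over the previous `window` games (excluding that game)."""
    recent = deque(maxlen=window)
    out = []
    for v in values:
        out.append(sum(recent))  # computed before today's value is added
        recent.append(v)
    return out


# Hypothetical HR counts for one batter, in game order.
hr_by_game = [0, 1, 0, 0, 2, 0, 1]
features = {f"hr_last_{w}": prior_rolling_sums(hr_by_game, w) for w in (3, 5, 10, 15, 30)}
print(features["hr_last_3"])
```

Early in a season the window holds fewer than `window` games, so it can help to pass the number of games actually in the window as a separate feature.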

Matchup Features

Batter vs Pitcher history (BvP)
Plate appearances
Hits
Home runs
Strikeouts
Walks

Context Features

Home/Away
Batting order
Probable starting pitcher
Confirmed daily lineups

One challenge I've run into is balancing recent performance against small-sample-size BvP data. Early versions of the model heavily overvalued BvP, so I've been reducing its influence and letting recent HR trends drive more of the prediction.

A few questions for anyone who has worked on similar baseball models:

What features gave you the biggest improvement when predicting home runs?
Did park factors or weather meaningfully improve results?
Have you found Statcast metrics (barrel %, hard-hit %, launch angle, xSLG, etc.) to outperform traditional rolling stats?
Would you treat HR prediction as a pure classification problem, or try to predict expected HR probability another way?
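On the classification question: if the output is used as a probability, it helps to score it as one. A sketch of two standard proper scoring rules, the Brier score and log loss, compared against a baseline that predicts the same base rate for every batter; the predictions and outcomes are invented:

```python
import math


def brier(probs, outcomes):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def log_loss(probs, outcomes, eps=1e-15):
    """Mean negative log-likelihood; eps keeps log() away from zero."""
    total = 0.0
    for p, y in zip(probs, outcomes):
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(probs)


# Hypothetical batter-game predictions (1 = homered).
outcomes = [0, 0, 1, 0, 0, 0, 1, 0]
model = [0.05, 0.10, 0.30, 0.08, 0.12, 0.06, 0.25, 0.09]
base_rate = sum(outcomes) / len(outcomes)
baseline = [base_rate] * len(outcomes)

print(f"Brier: model {brier(model, outcomes):.4f} vs baseline {brier(baseline, outcomes):.4f}")
print(f"Log loss: model {log_loss(model, outcomes):.4f} vs baseline {log_loss(baseline, outcomes):.4f}")
```

With a rare event like a home run, accuracy alone is misleading, since always predicting "no HR" is right most of the time; scoring rules reward well-calibrated probabilities instead.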

This project started as a learning exercise, but it's turning into a pretty fun sports analytics project. Any feedback is appreciated.

u/Head_Vermicelli_6032 — 13 days ago

▲ 11 r/Sabermetrics

Is there a variant of OPS+ that accounts for the fact that, in the steroid era before the universal DH, pitchers were batting and more (bad) players were getting ABs, dragging down the league-average OBP and SLG?

For example, in 2001, 609 players had ABs in the National League.

In 2025, that number was 351.

In 2025, the average OPS of the top five qualified NL hitters was .928; in 2001, not counting Bonds, it was around 1.100.

So obviously the low end is dragging down the league average and inflating the OPS+ of the guys from that era.
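For reference, the basic OPS+ formula without park adjustment is 100 × (OBP / lgOBP + SLG / lgSLG − 1), so anything that lowers the league baseline raises everyone's OPS+. A sketch that recomputes the baseline with and without pitchers' plate appearances; all totals are hypothetical, and published OPS+ figures also apply park factors:

```python
from dataclasses import astuple, dataclass


@dataclass
class Totals:
    ab: int
    h: int
    bb: int
    hbp: int
    sf: int
    tb: int  # total bases

    def __add__(self, other: "Totals") -> "Totals":
        return Totals(*(a + b for a, b in zip(astuple(self), astuple(other))))

    @property
    def obp(self) -> float:
        return (self.h + self.bb + self.hbp) / (self.ab + self.bb + self.hbp + self.sf)

    @property
    def slg(self) -> float:
        return self.tb / self.ab


def ops_plus(player: Totals, league: Totals) -> float:
    """Basic OPS+ without park adjustment: 100 * (OBP/lgOBP + SLG/lgSLG - 1)."""
    return 100 * (player.obp / league.obp + player.slg / league.slg - 1)


# Hypothetical league totals, split into non-pitchers and pitchers.
non_pitchers = Totals(ab=80000, h=21600, bb=8000, hbp=800, sf=600, tb=35200)
pitchers = Totals(ab=5000, h=700, bb=250, hbp=20, sf=10, tb=850)
hitter = Totals(ab=550, h=180, bb=90, hbp=5, sf=5, tb=360)

with_pitchers = ops_plus(hitter, non_pitchers + pitchers)
without_pitchers = ops_plus(hitter, non_pitchers)
print(f"OPS+ vs. all batters: {with_pitchers:.0f}; vs. non-pitchers only: {without_pitchers:.0f}")
```

The same approach extends to the roster-depth point: recomputing the baseline from, say, only batters above a plate-appearance threshold would shrink the gap between eras, at the cost of defining "league average" differently from standard OPS+.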

Is there anything that accounts for this to more accurately compare guys from different eras?

I hope this makes sense.

u/Area51_Spurs — 13 days ago

▲ 12 r/Sabermetrics+2 crossposts

Updated Gameday Page with full pregame info and updates during game

Built out the GameDay page on 3&2 and wanted to show what it actually does, since it's grown into more than a scoreboard and is a handy tool for previewing and watching games.

Before first pitch you get the full picture: model win probability for both teams, the probable starters with their grades and full season lines (ERA, WHIP, IP, K/9, W-L), and a bullpen rest tracker showing days off and workload for every reliever on both sides. There's also a team rankings comparison that breaks down hitting and pitching across every meaningful category and shows where each team ranks league-wide.

Take tonight's Royals-Rays game as an example. Pre-game you can see the model has Tampa at 54%, the Avila-McClanahan pitching mismatch laid out side by side with full stat lines, and that Tampa has won 13 of 15 ranking categories against Kansas City's 3. Everything you'd want to know before betting or just watching is right there.

Once the game starts, it switches over and updates live — batting lines, pitcher performance, scoring plays, the works, all tracking the game in real time.

Built it because I wanted one page that covers prep and live tracking instead of jumping between five tabs. Check it out at threeandtwobaseball.com/gameday.html

u/threeandtwobaseball — 12 days ago