r/Sabermetrics

Built a luck detection model for buy low/sell high - May 20 update with new signal layer added

Hi All,

If you've seen my previous posts, the current luck model uses seven layers of full-season Statcast data to identify mispriced players (Article link here). It’s done well, with a 91.4% pooled accuracy across four years.  However, the way that model works, it looks at early-season performance and checks whether the player returns value (or a discount) through the summer months of baseball (since it takes larger sample sizes to validate these impacts).

As the current signaling works, after the first 6-8 weeks of a season, there won’t be a ton of material changes to the players. So, rather than measuring where a player has been all season, a recency layer adds another component looking at current trends --more details can be found here if you want to deep dive. I currently only have this done for hitters--next week I'll include pitchers.
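
For anyone who wants the general shape of a recency comparison, here is a minimal sketch. It is not the model's actual layer (that's in the linked write-up); the 21-day window, the `recency_delta` name, the input shape, and the sample data are all assumptions for illustration.

```python
from datetime import date, timedelta
from statistics import mean

def recency_delta(batted_balls, as_of, window_days=21):
    """Compare average exit velocity in the trailing window vs. earlier in the season.

    batted_balls: list of (game_date, exit_velocity_mph) tuples (hypothetical input shape).
    Returns (earlier_avg, recent_avg, delta), or None if either side is empty.
    """
    cutoff = as_of - timedelta(days=window_days)
    recent = [ev for d, ev in batted_balls if cutoff < d <= as_of]
    earlier = [ev for d, ev in batted_balls if d <= cutoff]
    if not recent or not earlier:
        return None
    earlier_avg, recent_avg = mean(earlier), mean(recent)
    return earlier_avg, recent_avg, recent_avg - earlier_avg

# Made-up batted balls, not real Statcast rows
sample = [
    (date(2026, 4, 10), 78.0),
    (date(2026, 4, 18), 80.0),
    (date(2026, 5, 5), 85.0),
    (date(2026, 5, 15), 87.0),
]
print(recency_delta(sample, as_of=date(2026, 5, 20)))
```

The same split works for hard-hit rate or barrel rate; only the per-event value changes.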

With that, here are some callouts for this week!

Buy Low -- Geraldo Perdomo – SS, AZ (SS27, Overall 302)

Look, his barrel rate isn’t exciting, but his profile didn’t have a high barrel rate when he was a ~top 60 ADP.  Also, when you combine his expected stats delta with some of the underlying metrics below, the performance could turn a corner closer to what people drafted him to produce. 

Improvement over past 3 weeks 

  • EV: 79mph --> 86mph
  • Hard Hit Rate: 19% --> 25%
  • Barrel Rate: 0.4% --> 2.4%

His Hard Hit Rate is also up above baseline, and even 3% up over last year, when he had his best fantasy season.  His Launch Angle is down, and he’s been hitting more ground balls than his baseline, but his pull/center rates are up, so if he can address the launch angle, I think it’s a recipe for some solid ROS value.

Sell High -- Otto Lopez – 2B-SS, MIA (SS4, Overall 30)

Lopez is an interesting profile for ROTO, but the truth of the matter is he is outperforming nearly every expected metric.  And this is where the recency layer is compelling.  Again, I get that small sample sizes are tough to work around in baseball (the whole purpose of this tool! 😊), but here are his trends over the past few weeks:

Decline over past 3 weeks

  • EV: 94mph --> 86.5mph
  • Hard Hit Rate: 55.4% --> 34.6%
  • Barrel Rate: 10.7% --> 7.0%

Lastly, no, you’re not dropping Otto Lopez. I see this as a cash-out opportunity if you do look to sell.  Package him to get an upgrade, or look to get a ROS Top 35 player in return.

Buy, but with a caveat--

Jackson Merrill – OF, SD (OF36, Overall 181)

Merrill has a .261 BABIP that's well below his career baseline, and the recency layer confirms his contact quality has been improving over the last three weeks.  CBS projects him ROS at OF20, and I think he can easily beat that with his talent. However, here's the caveat.  He’s getting torched right now by cutters (and splitters/sliders to a lesser degree).  His runs above average per 100 pitches against cutters (I know that’s a mouthful) is -7.2 vs. 1.2 and 2.6 in previous seasons.  It’s not an across-the-board problem either, as he’s doing fine against sinkers/curves.  It’s possible pitchers have adjusted to him as he enters year 3.  I’ll be monitoring this closely (especially since I have him on a fantasy roster!).

Thanks all for reading!

Dustin

u/Dlovell02 — 1 day ago

How does one get started with creating a retrosheet database on a laptop (with zero coding experience)?

I've long wanted to download all the relevant retrosheet data files and then run statistical questions on them.

But I'm ignorant of coding skills.

Are there any good resources on how to get started or is some level of coding knowledge assumed first?

Thank you

u/sabr-hp — 1 day ago

A statistic I've been working on - would welcome feedback/criticism

Here are the metrics I started with, taken from the Plate Discipline section on Fangraphs:

  • Zone% = Percentage of total pitches in the strike zone.
  • O-Zone% = 1 - Zone%, percentage of total pitches outside the strike zone.
  • Z-Swing% = Percentage of pitches in the strike zone that were swung at.
  • O-Swing% = Percentage of pitches outside the strike zone that were swung at.
  • Z-Take% = 1 - Z-Swing%, percentage of total pitches in the zone that were not swung at.
  • O-Take% = 1 - O-Swing%, percentage of total pitches outside the zone that were not swung at.
  • O-Contact% = Percentage of swings that made contact on pitches outside the strike zone.
  • Z-Contact% = Percentage of swings that made contact on pitches in the strike zone.
  • O-Miss% = 1 - O-Contact%, percentage of swings out of the zone, where contact was not made.
  • Z-Miss% = 1 - Z-Contact%, percentage of swings in the zone, where contact was not made.
  • HardHit% = Percentage of batted balls with an exit velocity of 95 MPH or higher.
  • NHH% = 1 - HardHit%, percentage of batted balls with an exit velocity under 95 MPH.

After messing around with these numbers for a while (I could probably reproduce the process if anyone is interested), I came up with 8 outcomes for any given pitch:

  1. OSM = Out of zone, swing, miss.
  2. ZSM = In zone, swing, miss.
  3. OT = Out of zone, take.
  4. ZT = In zone, take.
  5. ZSCH = In zone, swing, contact, hard contact.
  6. ZSCW = In zone, swing, contact, weak contact.
  7. OSCH = Out of zone, swing, contact, hard contact.
  8. OSCW = Out of zone, swing, contact, weak contact.

Once you have these, and can confirm they account for all outcomes, you simply pick the 5 outcomes that are in the defense's favor, and the 3 desirable outcomes for the offense:

DOOP (Defensive Optimal Outcome Percentage) = OSM, ZSM, ZT, ZSCW, OSCW

BOOP (Batting Optimal Outcome Percentage) = OT, ZSCH, OSCH
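
One way the eight outcomes could be assembled from the rates above is sketched below; it isn't necessarily the process used for this post. It assumes every input is a fraction between 0 and 1, that HardHit% applies equally to in-zone and out-of-zone contact, and that every contact counts as a batted ball (fouls are ignored). The function names and the sample hitter are made up.

```python
def pitch_outcomes(zone, z_swing, o_swing, z_contact, o_contact, hard_hit):
    """Split one pitch into the eight outcomes; inputs are fractions in [0, 1]."""
    o_zone = 1 - zone
    return {
        "OSM": o_zone * o_swing * (1 - o_contact),
        "ZSM": zone * z_swing * (1 - z_contact),
        "OT": o_zone * (1 - o_swing),
        "ZT": zone * (1 - z_swing),
        # Assumption: one HardHit% is applied to both in-zone and out-of-zone contact
        "ZSCH": zone * z_swing * z_contact * hard_hit,
        "ZSCW": zone * z_swing * z_contact * (1 - hard_hit),
        "OSCH": o_zone * o_swing * o_contact * hard_hit,
        "OSCW": o_zone * o_swing * o_contact * (1 - hard_hit),
    }

def doop_boop(outcomes):
    """Sum the defense-friendly and offense-friendly outcomes."""
    doop = sum(outcomes[k] for k in ("OSM", "ZSM", "ZT", "ZSCW", "OSCW"))
    boop = sum(outcomes[k] for k in ("OT", "ZSCH", "OSCH"))
    return doop, boop

# Made-up hitter, not real FanGraphs numbers
outcomes = pitch_outcomes(zone=0.42, z_swing=0.68, o_swing=0.30,
                          z_contact=0.85, o_contact=0.62, hard_hit=0.40)
doop, boop = doop_boop(outcomes)
print(f"DOOP={doop:.3f} BOOP={boop:.3f}")
```

Because the eight branches partition every pitch, the outcomes sum to 1 and DOOP + BOOP = 1, which is a quick sanity check on real inputs.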

I hope this makes sense, any opinion would be welcome!

u/sirsockmouth — 1 day ago

What I learned after 3 months deep-diving into MLB Statcast data — 5 things that surprised me

I've been building a baseball analytics guide using real data from Baseball Savant, FanGraphs, and Baseball-Reference. Here's what genuinely surprised me:

  1. Bobby Witt Jr.'s 2024 season was historically underrated. His 10.4 fWAR was more than double his preseason projection of 4.8, and his 171 wRC+ meant he was 71% better than the average MLB hitter. Traditional coverage barely captured how special it was.

  2. The Astros' pitch tunneling system is more sophisticated than I expected. They don't just optimize spin rate — they use Hawk-Eye data to measure how similar two consecutive pitches look at the 20-foot decision point. Verlander's revival wasn't random.

  3. Catcher framing is worth 2-3 WAR for elite framers. The gap between the best and worst framers in baseball is enormous and most fans have no idea it exists.

  4. The ABS challenge system is already changing how teams prepare. Analytics departments now study individual umpire zone tendencies to decide when to use their challenge — it's become its own analytical problem.

  5. Bobby Witt Jr. aside, the xBA vs BA gap was enormous for several players in 2024. Some guys hitting .230 had .285+ xBA — the market hadn't caught up yet by mid-season.

Happy to go deeper on any of these. What Statcast metrics do you all find most underused or misunderstood?

u/SabermetricsLab — 2 days ago

Looking for advice - is my graph interesting?

I have been trying to share some graphs/stats that I find interesting on my Twitter page (https://x.com/adam_mur)

I put this up last week as I found it very interesting with respect to the Padres. I got 1.3k views but no post interactions.

This is not a “poor me” post. I am looking for feedback: is this graph interesting?

u/Soggy_Reporter_1043 — 3 days ago

I know all about How Retrosheet Saved Baseball History so AMA

I'm Jay Wigley and I'm excited to have a chance to answer your questions about Retrosheet, and about how this 100% fan-created and non-profit organization has enabled deep knowledge of baseball history.

If you don't know Retrosheet, check out www.retrosheet.org and you'll find yourself in data heaven. All your favorite baseball sites like Fangraphs and Baseball Reference get their historical MLB games--all the play-by-play and box scores--from Retrosheet (with very few exceptions). But how Retrosheet gathered and preserved all those games has been unknown until recently.

My book is How Retrosheet Saved Baseball History (www.retrosheet-book.com) and is based on both my own volunteer work for Retrosheet and hundreds of hours of interviews and five years of research into every facet of Retrosheet.

If you're a baseball fan beyond today's game, if you enjoy any aspect of baseball history, Retrosheet is for you. John Thorn (MLB official historian) has called Retrosheet "the greatest human endeavor since someone convinced 40,000 Jews to build a pyramid."

So ask me questions about Retrosheet or the book-writing process, or anything at all. I'm excited to engage with Reddit's best and largest baseball community. . .

u/Sad_Cryptographer501 — 4 days ago

WAR in an individual game?

How is WAR calculated in an individual game?

Andujar hit a HR and scored the only run in a 1-0 Padres win and yet only had 0.08 WAR. Does one team's offensive WAR always match their opponent's pitching WAR, but negative?

Thanks for your support. I have always followed WAR over seasons but not in individual games.

https://preview.redd.it/8dkvejjg612h1.png?width=1776&format=png&auto=webp&s=7f45a77ffd68872c7670e4b942de4e23bec90498

u/bobbleheader2020 — 3 days ago

Best way to search for reverse splits?

Trying to find seasons of players who have reverse batting splits, where they hit pitchers of the same handedness better than opposite-handed pitchers.
What’s the best way to go about that?

u/GonGon99_27 — 3 days ago

How many outs is a run worth, or what is the question I’m trying to ask?

I’m playing MLB The Show, and a question came to me: runs are exponentially more valuable than outs, so what’s the equation to find when you *should* be looking for an out?

u/Grand-Way-2663 — 6 days ago

MLB division standings display

My GitHub repo fetches live MLB NL West standings via the MLB Stats API and composites them onto a background image with team pennants, W-L records, and games-back figures. The renderer outputs a 960×1280 PNG to a GitHub Pages-hosted public/ folder, making the image accessible over HTTP as a simple static URL. The reTerminal polls that URL on a schedule to refresh the display — no server required.
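
A rough sketch of the fetch-and-extract step, not the repo's actual code. The endpoint URL, the `leagueId`/`season` parameters, the division id 203 for the NL West, and the JSON field names are all assumptions about the public Stats API and should be checked against a live response. The sample payload is hand-written so the parser runs without network access.

```python
import json
from urllib.request import urlopen

NL_WEST_DIVISION_ID = 203  # assumed division id for the NL West

def fetch_standings(season, league_id=104):
    """Fetch standings JSON (URL and parameters are assumptions, see above)."""
    url = f"https://statsapi.mlb.com/api/v1/standings?leagueId={league_id}&season={season}"
    with urlopen(url, timeout=10) as resp:
        return json.load(resp)

def division_rows(payload, division_id=NL_WEST_DIVISION_ID):
    """Return (team, wins, losses, games_back) tuples for one division."""
    for record in payload.get("records", []):
        if record.get("division", {}).get("id") == division_id:
            return [
                (t["team"]["name"], t["wins"], t["losses"], t["gamesBack"])
                for t in record.get("teamRecords", [])
            ]
    return []

# Hand-written payload in the assumed shape (placeholder teams and records)
sample = {
    "records": [
        {
            "division": {"id": 203},
            "teamRecords": [
                {"team": {"name": "Team A"}, "wins": 30, "losses": 18, "gamesBack": "-"},
                {"team": {"name": "Team B"}, "wins": 27, "losses": 21, "gamesBack": "3.0"},
            ],
        }
    ]
}
for name, wins, losses, gb in division_rows(sample):
    print(f"{name:<10} {wins}-{losses}  GB {gb}")
```

Keeping the parser separate from the fetch makes it easy to test the rendering side against a saved response.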

u/Dave-356w — 8 days ago

Era-translation methodology — z-score for K/HR, additive for BB. Where am I wrong?

EDIT — TLDR for anyone short on time:

I built a baseball sim that uses career-translated player rates to simulate matchups across eras. I'm asking the sub three specific questions:

  1. Is z-score the right method for K-rate translation, or am I missing something about how K rates scale across eras?
  2. Should BB-additive account for league-wide approach shifts (patient era vs swing-happy era), or is the simpler additive model good enough?
  3. Is there a cleaner method than z-score for HR-rate translation given how much the physical conditions (ball, parks) have changed across eras?

Full methodology below if you want the details.


I've been building a baseball sim that lets you draft all-time fantasy rosters and play 162-game seasons, and the hardest engineering problem has been era translation. Posting the approach here so people can poke at the method and the math.

THE PROBLEM

The 1927 AL hit .285. The 2024 AL hit .243 with the highest K rate in history. A "20 HR season" means something completely different across these contexts. If you want Ruth and Ohtani on the same field, you have to translate them to a common baseline first or the matchups are nonsense.

MY APPROACH

Career rate stats from Baseball Reference, translated to modern (2015-2024) league context using league means and standard deviations from the Lahman database (27,800+ pitcher-seasons, 1871-2024, IP-weighted). Per-stat method chosen for how each stat behaves across eras.

K and HR rates → z-score translation. League K rates have shifted from ~1.5 K/9 in the 1880s to ~8.7 K/9 in the 2020s. League HR rates moved by an even larger factor (0.08 to 1.14 HR/9). A "high strikeout pitcher" of one era is unrecognizable in absolute terms in another. Z-score preserves where a pitcher ranked within his era's distribution and renders that same rank in modern context. Configurable caps prevent impossible extremes: Nolan Ryan's career K/9 doesn't translate to 14+ even though raw multiplication would push him there.

BB rates → additive translation. Walk rates have stayed in a narrow band (2.5-3.5 BB/9) since 1900. Absolute deviation from era-mean is the natural representation. Pedro's control translates to elite modern control. Nolan Ryan stays wild.
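
The two translation rules can be sketched as below. This is a minimal illustration, not the sim's code: the function names are hypothetical, the cap is applied as a simple upper bound, and every number in the example is made up rather than taken from Lahman.

```python
def zscore_translate(rate, era_mean, era_sd, modern_mean, modern_sd, cap=None):
    """Keep the pitcher's z-score within his era and re-express it in modern units."""
    z = (rate - era_mean) / era_sd
    translated = modern_mean + z * modern_sd
    if cap is not None:
        translated = min(translated, cap)  # configurable ceiling on extremes
    return translated

def additive_translate(rate, era_mean, modern_mean):
    """Keep the absolute deviation from the era mean."""
    return modern_mean + (rate - era_mean)

# Made-up inputs: a pitcher 3 SD above his era's K/9, and a wild pitcher's BB/9
k9 = zscore_translate(9.5, era_mean=5.0, era_sd=1.5, modern_mean=8.7, modern_sd=1.8)
k9_capped = zscore_translate(9.5, era_mean=5.0, era_sd=1.5,
                             modern_mean=8.7, modern_sd=1.8, cap=13.0)
bb9 = additive_translate(4.5, era_mean=3.2, modern_mean=3.1)
print(round(k9, 2), k9_capped, round(bb9, 2))
```

The example shows why the cap matters: a 3 SD outlier lands above 14 K/9 uncapped, so the result is sensitive to where the ceiling sits.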

CAREER VS PEAK

Players exist in two pools, both translated the same way:

- Career rates — the default pool. Ruth's career HR rate, not just his 1927 line. Used for most modes.

- Peak-season rates — single year of dominance. 1927 Ruth (60 HR), 1927 Gehrig (47 HR, 175 RBI), 1968 Gibson (1.12 ERA), 2000 Pedro (1.74 ERA, 0.74 WHIP, 11.78 K/9), 2001 Bonds (73 HR). Used when you face named historical teams.

So when you build a career-Bonds roster and play it against the 1927 Yankees, you're playing career Bonds against peak Ruth. Two views, same translation method.

VALIDATION

Translated cards run through a 162-game season against rotating opponent lineups across six quality tiers, a rough proxy for real-MLB career conditions. Mean absolute gap between simulated ERA and the ERA implied by each pitcher's career era_plus: 13.9%. About half of all pitchers fall within 10% of their implied modern ERA. About a quarter within 5%. Sample is 283 pitchers (101 historical, 182 modern).

The remaining gap is real information loss. era_plus aggregates K, BB, HR, defense, park effects, league context, and opponent quality into one number. The translation works on rate stats; the rest can't be perfectly recovered from rate stats alone.
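
A gap metric like the one above could be computed along these lines. This is a sketch only: the post doesn't say how implied ERA is derived, so it assumes the usual relationship (implied ERA = league ERA × 100 / ERA+) with no park adjustment. The names and numbers are made up.

```python
def implied_era(era_plus, league_era):
    """ERA implied by an ERA+ in a given league context (assumed formula, no park factor)."""
    return league_era * 100.0 / era_plus

def mean_abs_gap(pitchers, league_era):
    """pitchers: list of (simulated_era, career_era_plus). Returns the mean relative gap."""
    gaps = []
    for sim_era, era_plus in pitchers:
        target = implied_era(era_plus, league_era)
        gaps.append(abs(sim_era - target) / target)
    return sum(gaps) / len(gaps)

# Made-up pitchers: (simulated ERA, career ERA+)
sample = [(3.30, 125), (2.00, 200), (4.40, 100)]
gap = mean_abs_gap(sample, league_era=4.00)
print(f"{gap:.1%}")
```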

WHERE I THINK I'M WRONG

- Elite-era_plus relievers over-perform: Rivera's career ERA+ translates to a sub-1.00 simulated ERA. The translation itself is probably accurate; the issue is the interaction with usage. In this engine the closer pitches the 9th whenever the SP is pulled, which ends up ~90-100 IP per season, more than real-life closer usage (~60-70 IP) but still less than half a starter's workload. Per-inning dominance doesn't get diluted by exposure the way a starter going 200 IP does, AND the rate-handling math itself compounds dominance against elite hitters across smaller samples. Both effects, not just one.

- Some recent star starters (Cole, Verlander, peak-era_plus Kershaw) under-perform their implied ERA when facing elite lineups. League-leading rates don't fully reproduce real-life dominance in the simulated environment.

I have working theories on these. Curious about others' interpretations.

QUESTIONS I DON'T HAVE GOOD ANSWERS TO

- Is z-score the right choice for K rates? Defensible (rank-preserving across eras), but the long-tail extremes (Ryan, Koufax) feel sensitive to where I set the cap.

- For BB additive, am I underweighting how league-wide approach has shifted (3-2 patience era vs swing-early era)? Walks are an aggregate of pitcher and hitter approach, not just pitcher.

- HR rate translation — using same z-score method as K, but the underlying physics (ball, parks, hitter strategy) are wildly different across eras. Is there a cleaner method?

WHERE TO TEST

This all runs at playrubbermatch.com — free, no sign up. You can build a roster, play a season in a few minutes, and see where the translation produces results that feel right or feel off. The point isn't to convert anyone to a user — just a place where the math is testable in context, not just in spreadsheets.

Happy to share more info if anyone wants to dig in. Thanks in advance!

u/A-A--Ron — 7 days ago

I investigated 2026's increased walk rate for FanGraphs

https://blogs.fangraphs.com/where-are-2026s-extra-walks-coming-from/

I thought r/sabermetrics would appreciate the methodology in here. It's pretty flexible for other future queries, and there's a GitHub repository at the end if you're interested in duplicating or modifying it. I've seen a lot of Markov chain models for base/out states before, but I hadn't seen a PA-level implementation, and it's a really nice fit in my opinion.

u/Ben_Clemens_FG — 8 days ago

Bootstrap on my first 421 picks: 88% confidence of long-run +ROI, but I'm 42.8% straight up. What am I missing?

Spent the last few months building a probabilistic prediction model for NBA and MLB game outcomes. Standard hobbyist stack: Elo + recent form + injury drag + pitcher-level priors for MLB + line-movement signal + per-sport calibration shrink. Outputs a calibrated p(side wins) for each market.

Yesterday I finally ran proper validation on 421 settled picks and the result is interesting enough I want to ask for methodology critique.

**The headline tension:**

* Raw hit rate: 42.8% (n=421, Wilson 95% CI [38.1%, 47.5%])

* Sounds bad. Standard -110 breakeven is 52.4% so naive read is "model is losing."

* But mean decimal odds taken is 2.94 (model picks a lot of dogs and small parlays), so actual mix breakeven is 42.4%.

* Bootstrap on actual P/L (1000 resamples, 1u stakes): mean ROI +8.6%, 95% CI [-5.4%, +22.4%], P(ROI > 0) = 0.885.

Per sport:

* MLB n=322: hit_rate 44.7%, breakeven 43.9%, bootstrap mean ROI +6.65%, P(>0) = 0.798

* NBA n=94: hit_rate 38.3%, breakeven 37.9%, bootstrap mean ROI +19.94%, P(>0) = 0.851

So the bootstrap is saying long-run +EV is more likely than not, but I'm at the sample size where confidence intervals on ROI still cross zero. The "I'm losing because hit rate is below 50%" naive read is misleading because the bet mix has different breakevens.
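
For anyone who wants to reproduce this kind of check, here is a minimal sketch of the three pieces: Wilson interval, mix breakeven, and a percentile bootstrap on P/L. It assumes flat 1u stakes and independent picks, and it treats "mix breakeven" as the mean implied probability across picks, which is one convention and not necessarily the one used for the numbers above. The function names and sample picks are made up.

```python
import math
import random

def wilson_interval(hits, n, z=1.96):
    """Wilson score interval for a binomial hit rate."""
    p = hits / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half

def mix_breakeven(decimal_odds):
    """Mean implied probability (1 / odds) across the picks."""
    return sum(1 / o for o in decimal_odds) / len(decimal_odds)

def bootstrap_roi(picks, n_resamples=1000, seed=0):
    """picks: list of (won, decimal_odds) at 1u flat stakes.

    Returns (mean_roi, ci_low, ci_high, share_of_resamples_with_roi_above_zero).
    """
    rng = random.Random(seed)
    profits = [(odds - 1.0) if won else -1.0 for won, odds in picks]
    n = len(profits)
    rois = sorted(sum(rng.choices(profits, k=n)) / n for _ in range(n_resamples))
    ci_low = rois[n_resamples * 25 // 1000]
    ci_high = rois[n_resamples * 975 // 1000 - 1]
    p_positive = sum(r > 0 for r in rois) / n_resamples
    return sum(rois) / n_resamples, ci_low, ci_high, p_positive

# Made-up picks, not the journal's data
picks = [(True, 2.9), (False, 3.1), (False, 2.5), (True, 3.4), (False, 2.8), (False, 3.0)]
print(wilson_interval(180, 421))
print(mix_breakeven([odds for _, odds in picks]))
print(bootstrap_roi(picks))
```

Assuming 180 hits (what 42.8% of 421 rounds to), `wilson_interval` gives roughly (0.381, 0.475), in line with the interval quoted above. Note that 1 / 2.94 is about 34%, so a 42.4% breakeven can't come from the mean odds alone; averaging implied probabilities pick by pick always gives a figure at or above 1 / mean odds.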

**The validation finding (the actual question):**

I bucket every pick into confidence tiers based on (model_p, fanduel_edge). The CLV-aware data on the top tier surprised me:

* Top tier (n=108 settled, 5 with closing-line data): 100% beat the closing line, +21.27pt avg CLV, +24.56% bucket ROI

* Middle tier (n=199, 19 with CLV): 73.7% beat-close, +1.46pt avg CLV, +8.06% ROI

* Auto-parlay tier (n=86): 25% hit, -18.81% ROI. This is broken. Generation thresholds were too loose.

The high-confidence tier is doing real work: 100% beat-close (small sample but consistent direction) plus +21pt CLV says the model is picking the sharper side of the market on its strongest signals. The auto-parlay tier is hemorrhaging because parlay miscalibration compounds multiplicatively while my per-sport calibration shrink is tuned for singles.

**What I'd love methodology feedback on:**

  1. **Per-tier-vs-parlay calibration.** I shrink model_p toward 0.5 based on per-(sport, market_type) historical hit-rate gaps. Singles are well-calibrated. When I multiply N calibrated leg probabilities to get a parlay prob, miscalibration compounds and the parlay prob is consistently overstated. Has anyone solved this cleanly: leg-level Platt scaling tuned specifically for parlay use, hierarchical Bayesian per-leg priors, something else?

  2. **CLV stamping coverage.** I currently have closing-line data on only 24 of 421 settled picks because the snapshot loop wasn't reliably running for the first months. Going forward every new pick gets stamped automatically. Should I weight calibration adjustments toward CLV-validated rows even at small n, or wait for more data?

  3. **Bootstrap interpretation.** With P(ROI > 0) = 0.885 and 95% CI crossing zero, what's the responsible way to communicate this externally? "Probably profitable" feels honest but is harder to falsify than a Sharpe-style number. Curious how people working on similar discrete-outcome prediction systems frame their confidence.
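
A toy calculation of the compounding in point 1, assuming independent legs and the same relative overstatement on every leg (neither is established by the data above). `parlay_prob` and `overstatement` are illustrative names.

```python
import math

def parlay_prob(leg_probs):
    """Probability that all legs hit, assuming independence."""
    return math.prod(leg_probs)

def overstatement(true_probs, model_probs):
    """Ratio of modeled parlay probability to true parlay probability."""
    return parlay_prob(model_probs) / parlay_prob(true_probs)

# Each leg is truly 50% but modeled at 52% (a 4% relative overstatement per leg)
for legs in (1, 2, 3, 4):
    ratio = overstatement([0.50] * legs, [0.52] * legs)
    print(legs, round(ratio, 4))
```

A per-leg error that looks negligible on singles grows as 1.04^N here, so a shrink tuned on singles would need to be stronger on legs used in parlays.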

Open-book journal where every pick before kickoff is logged and graded automatically against ESPN's scoreboard. Happy to share the link in a comment if useful for context; not the point of the post.

u/mangoman40114 — 9 days ago

Hey everyone,

I recently finished building THE NINE — not just the app, but the full workflow around it — and I’d really appreciate some honest feedback from people who work with game data.

I’m not trying to sell anything here.
I’m trying to answer one question:

Is it immediately clear what this actually does and what it requires?

The problem I’m trying to solve

After a game, everything is scattered:

  • video
  • pitch data (TrackMan / similar)
  • lineup / roster
  • notes, reports, clips

Even for teams that do have data, there’s no clean way to connect everything into one review workflow.

What the system does

You give it:

  • full game video
  • lineup / roster
  • pitch-by-pitch CSV (TrackMan or equivalent)

And it turns that into one structured package:

  • full logged game (pitch-by-pitch)
  • synced video clips
  • play-by-play + box score outputs
  • pitch data exports
  • player reports + review views
  • a read-only review app + portal access

What I’m trying to understand

If you open the site for 30–60 seconds:

👉 Is it clear what the system needs from you?
👉 Is it clear what you get back?
👉 Or does it feel like it requires more than it actually does?

Site: https://the-nine-app.live

I’m especially interested in critical feedback — if something is confusing or feels like overkill, that’s exactly what I need to hear.

Thank you all.

u/LegitimateAdvice1841 — 12 days ago

An early look at each qualified hitter's plate discipline (K-BB%) and extra-base hit power (ISO)

u/ritmica — 13 days ago