How we built an MLB prop model using Statcast — and where books consistently leave money on the table
Most bettors approach player props the same way books want them to — reacting to recent results, anchoring to season averages, and guessing on matchups they haven't actually researched. This post is about why that process finds almost no edge, and what the data actually says instead.
Why books misprice props at scale
On a full 15-game MLB slate, books are setting 300–400 individual player props. Game lines get precise models and fast sharp corrections. Props don't. The volume makes it structurally impossible to price every prop with the same precision applied to the main line — especially for mid-slate games and less-followed prop types like hits and RBI.
The mispricing isn't random. It clusters in three specific conditions:
Recency bias. Books anchor on recent results. A batter going 1-for-18 gets his hit line dropped even if his xBA, Barrel%, and hard-hit rate haven't moved. The book is reacting to outcomes. The data is telling a different story.
Platoon lag. Most books price off blended season averages. A left-handed batter facing a right-handed pitcher tonight has a meaningfully different true hit probability than his season average against all pitchers suggests. The gap between the blended and split-specific averages is often 25–40 points of batting average, which can be the entire margin on a prop line.
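To see why a gap of that size matters, here is a minimal sketch that converts a per-at-bat hit probability into the chance of clearing a 0.5 hits line. It assumes four independent at-bats and treats batting average as the per-at-bat hit probability. Both are simplifications, and the function name and the sample averages are illustrative rather than taken from the ProprStats model.

```python
def prob_at_least_one_hit(avg: float, at_bats: int = 4) -> float:
    """P(1+ hits), assuming independent at-bats with hit probability `avg`."""
    return 1.0 - (1.0 - avg) ** at_bats


# Hypothetical batter: .250 against all pitchers, .280 against tonight's pitcher hand.
blended = prob_at_least_one_hit(0.250)
split = prob_at_least_one_hit(0.280)

print(f"blended: {blended:.3f}, split: {split:.3f}, gap: {split - blended:.3f}")
# blended: 0.684, split: 0.731, gap: 0.048
```

Under these assumptions a 30-point split gap moves the over probability by nearly five percentage points.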
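Workload blindspots. A pitcher with a 9.0 K/9 is a bad bet on the over at a 6.5 strikeout line if his team is pulling him at 85 pitches. At one strikeout per inning he needs roughly seven innings to clear that line, and an 85-pitch cap usually ends his night well before then. That workload pattern is in the data. The book's line often doesn't reflect it.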
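A rough sketch of that workload check, assuming strikeouts accrue at the pitcher's K/9 rate and that he averages about 16 pitches per inning. The 16 is an illustrative assumption, not a figure from the model, and the function name is hypothetical.

```python
def expected_strikeouts(k_per_9: float, pitch_limit: int,
                        pitches_per_inning: float = 16.0) -> float:
    """Expected strikeouts given a pitch cap (assumed pitches-per-inning rate)."""
    expected_innings = pitch_limit / pitches_per_inning
    return (k_per_9 / 9.0) * expected_innings


# 9.0 K/9 pitcher pulled at 85 pitches
print(expected_strikeouts(9.0, 85))  # 5.3125
```

On those numbers the expected total sits more than a full strikeout below a 6.5 line, even though the K/9 looks strong in isolation.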
The metrics that actually predict prop outcomes
For hit props: xBA over batting average. xBA (Expected Batting Average) estimates what a batter's average should be from the exit velocity and launch angle of each batted ball (plus sprint speed on some batted-ball types), measured against historical outcomes on comparably hit balls. A batter hitting .231 with a .298 xBA is making contact the scoreboard doesn't reflect yet. That gap tends to close. The book anchors on .231. The model sees .298.
For strikeout props: SwStr% over K/9. Swinging Strike Rate measures swings and misses per pitch thrown, independent of whether those misses converted to strikeouts. It leads K/9 by 2–3 starts when a pitcher's stuff is improving or declining. A pitcher with elite SwStr% and a suppressed K rate is a candidate for a strikeout spike. K/9 alone misses this entirely.
For home run props: Barrel% × park factor interaction. Barrel% (the share of batted balls hit with the optimal exit velocity and launch angle combination) is the cleanest measure of true power. Pair it with the park's HR factor and the opposing pitcher's hard-hit rate allowed, and you have a model input the book's generalized HR line doesn't fully capture.
How we score it
Each of those inputs feeds into an EdgeScore from 0–100. The score isn't a win probability. It measures how strongly the Statcast-based estimate diverges from the book's implied probability. A high score means the inputs are stacked in the bettor's favour and the line hasn't caught up.
The EV layer sits on top. Once the true probability is modelled using a Poisson distribution for counting stats, we strip the book's vig and calculate the exact gap between true probability and implied probability. That output, LineCheck, tells you whether the edge is real before you place the bet.
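The two steps in that paragraph can be sketched roughly as follows. This is not the LineCheck implementation. It is a minimal illustration that assumes a plain Poisson model with a known mean and removes the vig by normalizing the two sides of a two-way market (one common method among several). All function names are hypothetical.

```python
import math
from typing import Tuple


def poisson_prob_over(mean: float, line: float) -> float:
    """P(X > line) for X ~ Poisson(mean), e.g. line=6.5 means 7 or more."""
    k_max = math.floor(line)  # largest count that does not win the over
    cdf = sum(math.exp(-mean) * mean ** k / math.factorial(k)
              for k in range(k_max + 1))
    return 1.0 - cdf


def american_to_implied(odds: int) -> float:
    """Raw implied probability (vig included) from American odds."""
    if odds < 0:
        return -odds / (-odds + 100)
    return 100 / (odds + 100)


def no_vig_probs(over_odds: int, under_odds: int) -> Tuple[float, float]:
    """Rescale both raw implied probabilities so they sum to 1."""
    p_over = american_to_implied(over_odds)
    p_under = american_to_implied(under_odds)
    total = p_over + p_under
    return p_over / total, p_under / total


# Hypothetical inputs: modelled mean of 5.3 strikeouts, line 6.5, priced -115 / -105.
true_over = poisson_prob_over(5.3, 6.5)
fair_over, fair_under = no_vig_probs(-115, -105)
gap = true_over - fair_over  # positive favours the over, negative the under
print(f"true over: {true_over:.3f}, no-vig over: {fair_over:.3f}, gap: {gap:+.3f}")
```

A negative gap on the over is a signal to look at the under, where the same comparison applies with the complementary probabilities.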
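A prop with a true over probability of 61% offered at -115 (53.5% implied at that price, before any vig is removed) carries a 7.5-point probability edge, which works out to roughly +14% expected return per unit staked. That's the signal. Everything else is noise.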
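The arithmetic behind that example can be checked with a short sketch. It computes both the probability gap against the raw implied price and the expected profit per unit staked. The function names are illustrative.

```python
def american_to_implied(odds: int) -> float:
    """Raw implied probability (vig included) from American odds."""
    if odds < 0:
        return -odds / (-odds + 100)
    return 100 / (odds + 100)


def profit_per_unit(odds: int) -> float:
    """Profit on a winning 1-unit stake at American odds."""
    if odds < 0:
        return 100 / (-odds)
    return odds / 100


def expected_value(true_prob: float, odds: int) -> float:
    """Expected profit per unit staked: win * payout minus loss * stake."""
    return true_prob * profit_per_unit(odds) - (1.0 - true_prob)


edge = 0.61 - american_to_implied(-115)
ev = expected_value(0.61, -115)
print(f"edge: {edge:.3f}, ev: {ev:.3f}")
# edge: 0.075, ev: 0.140
```

For a 61% true probability at -115, the gap against the raw 53.5% implied price is about 7.5 percentage points, and the expected profit is about 0.14 units per unit staked.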
What this looks like in practice
The dashboard ranks every MLB prop across the full slate by EdgeScore in one view. The hover breakdown shows which specific factors are driving the score — so you know whether you're betting a strong platoon advantage, elite contact quality, or a pitcher workload mismatch. The Parlay Builder stacks the highest EV legs automatically.
This is what we built at ProprStats. The Statcast model does the research. You see the reasoning, not just the output.
Happy to go deep on any part of the methodology in the comments — the Poisson modelling, the platoon weighting, or how we handle small-sample noise on the xBA inputs.