u/Agalex97

Roma-Lazio prediction follow-up: forecast was ≥ 6 weighted cards, actual was 9 (5Y + 2R). Italian derby pattern holds [OC]

Two days ago I posted a writeup on Italian derby patterns here, with a specific prediction for yesterday's Roma-Lazio (original linked in the first comment). The match is now done — here's the comparison.

Predicted: ≥ 6 weighted cards (weighted = yellows + 2 × reds, so each red counts as 2). The other metrics (shots, xG, goals) I left open because of high single-match variance across the historical series.

Actual result: 2-0 Roma. 5 yellow cards + 2 red cards = 9 weighted. Upper end of the historical 8-derby distribution.

Updated series: with this 9th observation, the weighted-cards distribution becomes [5, 6, 6, 6, 7, 7, 8, 9, 11], median 7. The "≥ 6 weighted" criterion now holds in 8 of 9 observations across 5 seasons.
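The update above is simple enough to check by hand, but here is a minimal sketch of the arithmetic. The function name `weighted_cards` is just an illustrative label; the eight earlier values are the ones from the original writeup.

```python
from statistics import median

def weighted_cards(yellows: int, reds: int) -> int:
    """Weighted card count: each yellow counts 1, each red counts 2."""
    return yellows + 2 * reds

# The 8 earlier derbies, as listed in the original writeup.
previous = [6, 7, 5, 7, 8, 11, 6, 6]

# Yesterday's match: 5 yellows and 2 reds.
latest = weighted_cards(yellows=5, reds=2)

series = sorted(previous + [latest])
hits = sum(1 for value in series if value >= 6)

print(latest)          # 9
print(series)          # [5, 6, 6, 6, 7, 7, 8, 9, 11]
print(median(series))  # 7
print(f"{hits} of {len(series)} at or above 6")  # 8 of 9 at or above 6
```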

Metrics I didn't predict:

  • Total goals: 2 (historical derby mean: 1.88)
  • Total xG: 1.91 (historical mean: 1.45)
  • Total shots: 22 (historical mean: 21.4)
  • Total fouls: 32 (historical mean: 28.75)

Offensive metrics ended slightly above the matchup's historical means but still well below Serie A baselines (2.61 goals, 2.51 xG, 25.2 shots). The compression pattern that defines this derby vs European peers stays intact.

Pattern check: the broader observation — Italian derbies compress offensive output, with Lazio-Roma as the extreme case across the top 5 European leagues — holds with the 9th data point. The Roma derby stays in the same high-cards, low-goals quadrant of the scatter.

A methodological question I'm taking into the next post: for matchups with a recurring structural identity like this one, how far does a simple "historical mean of past observations" go as a predictor? The weighted-cards series here ([5, 6, 6, 6, 7, 7, 8, 9, 11]) has stayed remarkably tight across 5 seasons, but offensive metrics in the same series remain wildly noisy. I'm planning a follow-up to test this empirically across different matchups and metrics, to figure out when a simple historical baseline is sufficient and when a more elaborate model actually adds value. Curious if anyone here has worked on similar "when does simple beat complex" problems in sports prediction — references or counter-examples welcome.
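One way to frame the "historical mean as predictor" question is a walk-forward check: predict each match from the mean of the matches before it and score the error. The sketch below does that for the weighted-cards series. It assumes the series is in chronological order (the order it was originally listed in, plus the latest match), which the posts do not state explicitly, and it is only an illustration, not the test planned for the follow-up.

```python
from statistics import mean

# Weighted cards per derby. ASSUMPTION: this is chronological order
# (the order in which the series was listed, plus the latest match).
series = [6, 7, 5, 7, 8, 11, 6, 6, 9]

def expanding_mean_errors(values):
    """Predict each observation from the mean of all earlier ones
    and return the absolute error of every prediction."""
    errors = []
    for i in range(1, len(values)):
        prediction = mean(values[:i])
        errors.append(abs(values[i] - prediction))
    return errors

errors = expanding_mean_errors(series)
mae = mean(errors)
print(f"predictions scored: {len(errors)}, mean absolute error: {mae:.2f}")
```

Running the same function on a noisier series (goals or xG per derby) and comparing the two error figures, each relative to its own scale, would be one crude way to see where the simple baseline stops being enough.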

Thanks to everyone who engaged with the original writeup.

u/Agalex97 — 4 days ago

Roma-Lazio: a unique derby in European football. Here's what to expect from tomorrow's match according to the data [OC]

Building on the football prediction model I've been writing about (~20,000 matches over 5 seasons, top 5 European leagues), I recently ran a derby-effects analysis on the dataset. The pattern that came out doesn't match how most people talk about Italian derbies — and one specific match this weekend is the extreme example.

I'll walk through the comparison, the within-Italy breakdown, and end with a specific prediction.

The European pattern

I aggregated the canonical big match list for each of the top 5 leagues across 5 seasons and computed Δ% vs the league-season baseline. The data:

League            Yellows    Reds      xG       Goals
Premier League    +17.7%     +50.3%    +0.4%    +2.2%
La Liga           +11.9%     +28.6%    +3.7%    +2.8%
Ligue 1           +7.3%      -14.3%    +6.3%    +0.6%
Serie A           +5.9%      +25.5%    -3.5%    -5.1%
Bundesliga        -1.2%      +5.5%     +3.2%    +6.5%

Cards columns are noisy across leagues, but offensive metrics tell a clean story. Four out of five leagues see derbies produce more goals and more xG than the baseline. Even when cards spike (Premier reds +50%), the football itself stays elevated — more shots, more chances, more goals. The rivalry energizes the match.

Serie A is the exception. xG drops 3.5%, goals drop 5.1%. Italian derbies systematically compress offensive output instead of amplifying it.

Top derby per league

Going one level deeper, here's each league's flagship matchup over the last 5 seasons:

Derby                                      n     Yellows (vs baseline)    Reds    Goals    xG
El Clásico (Real - Barcelona)              9     5.89 (+22%)              0.22    4.00     3.93
North West Derby (Man Utd - Liverpool)     9     5.11 (+30%)              0.22    3.67     3.62
Klassiker (Bayern - Dortmund)              10    3.00 (-24%)              0.10    3.90     3.30
Le Classique (PSG - Marseille)             10    2.90 (-23%)              0.40    2.60     3.43
Lazio - Roma                               8     6.25 (+47%)              0.38    1.88     1.45

El Clásico and the North West Derby: high cards + high goals. Intensity translates into output.

Klassiker and Le Classique: below-baseline on cards, but goals stay above baseline.

Lazio-Roma: high cards AND low goals. The four other top derbies produce 2.60-4.00 goals/match. Lazio-Roma produces 1.88: less than half of El Clásico's 4.00 and roughly half of the North West Derby's 3.67. And the compression isn't just on goals: across the 8 derbies, shots are -15% vs the Serie A baseline, xG -42%, goals -28%, while fouls are +14% and cards +47%. The whole offensive side of the match shrinks while the physical and disciplinary side amplifies. It's the only top European derby that shows this combination.
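For anyone who wants to reproduce the percentages, here is a rough sketch of the Δ% calculation. It uses the derby means and the Serie A baselines quoted across these posts (2.61 goals, 2.51 xG, 25.2 shots, 4.25 yellows) as flat numbers; the actual analysis normalizes against league-season baselines, so treat this as a simplification. `pct_delta` is an illustrative name.

```python
def pct_delta(value: float, baseline: float) -> float:
    """Percentage difference of value relative to baseline."""
    return (value - baseline) / baseline * 100

# Lazio-Roma means over the 8 derbies vs Serie A baselines,
# all taken from figures quoted in these posts.
derby = {"goals": 1.88, "xG": 1.45, "shots": 21.4, "yellows": 6.25}
baseline = {"goals": 2.61, "xG": 2.51, "shots": 25.2, "yellows": 4.25}

deltas = {metric: round(pct_delta(derby[metric], baseline[metric]))
          for metric in derby}
print(deltas)  # {'goals': -28, 'xG': -42, 'shots': -15, 'yellows': 47}
```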

Is this an Italian effect or a Roman effect?

Quick sanity check: I took all 58 non-Roma/Lazio big matches in Serie A, i.e. the matchups between Inter, Juventus, Milan, and Napoli over the same 5 seasons. Yellow cards came in at -11.5% vs the Serie A baseline, so there is no big-match card inflation at all; if anything these games run slightly calmer. On the disciplinary side they look like regular Sunday matches.

Roma alone in their big matches (n=48): +24% cards, +35% reds. Lazio (n=47): +23% cards, +50% reds. The two Rome clubs are the carriers of the pattern. When they play each other it compounds: Roma's average goes from 5.27 cards in their other big matches to 6.25 cards in the derby specifically.

The referee

Sunday's referee is Fabio Maresca. He averages 5.4 yellow cards per match in Serie A, above both the league baseline (4.25) and the average Serie A referee. His profile is specific: relatively few foul calls per match, but he is quick to show cards once intensity rises. In a match historically running +14% fouls and +47% cards, his appointment is unlikely to dampen the pattern.

The prediction

Most metrics in this matchup are noisy across the 8 derbies. Shot count ranges from 13 to 30. xG total from 0.9 to 2.3. Goals from 0 to 4. Too much single-match variance to call.

Cards are different. Using weighted card count (yellows + 2×reds), the 8 derbies distribute as: 6, 7, 5, 7, 8, 11, 6, 6.

  • 8 out of 8 derbies had ≥ 5 weighted cards
  • 7 out of 8 had ≥ 6 weighted cards
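The two counts above come from a simple threshold sweep over the series. A minimal sketch, with `hit_counts` as an illustrative helper name and two extra thresholds added for context:

```python
# Weighted cards (yellows + 2 * reds) for the 8 derbies listed above.
weighted = [6, 7, 5, 7, 8, 11, 6, 6]

def hit_counts(values, thresholds):
    """For each threshold, count the matches at or above it."""
    return {t: sum(v >= t for v in values) for t in thresholds}

counts = hit_counts(weighted, thresholds=[5, 6, 7, 8])
for threshold, hits in counts.items():
    print(f">= {threshold}: {hits} of {len(weighted)}")
```

The hit rate falls off quickly above 6 (half the derbies reach 7), which is why 6 is the highest threshold the series still supports with some margin.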

My call: at least 6 weighted cards on Sunday (yellows + 2 × reds, so each red counts as 2). The rest of the metrics (shots, xG, goals) I'm leaving open; the variance is too high.

Happy to discuss methodology in the comments — the baseline normalization, the canonical derby list, the prediction framework, or anything else.

u/Agalex97 — 6 days ago

▲ 71 r/algobetting+2 crossposts

I built a predictive model for football match stats (shots, corners, fouls) across 20,000 matches. The strongest predictor ended up being ELO from chess. [OC]

For the past few months I've been working on a personal project: a predictive model for per-match football statistics. Not the final score, but the behaviors: how many shots each team will take, corners, fouls, cards. The dataset covers around 20,000 matches across five seasons and the top 5 European leagues.

I started with hundreds of variables: rolling shot averages, foul rates, corner frequencies, home/away splits, opponent profiles. Everything you'd expect. The first results were decent, but the model was essentially regressing toward each team's historical mean without any real understanding of match context. It could see that Team A averages 14 shots and Team B averages 11, but it had no concept of the gap between the two sides. It didn't know that tonight Team A is so much stronger they'll pin Team B in their own half for 70 minutes and probably end up with 19 shots while Team B scrapes together 6.

Historical averages are built against opponents of all quality levels. They encode nothing about the specific match being played, and that contextual read is exactly what every football fan processes automatically before kick-off. The hard part is giving a model a number for something so intuitive.

I ended up turning to chess. ELO ratings were invented in the 1960s by Arpad Elo to classify players more precisely than tournament standings alone. Beat someone stronger and your score rises significantly; lose to someone weaker and it drops. It updates after every game, with the only inputs being the result and the relative strength of the two players — no performance quality, no expected goals, just who won and against whom.

I built an ELO system for all clubs across the top 5 leagues, initialized from external sources and updated match by match through five seasons. When I added the ELO gap between the two teams as a predictor, things shifted immediately.
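For readers who haven't seen the mechanics, here is a sketch of the textbook chess-style update. It is not necessarily the exact system used in this project: the K-factor of 20, the 400-point scale, and the example ratings are all illustrative assumptions, and details such as home advantage or draw handling beyond the usual 0.5 score are left out.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B under the standard Elo curve."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

def update(rating_a, rating_b, score_a, k=20.0):
    """Return both ratings after one match.

    score_a is 1 for an A win, 0.5 for a draw, 0 for a loss.
    k=20 is an illustrative choice, not a value from the post.
    """
    shift = k * (score_a - expected_score(rating_a, rating_b))
    return rating_a + shift, rating_b - shift

# Hypothetical ratings: an underdog (1500) beats a favourite (1700).
underdog, favourite = update(1500, 1700, score_a=1)
elo_gap = underdog - favourite  # the kind of pre-match gap used as a feature
print(round(underdog, 1), round(favourite, 1))
```

The upset moves about 15 points from the favourite to the underdog; the same win against an equally rated side would move only 10, which is the "beat someone stronger and your score rises significantly" behavior described above.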

Bivariate Spearman correlation with shots:

Predictor               Correlation
ELO gap                 0.377
Rolling shot average    0.273
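Spearman correlation is just Pearson correlation computed on ranks, so it measures how monotonic the relationship is rather than how linear. Below is a dependency-free sketch; the six data points are invented purely to show the computation and have nothing to do with the real dataset.

```python
from statistics import mean

def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of 1-based positions i..j
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Invented toy data: ELO gap and shots for six team-matches.
elo_gap = [-220, -120, -30, 40, 130, 250]
shots = [8, 11, 10, 13, 15, 18]
rho = spearman(elo_gap, shots)
print(round(rho, 3))  # 0.943
```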

The chess number outperformed every football-specific variable in the model. And when you break it down by bucket, it's obvious why:

ELO gap                     Avg shots
< −200 (much weaker)        9.2
−200 to −100                10.5
−100 to −50                 11.0
±50 (balanced)              12.8
+50 to +100                 13.0
+100 to +200                14.4
> +200 (much stronger)      17.4

Global average: 12.7 shots
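A bucket table like this can be built with a few lines of grouping code. The sketch below uses the same bucket edges; how exact boundary values (for example a gap of exactly −200) are assigned is an assumption, and the sample rows are invented only to show the shape of the input.

```python
from bisect import bisect_right
from collections import defaultdict
from statistics import mean

# Bucket edges taken from the table; the labels are shorthand.
EDGES = [-200, -100, -50, 50, 100, 200]
LABELS = ["< -200", "-200 to -100", "-100 to -50", "-50 to +50",
          "+50 to +100", "+100 to +200", "> +200"]

def bucket(elo_gap: float) -> str:
    """Map an ELO gap to its bucket label (edge values go to the upper bucket)."""
    return LABELS[bisect_right(EDGES, elo_gap)]

def mean_shots_by_bucket(rows):
    """rows: iterable of (elo_gap, shots) pairs, one per team-match."""
    grouped = defaultdict(list)
    for elo_gap, shots in rows:
        grouped[bucket(elo_gap)].append(shots)
    return {label: mean(grouped[label]) for label in LABELS if label in grouped}

# Invented rows, only to show the shape of the computation.
rows = [(-250, 9), (-230, 10), (-20, 12), (30, 13), (150, 14), (260, 18)]
averages = mean_shots_by_bucket(rows)
for label, avg in averages.items():
    print(f"{label:>14}: {avg}")
```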

From 9.2 to 17.4 shots as the strength gap widens, and no rolling average captures it, because rolling averages don't know who those shots were taken against. A team that faced three weak sides in a row will have inflated numbers; the ELO gap adjusts for that automatically.

200 variables, five years of data, five leagues, and the most important feature had nothing to do with football.

Happy to get into the methodology or the initialization choices in the comments.

u/Agalex97 — 10 days ago