I built a football analytics platform that goes beyond standard xG to evaluate "deserved" outcomes.
I’ve spent the last few months building numbertwenty.io, a football data platform designed to calculate the true "deserved" outcomes of matches by filtering out the game's "aleatoric" noise. I'm sharing it to get some fresh eyes and constructive feedback so I can improve both the model and the platform.
The problem I'm trying to tackle:
We all know football is inherently chaotic, statistically speaking. A single rare event can flip a result, which is why the final score alone can miss the dynamics of a match. We often use standard Expected Goals (xG) to assess the fair result, but it has its limits when analyzing a single game, such as:
- The "Draw" blind spot: Football has 3 outcomes (1-X-2), but xG models (even Poisson-derived ones) rarely rate the draw as the most probable outcome. That only happens when both xG values are low and close to each other.
- Context is ignored: Generating 1.5 xG away from home is inherently harder than doing it at home, but raw xG doesn't capture this dynamic.
- Volume vs. Control: A team spamming low-probability shots can inflate their xG without actually controlling the game.
As a result, there is no direct metric that quantifies the "Fair Result" of a football match.
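To make the draw blind spot concrete, here is a minimal sketch of the usual way to turn two xG totals into 1-X-2 probabilities. It assumes goals follow independent Poisson distributions, which is a common simplification and not something specific to my platform. The function names and the `max_goals` cutoff are just illustrative choices. With 1.0 xG each, the draw comes out at roughly 31%, below the roughly 35% for each side winning, and the draw only becomes the top outcome when both values are low.

```python
from math import exp, factorial


def poisson_pmf(k, lam):
    """Probability of exactly k goals when the expected number is lam."""
    return exp(-lam) * lam ** k / factorial(k)


def outcome_probs(xg_home, xg_away, max_goals=10):
    """Return (home win, draw, away win) under independent Poisson scoring.

    Assumption: scorelines above max_goals are ignored and the result is
    renormalized, which is harmless for realistic xG values.
    """
    home = draw = away = 0.0
    for h in range(max_goals + 1):
        for a in range(max_goals + 1):
            p = poisson_pmf(h, xg_home) * poisson_pmf(a, xg_away)
            if h > a:
                home += p
            elif h == a:
                draw += p
            else:
                away += p
    total = home + draw + away
    return home / total, draw / total, away / total


# Equal xG, slightly unequal xG, and a low-scoring game.
for xg in [(1.0, 1.0), (1.2, 1.0), (0.5, 0.5)]:
    p1, px, p2 = outcome_probs(*xg)
    print(xg, f"1={p1:.3f} X={px:.3f} 2={p2:.3f}")
```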
The core idea of numbertwenty.io:
To tackle this inverse problem (and cut through the match's aleatoric noise), I use the very simple principle of similarity search (statistical neighbors).
The pipeline compares a match's features (derived from its raw stats) against thousands of football games, with each feature weighted according to its relevance in the competition. By finding a match's closest statistical neighbors, and then calibrating against the observed 1-X-2 distribution in the competition, the model surfaces a realistic probability distribution of what the outcome truly deserved to be.
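For anyone who wants to picture the idea in code, here is a toy sketch of weighted nearest-neighbor search followed by a calibration step. This is not the actual numbertwenty.io code: the three features, the weights, `k`, the sample data, and especially the calibration rule (rescaling each probability by the ratio of league rate to average model rate, then renormalizing) are all assumptions made for illustration.

```python
from collections import Counter
from math import sqrt

OUTCOMES = ("1", "X", "2")


def weighted_distance(a, b, weights):
    """Euclidean distance where each feature has its own weight."""
    return sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))


def neighbor_probs(match, history, weights, k=3):
    """Share of each outcome among the k most similar past matches.

    history is a list of (feature_vector, outcome) pairs.
    """
    ranked = sorted(history, key=lambda rec: weighted_distance(match, rec[0], weights))
    nearest = ranked[:k]
    counts = Counter(outcome for _, outcome in nearest)
    return {o: counts[o] / len(nearest) for o in OUTCOMES}


def calibrate(probs, model_rates, league_rates):
    """Hypothetical calibration: shift probabilities toward the league's
    observed 1-X-2 rates. Assumes every model rate is above zero."""
    scaled = {o: probs[o] * league_rates[o] / model_rates[o] for o in OUTCOMES}
    total = sum(scaled.values())
    return {o: scaled[o] / total for o in OUTCOMES}


# Made-up data: (xG for, xG against, possession share) and the actual result.
history = [
    ((1.8, 0.6, 0.60), "1"),
    ((1.6, 0.7, 0.55), "1"),
    ((1.0, 1.0, 0.50), "X"),
    ((0.9, 1.1, 0.48), "X"),
    ((0.5, 1.9, 0.40), "2"),
]
weights = (1.0, 1.0, 0.5)  # illustrative feature weights

raw = neighbor_probs((1.2, 0.9, 0.52), history, weights, k=3)
fair = calibrate(
    raw,
    model_rates={"1": 0.50, "X": 0.20, "2": 0.30},   # assumed average model output
    league_rates={"1": 0.45, "X": 0.25, "2": 0.30},  # assumed observed league rates
)
print(raw)
print(fair)
```

In this toy example the draw gets a boost after calibration because the assumed model under-predicts draws relative to the league. Unlike the Poisson approach, nothing prevents the draw from being the top outcome when the neighbors mostly ended level.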
I detail the whole process a bit more in the about section of the website. The current model is certainly not the final version and will evolve over time. I also added a simple predictive algorithm based on the same principle as the post-match analysis, but it's not the main purpose of the website, and I will try to improve it in future updates. My real focus is post-match analysis, which also highlights just how random results can be, and why betting is highly uncertain!
Beyond all of that, I tried to add plenty of other tools on the platform for you to check out, such as a dynamic Fair Elo ranking and an experimental, automatically generated statistical analysis of matches.
This is my first time building and deploying a full-stack platform, so any feedback is welcome!
(Quick note on ads: they are managed automatically by Google with most settings kept to the bare minimum. If you find them too intrusive or if they ruin the UX, please let me know and I will try to adjust them manually).
Here are some screenshots from desktop:
Main menu (Green background = deserved result, Red background = unfair outcome)