Transparency
How good is the model, honestly?
Every probability on this site comes from one Elo + Poisson model. Rather than ask you to trust it, here is how it scores on a leakage-free walk-forward backtest over 5,143 historical internationals — measured with the same proper scoring rules the forecasting field uses. The constants are fitted to minimise log-loss, not hand-tuned.
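A minimal sketch of what "leakage-free walk-forward" means in practice: matches are processed in date order, each forecast is recorded before the result is used, and only then are the ratings updated. The standard Elo update, the 1500 starting rating, the tuple layout and the function names are all assumptions for illustration. The site's actual backtest code is not shown here. `K` and `HOME_ADV` are the fitted values from the constants table.

```python
from collections import defaultdict

K = 100.94        # fitted rating volatility (from the constants table)
HOME_ADV = 97.25  # fitted home advantage in Elo points (from the constants table)


def expected_home_score(r_home, r_away):
    """Standard Elo expectation for the home side (assumed form)."""
    return 1.0 / (1.0 + 10 ** (-(r_home + HOME_ADV - r_away) / 400.0))


def walk_forward(matches, start_rating=1500.0):
    """Forecast each match using only information available before kickoff.

    matches: iterable of (iso_date, home, away, home_score), where
    home_score is 1.0 for a home win, 0.5 for a draw, 0.0 for a loss.
    """
    ratings = defaultdict(lambda: start_rating)  # hypothetical starting rating
    forecasts = []
    for date, home, away, home_score in sorted(matches):
        # 1) Predict first, from ratings that have not seen this result.
        exp = expected_home_score(ratings[home], ratings[away])
        forecasts.append((date, home, away, exp, home_score))
        # 2) Only then update the ratings with the observed result.
        delta = K * (home_score - exp)
        ratings[home] += delta
        ratings[away] -= delta
    return forecasts, dict(ratings)
```

Because the forecast is stored before the update, no prediction can depend on its own outcome or on any later match.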
Log-loss: punishes confident wrong calls hardest. Lower is better.
RPS (ranked probability score): like Brier, but credits being close in the home→draw→away order. Lower is better.
Accuracy: how often the model's top pick was the actual result. Higher is better.
ECE (expected calibration error): average gap between stated probability and what actually happened. 0% = perfectly calibrated; lower is better.
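For readers who want to see the scoring rules concretely, here is a sketch for a single three-outcome forecast. The function names are illustrative, and the Brier score is written in its summed multi-class form, which is an assumption about the convention used on this page.

```python
import math

OUTCOMES = ("home", "draw", "away")  # ordered, as RPS requires


def log_loss(probs, actual):
    """Negative log of the probability given to what happened."""
    return -math.log(probs[OUTCOMES.index(actual)])


def brier(probs, actual):
    """Summed squared error over the three outcomes (assumed convention)."""
    return sum(
        (p - (1.0 if o == actual else 0.0)) ** 2
        for p, o in zip(probs, OUTCOMES)
    )


def rps(probs, actual):
    """Ranked probability score: squared error of cumulative probabilities."""
    cum_p = cum_o = total = 0.0
    # The last cumulative term is always 1 - 1 = 0, so it is skipped.
    for p, o in zip(probs[:-1], OUTCOMES[:-1]):
        cum_p += p
        cum_o += 1.0 if o == actual else 0.0
        total += (cum_p - cum_o) ** 2
    return total / (len(OUTCOMES) - 1)


forecast = (0.5, 0.3, 0.2)  # made-up probabilities: home, draw, away
for result in OUTCOMES:
    print(result, log_loss(forecast, result), brier(forecast, result), rps(forecast, result))
```

With a home-leaning forecast, a draw scores a lower RPS than an away win, because a draw is one step from the favoured outcome and an away win is two. Brier has no notion of that ordering.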
Reliability curve
Predicted vs observed. Each dot is a probability bucket: x = what the model said, y = what actually happened. A perfectly calibrated model sits on the diagonal. Dot size = number of predictions in the bucket.
| Predicted probability band | Mean predicted | Observed frequency | Predictions |
|---|---|---|---|
| 0–10% | 5.1% | 6.5% | 1,454 |
| 10–20% | 15.4% | 15.5% | 2,205 |
| 20–30% | 25.9% | 25.8% | 4,539 |
| 30–40% | 33.3% | 31.7% | 2,948 |
| 40–50% | 44.9% | 45.5% | 1,319 |
| 50–60% | 54.8% | 54.9% | 1,077 |
| 60–70% | 64.7% | 65.2% | 811 |
| 70–80% | 74.8% | 77.2% | 593 |
| 80–90% | 84.5% | 84.5% | 328 |
| 90–100% | 93.3% | 94.8% | 155 |
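The calibration gap can be recomputed from the table above as a count-weighted mean of |mean predicted − observed| across buckets. The sketch below assumes that weighting, and it uses the rounded table values, so the result is approximate and may not match the site's headline ECE exactly. The predictions column sums to 15,429, which is 3 × 5,143, consistent with one probability per outcome per match.

```python
# (mean predicted %, observed %, number of predictions), copied from the table
BUCKETS = [
    (5.1, 6.5, 1454),
    (15.4, 15.5, 2205),
    (25.9, 25.8, 4539),
    (33.3, 31.7, 2948),
    (44.9, 45.5, 1319),
    (54.8, 54.9, 1077),
    (64.7, 65.2, 811),
    (74.8, 77.2, 593),
    (84.5, 84.5, 328),
    (93.3, 94.8, 155),
]


def expected_calibration_error(buckets):
    """Count-weighted mean absolute gap, in percentage points."""
    total = sum(n for _, _, n in buckets)
    return sum(n * abs(pred - obs) for pred, obs, n in buckets) / total


total_predictions = sum(n for _, _, n in BUCKETS)
ece = expected_calibration_error(BUCKETS)
print(total_predictions, round(ece, 2))
```

On these rounded figures the gap comes out at roughly 0.7 percentage points over the full backtest.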
Fitted vs hand-set
Held-out validation. The constants were fitted by minimising training log-loss; these are the scores on data the fit never saw. Fitting improves every metric on that unseen data, so the gain is not an artefact of overfitting.
| Metric (validation) | Hand-set | Fitted |
|---|---|---|
| Log-loss | 0.877 | 0.865 |
| Brier | 0.516 | 0.508 |
| RPS | 0.173 | 0.169 |
| Accuracy | 59.8% | 60.4% |
| ECE | 2.5% | 1.7% |
Fitted constants
| Constant | Hand-set | Fitted |
|---|---|---|
| K (rating volatility) | 32 | 100.94 |
| Home advantage (Elo) | 65 | 97.25 |
| Elo → goal-diff | 0.004 | 0.0023 |
| Goals baseline | 2.55 | 2.90 |
| Dixon-Coles ρ | -0.05 | -0.15 |
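To make the constants concrete, here is one plausible way they could combine into home/draw/away probabilities. The wiring is an assumption, not the site's confirmed method: the Elo gap plus home advantage is scaled into an expected goal difference, the goals baseline is split around it to give two Poisson rates, and the standard Dixon-Coles low-score correction is applied with ρ. All function names are hypothetical.

```python
import math

HOME_ADV = 97.25    # fitted home advantage (Elo points)
ELO_TO_GD = 0.0023  # fitted Elo -> goal-difference factor
GOALS_BASE = 2.90   # fitted goals baseline (assumed: expected total goals)
RHO = -0.15         # fitted Dixon-Coles rho


def poisson_pmf(k, lam):
    return math.exp(-lam) * lam ** k / math.factorial(k)


def dc_tau(x, y, lam, mu, rho):
    """Dixon-Coles adjustment for the four low-scoring results."""
    if x == 0 and y == 0:
        tau = 1.0 - lam * mu * rho
    elif x == 0 and y == 1:
        tau = 1.0 + lam * rho
    elif x == 1 and y == 0:
        tau = 1.0 + mu * rho
    elif x == 1 and y == 1:
        tau = 1.0 - rho
    else:
        tau = 1.0
    return max(tau, 0.0)  # guard against negative weights at extreme rates


def outcome_probs(elo_home, elo_away, max_goals=10):
    """Return (home, draw, away) probabilities under the assumed wiring."""
    goal_diff = ELO_TO_GD * (elo_home + HOME_ADV - elo_away)
    lam = max((GOALS_BASE + goal_diff) / 2.0, 0.05)  # home scoring rate
    mu = max((GOALS_BASE - goal_diff) / 2.0, 0.05)   # away scoring rate
    home = draw = away = 0.0
    for x in range(max_goals + 1):
        for y in range(max_goals + 1):
            p = dc_tau(x, y, lam, mu, RHO) * poisson_pmf(x, lam) * poisson_pmf(y, mu)
            if x > y:
                home += p
            elif x == y:
                draw += p
            else:
                away += p
    total = home + draw + away  # renormalise after truncating the score grid
    return home / total, draw / total, away / total


print(outcome_probs(1800.0, 1800.0))  # made-up ratings for two equal teams
```

Under this wiring a negative ρ shifts probability toward 0-0 and 1-1, so the fitted value of -0.15 raises draw probabilities more than the hand-set -0.05.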
Tournament 2026 — live
Model vs actual results. As 2026 matches finish, the model's pre-kickoff predictions are scored against the actual results here: an ongoing check that it stays calibrated on the tournament itself, not just on the historical backtest above. Nothing graded yet; the first matches kick off 11 June.