Transparency · How this works

Methodology

Everything here is meant to be checkable. Below: how the model turns team strength into match probabilities, how it decides something is a value edge, how every pick is graded honestly against the closing line, and the tamper-evident ledger that timestamps each prediction set. The honest summary up front — the model is well-calibrated and beats a naive baseline, but "beats the market" is not yet proven. That's exactly what the public CLV record is being built to test.

How the model works

Team strength (Elo)

Each team carries an Elo rating trained on thousands of historical internationals. Wins against strong teams move it more than wins against weak ones; home advantage and venue/host context adjust it per match. Games are also weighted by importance — a friendly counts at half the weight of a competitive match, so warm-up exhibitions move the ratings less than a World Cup or qualifier result.
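
A minimal sketch of the kind of update described above, assuming the standard Elo formula. The K-factor of 20, the 400-point scale and the `home_adv` parameter are illustrative assumptions, not the site's fitted values; only the half weight for friendlies comes from the text.

```python
def expected_score(rating_a, rating_b, home_adv=0.0):
    """Expected result for team A (0..1); home_adv is added to A's rating."""
    gap = rating_a + home_adv - rating_b
    return 1.0 / (1.0 + 10 ** (-gap / 400.0))


def update_elo(rating_a, rating_b, result_a, *, k=20.0, friendly=False, home_adv=0.0):
    """Return the updated (rating_a, rating_b).

    result_a is 1.0 for a win, 0.5 for a draw, 0.0 for a loss.
    k and home_adv are hypothetical values chosen for illustration.
    """
    weight = 0.5 if friendly else 1.0  # friendlies count at half weight
    surprise = result_a - expected_score(rating_a, rating_b, home_adv)
    delta = k * weight * surprise
    return rating_a + delta, rating_b - delta


if __name__ == "__main__":
    # Beating a stronger side is a bigger surprise, so it moves the rating more.
    print(update_elo(1500.0, 1700.0, 1.0))
    print(update_elo(1500.0, 1300.0, 1.0))
    print(update_elo(1500.0, 1500.0, 1.0, friendly=True))
```

Because the change is proportional to the gap between actual and expected result, an upset win moves both ratings further than a routine one.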

Scoreline matrix (Poisson + Dixon-Coles)

The rating gap becomes each side's expected goals, and a Poisson model gives a probability for every exact scoreline, with a Dixon-Coles correction for low-scoring results.
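
As a rough illustration, the scoreline matrix can be built like this. The sketch takes the two expected-goals figures (`lam` for the home side, `mu` for the away side) as inputs, because the mapping from rating gap to expected goals is not spelled out here. The `rho` value and the 10-goal cut-off are assumptions for the example.

```python
import math


def poisson_pmf(k, mean):
    """Probability of exactly k goals under a Poisson distribution."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def dc_tau(h, a, lam, mu, rho):
    """Dixon-Coles adjustment factor; only the four low-score cells change."""
    if h == 0 and a == 0:
        return 1.0 - lam * mu * rho
    if h == 0 and a == 1:
        return 1.0 + lam * rho
    if h == 1 and a == 0:
        return 1.0 + mu * rho
    if h == 1 and a == 1:
        return 1.0 - rho
    return 1.0


def scoreline_matrix(lam, mu, rho=-0.1, max_goals=10):
    """matrix[h][a] = P(home scores h, away scores a), renormalised to sum to 1."""
    raw = [
        [poisson_pmf(h, lam) * poisson_pmf(a, mu) * dc_tau(h, a, lam, mu, rho)
         for a in range(max_goals + 1)]
        for h in range(max_goals + 1)
    ]
    total = sum(map(sum, raw))  # slightly below 1 because of the cut-off
    return [[p / total for p in row] for row in raw]


if __name__ == "__main__":
    m = scoreline_matrix(1.4, 1.1)
    print(f"P(0-0) = {m[0][0]:.4f}, P(1-1) = {m[1][1]:.4f}")
```

With a negative `rho`, the 0-0 and 1-1 cells gain probability and the 1-0 and 0-1 cells lose it; every other scoreline keeps its independent-Poisson value.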

Markets & calibration

The matrix is summed into 1X2, Over/Under 2.5 and BTTS probabilities. Five model constants are fitted by minimising out-of-sample log-loss, then checked on data the fit never saw — see the calibration report.
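
The summation step is simple enough to sketch directly. The function below assumes a matrix indexed as `matrix[home_goals][away_goals]` that sums to 1; the dictionary keys are names made up for this example. The `log_loss` helper shows the quantity being minimised, on the assumption that it is the usual mean negative log of the probability given to what actually happened.

```python
import math


def market_probs(matrix):
    """Collapse a scoreline matrix into 1X2, Over/Under 2.5 and BTTS."""
    home = draw = away = over = btts = 0.0
    for h, row in enumerate(matrix):
        for a, p in enumerate(row):
            if h > a:
                home += p
            elif h == a:
                draw += p
            else:
                away += p
            if h + a >= 3:  # three or more goals clears the 2.5 line
                over += p
            if h >= 1 and a >= 1:  # both teams scored
                btts += p
    total = home + draw + away
    return {
        "home": home, "draw": draw, "away": away,
        "over_2_5": over, "under_2_5": total - over,
        "btts_yes": btts, "btts_no": total - btts,
    }


def log_loss(probs_of_actual_outcome):
    """Mean negative log-probability assigned to the realised outcomes."""
    return -sum(math.log(p) for p in probs_of_actual_outcome) / len(probs_of_actual_outcome)


if __name__ == "__main__":
    toy = [[0.25, 0.25], [0.25, 0.25]]  # scores limited to 0 or 1 per side
    print(market_probs(toy))
```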

Finding value

model vs market

For each market we de-vig the bookmaker prices to fair probabilities, then compute edge = our call − fair. A positive edge means a genuine disagreement, not a sure thing. We de-vig with the power method (fit one exponent so the book's implied probabilities sum to 1) rather than splitting the margin proportionally — the proportional shortcut overstates longshots and understates favourites, the well-known favourite-longshot bias, so the power baseline is closer to realised frequencies. Edges are priced against the best available book (line-shopping is worth real EV), shown with their expected value, and sized with a capped fractional-Kelly stake — the honest signal, since raw EV flatters longshots. See them all on the edges page.
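
To make the arithmetic concrete, here is a minimal sketch of the power de-vig, the edge, the EV and a capped fractional-Kelly stake. The bisection search, the function names and the 2% cap are assumptions for illustration; the text does not state the cap or how the exponent is fitted. Odds are decimal.

```python
def devig_power(decimal_odds):
    """Fair probabilities via the power method: find one exponent k such that
    sum((1/odds) ** k) == 1. Requires every price to be above 1.0."""
    if any(o <= 1.0 for o in decimal_odds):
        raise ValueError("decimal odds must be greater than 1.0")
    implied = [1.0 / o for o in decimal_odds]
    lo, hi = 0.0, 100.0
    for _ in range(100):  # the sum falls as k rises, so bisection works
        mid = (lo + hi) / 2.0
        if sum(q ** mid for q in implied) > 1.0:
            lo = mid
        else:
            hi = mid
    k = (lo + hi) / 2.0
    fair = [q ** k for q in implied]
    total = sum(fair)
    return [p / total for p in fair]  # remove any residual rounding error


def edge(our_call, fair_prob):
    """Edge as defined in the text: our call minus the de-vigged probability."""
    return our_call - fair_prob


def expected_value(prob, best_odds):
    """Expected profit per unit staked at the best available price."""
    return prob * best_odds - 1.0


def kelly_stake(prob, odds, fraction=0.25, cap=0.02):
    """Fractional Kelly share of bankroll, floored at 0 and capped.
    The cap value is a hypothetical choice."""
    full = (prob * odds - 1.0) / (odds - 1.0)
    return max(0.0, min(cap, fraction * full))


if __name__ == "__main__":
    fair = devig_power([1.5, 4.0, 7.0])
    print([round(p, 4) for p in fair])
    print(round(edge(0.30, fair[1]), 4), round(kelly_stake(0.30, 4.2), 4))
```

Compared with dividing each implied probability by the overround, the power method takes proportionally more margin out of the longshot than out of the favourite.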

Why "our call", not the raw model. Plain Elo is poorly anchored between confederations that rarely play each other — left alone it rates Germany below Iran and Morocco second in the world. We fix that at the source: a per-confederation anchoring step fits one Elo offset per confederation to the de-vigged market (UEFA up, CONCACAF/AFC/CAF down, mean-zero overall), which roughly halves the model's disagreement with the market and — crucially — also corrects the knockout/outright simulations. On top of that, because no model can fully see squad quality, we publish a confidence-weighted market blend: where odds exist, the anchored model's 1X2 and Over/Under probabilities are pulled toward the de-vigged market, and the further they stray the less the model is trusted (a big disagreement with a sharp 29-book consensus is more often model error than value). Edges are measured from that blend, so a structural error no longer masquerades as value. Each match's Model vs market panel shows the (anchored) model, the market, and the blended call side by side; the divergence tracker ranks the residual gaps.
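
The exact blending rule is not given above, so the following is only a sketch of one way a confidence-weighted blend could behave. The weight function, `base_weight` and `scale` are all hypothetical; the only property taken from the text is that the model's weight shrinks as its disagreement with the market grows.

```python
def model_weight(gap, base_weight=0.5, scale=0.10):
    """Weight given to the model; it falls as the disagreement (gap) grows.
    Both parameters are illustrative assumptions."""
    return base_weight / (1.0 + gap / scale)


def blend_with_market(model_probs, market_probs):
    """Pull the model's probabilities toward the de-vigged market.

    One weight is used for the whole market (based on the largest
    per-outcome gap), so the blended probabilities still sum to 1.
    """
    gap = max(abs(m - k) for m, k in zip(model_probs, market_probs))
    w = model_weight(gap)
    return [w * m + (1.0 - w) * k for m, k in zip(model_probs, market_probs)]


if __name__ == "__main__":
    model = [0.60, 0.25, 0.15]   # home, draw, away from the anchored model
    market = [0.45, 0.30, 0.25]  # de-vigged market
    print([round(p, 3) for p in blend_with_market(model, market)])
```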

Grading honestly

the part that matters

Beating base rates is not the same as beating the market. Every logged bet is settled against the result and scored for closing-line value — the taken price vs the de-vigged closing line. The track record shows P/L, ROI and average CLV (with a confidence interval, because the sample is small and the CI is the honest part of the claim). Until that record exists, nothing here should be read as a proven market edge.
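
As an illustration of the scoring, CLV for one bet can be computed as the taken price multiplied by the de-vigged closing probability, minus 1. That definition is a common one and is an assumption here, as is the normal-approximation 95% interval; on a small sample a bootstrap or t-interval may be more appropriate.

```python
import math
import statistics


def clv(taken_odds, closing_fair_prob):
    """Closing-line value: positive when the taken price beats the fair close."""
    return taken_odds * closing_fair_prob - 1.0


def mean_with_ci(values, z=1.96):
    """Return (mean, lower, upper) using a normal approximation."""
    if len(values) < 2:
        raise ValueError("need at least two bets to estimate an interval")
    mean = statistics.fmean(values)
    half_width = z * statistics.stdev(values) / math.sqrt(len(values))
    return mean, mean - half_width, mean + half_width


if __name__ == "__main__":
    # Hypothetical bets: (taken decimal odds, de-vigged closing probability)
    bets = [(2.10, 0.50), (3.40, 0.28), (1.80, 0.57)]
    values = [clv(odds, prob) for odds, prob in bets]
    mean, low, high = mean_with_ci(values)
    print(f"mean CLV {mean:+.3f} (95% CI {low:+.3f} to {high:+.3f})")
```

If the interval still straddles zero, the record does not yet distinguish skill from noise, which is why the interval is reported next to the average.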

Tamper-evident predictions

SHA-256 ledger

Every prediction run writes a SHA-256 hash of the exact predictions.json bytes into an append-only, hash-chained ledger: each entry also records the hash of the entry before it, so altering one historical line breaks every line after it. You can check the current numbers yourself: re-hash the live file and it must match the latest line below (and the sidecar file, /data/predictions.hash.json). Because the chain links every set, a past prediction can't be silently rewritten without the chain failing to verify.
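
A small sketch of how such a chain can be appended to, and why editing an old entry is detectable. Apart from `prev`, the field names are assumptions, and so is the exact canonical form (`json.dumps` with `sort_keys=True` and default separators); the real ledger may serialise entries differently.

```python
import hashlib
import json

ZERO_HASH = "0" * 64  # "prev" value of the very first entry


def entry_hash(entry):
    """SHA-256 of an entry serialised as sorted-keys JSON (assumed form)."""
    canonical = json.dumps(entry, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


def append_entry(ledger, predictions_bytes, generated):
    """Append a new entry that commits to the file bytes and the previous entry."""
    entry = {
        "generated": generated,  # hypothetical field name
        "sha256": hashlib.sha256(predictions_bytes).hexdigest(),
        "prev": entry_hash(ledger[-1]) if ledger else ZERO_HASH,
    }
    ledger.append(entry)
    return entry


if __name__ == "__main__":
    ledger = []
    append_entry(ledger, b'{"picks": 1}', "run 1")
    append_entry(ledger, b'{"picks": 2}', "run 2")
    ledger[0]["sha256"] = "f" * 64  # tamper with history
    # The second entry still points at the old hash of the first one.
    print(entry_hash(ledger[0]) == ledger[1]["prev"])
```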

What this does and doesn't prove. It lets anyone confirm the published file matches its recorded hash, and that the history hasn't been edited after the fact. It is self-published, though — the ledger lives in this repo, so the strongest possible proof (that the whole chain wasn't regenerated at once) would need an external anchor — committing each hash to a public, append-only place we can't rewrite (e.g. a public git remote or a timestamping service). That external anchoring is on the roadmap; until then, treat this as a self-auditable record, not a third-party-notarised one.

Latest prediction set: 6 Jun 2026, 19:23 UTC

Predictions: 72

SHA-256 of predictions.json: 15006f00d462b1e3e057c347c31d6b8108875456689a79188946da491de8c280

How to verify it yourself

1. Download the live file: /data/predictions.json
2. Hash it. macOS/Linux: shasum -a 256 predictions.json; Windows PowerShell: Get-FileHash predictions.json -Algorithm SHA256 (PowerShell prints the hash in uppercase, so compare case-insensitively).
3. The result must equal the SHA-256 above (and the value in the sidecar, /data/predictions.hash.json).
4. To check the whole history, fetch /data/predictions_ledger.json: each entry's prev must equal the SHA-256 of the previous entry (serialised as sorted-keys JSON); the first entry's prev is all zeros.
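
The same checks can be scripted. This sketch works on data already loaded into memory (file bytes, and the ledger parsed with `json.load`). It assumes the ledger is a list ordered oldest first and that entries are hashed as `json.dumps(entry, sort_keys=True)` in UTF-8; if the site uses different separators or ordering, the hashes will not match and the helper needs adjusting.

```python
import hashlib
import json

ZERO_HASH = "0" * 64


def verify_file(file_bytes, published_hex):
    """True if the file's SHA-256 equals the published hash (case-insensitive)."""
    return hashlib.sha256(file_bytes).hexdigest() == published_hex.strip().lower()


def entry_hash(entry):
    """SHA-256 of one ledger entry as sorted-keys JSON (assumed canonical form)."""
    return hashlib.sha256(json.dumps(entry, sort_keys=True).encode("utf-8")).hexdigest()


def first_broken_link(entries):
    """Index of the first entry whose prev does not match, or None if all match."""
    expected_prev = ZERO_HASH
    for index, entry in enumerate(entries):
        if entry.get("prev") != expected_prev:
            return index
        expected_prev = entry_hash(entry)
    return None


if __name__ == "__main__":
    # Toy two-entry ledger standing in for predictions_ledger.json.
    first = {"sha256": "a" * 64, "prev": ZERO_HASH}
    second = {"sha256": "b" * 64, "prev": entry_hash(first)}
    print(first_broken_link([first, second]))
```

A result of `None` means every link matched; any other value is the position of the first entry whose `prev` no longer agrees with the entry before it.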

Generated (UTC)	Picks	SHA-256
6 Jun 2026, 19:23 UTC	72	`15006f00d462b1e3…`
6 Jun 2026, 16:38 UTC	72	`dff081324417d798…`
6 Jun 2026, 16:26 UTC	72	`f560ffa5716d82b3…`
6 Jun 2026, 16:25 UTC	72	`aece5336d07e9c3a…`
6 Jun 2026, 14:32 UTC	72	`7614132babe1e172…`
6 Jun 2026, 14:10 UTC	72	`161c2024bd987c43…`
6 Jun 2026, 13:18 UTC	72	`889ef275a23cc1bb…`
6 Jun 2026, 12:38 UTC	72	`0e5ed13624878bbb…`

Showing the 8 most recent of 12 recorded prediction sets.

Glossary

the jargon, plainly

Elo rating: A single strength number per team, updated after every match by the gap between expected and actual result. Higher = stronger.
Poisson / scoreline matrix: Goals are modelled as a Poisson process from each side's expected goals, giving a probability for every exact scoreline — which is summed into 1X2, Over/Under and BTTS.
Dixon-Coles: A correction to the independent-Poisson matrix that adjusts the 0-0, 1-0, 0-1 and 1-1 cells, which independent Poisson is known to misprice in low-scoring games (typically too few 0-0 and 1-1 draws, too many 1-0 and 0-1 results).
De-vig (fair price): Bookmaker odds include a margin (the 'vig'/overround). De-vigging normalises a market's implied probabilities back to sum to 1 so the model is compared against the market's true opinion, not its margin. We use the power method (one fitted exponent), which corrects the favourite-longshot bias better than splitting the margin proportionally.
Edge: Our published call minus the de-vigged market probability for the same outcome. A positive edge means we disagree with the market in your favour — not a guarantee of profit.
Our call (market blend): The probability we publish: the anchored Elo+Poisson model pulled toward the de-vigged market on the markets the books price (1X2, Over/Under). The pull is confidence-weighted: the more the model disagrees with the market, the less it's trusted, because the model cannot fully see squad quality and a big disagreement is usually its error, not value. Knockout rounds have no odds, so they rely on the anchored model alone.
EV (expected value): Expected return per unit staked at the best available price. Long prices inflate EV, so it's read alongside the Kelly stake, not on its own.
Kelly stake: A bankroll-aware bet size, f = (p × odds − 1) / (odds − 1) for win probability p at decimal odds. It grows with the edge and shrinks as the price lengthens. We use a fractional (¼) Kelly, capped, to stay conservative on an uncertain edge.
CLV (closing-line value): How your taken price compares to the de-vigged closing line — the sharpest the market gets. On a small sample, consistent positive CLV is the strongest evidence of real skill.
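Calibration / ECE: Whether stated probabilities match reality (of all '60%' calls, ~60% should happen). ECE (expected calibration error) is the average gap between stated probability and observed frequency across probability bins, weighted by how many calls fall in each bin; lower is better.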
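
For the calibration entry, a minimal sketch of ECE with equal-width bins. The choice of 10 bins is an assumption; the calibration report may bin differently.

```python
def expected_calibration_error(probs, outcomes, n_bins=10):
    """Weighted average of |mean stated probability - observed frequency| per bin.

    probs are stated probabilities in [0, 1]; outcomes are 1 (happened) or 0.
    """
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        index = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the top bin
        bins[index].append((p, y))
    n = len(probs)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_prob = sum(p for p, _ in members) / len(members)
        frequency = sum(y for _, y in members) / len(members)
        ece += len(members) / n * abs(mean_prob - frequency)
    return ece


if __name__ == "__main__":
    # Five '60%' calls, three of which happened: perfectly calibrated.
    print(expected_calibration_error([0.6] * 5, [1, 1, 1, 0, 0]))
```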

Honest limitations

Small sample. 104 World Cup matches can't establish a market edge on their own — the CLV record is the test, and it's only just starting.
Minor-nation uncertainty. Ratings for teams with few recent internationals rest on thin data; those predictions are less certain than the single number suggests.
Not advice. Edges are estimates and can be wrong. This is analysis for interest, not betting or financial advice.