Deadlock Methodology

How the Deadlock win-prediction models work, what they were trained on, and how accurate they actually are.

No black boxes. No "trust us". Real numbers from the current production models.

TL;DR

  • Baseline model (draft only): 56.94% accuracy on 206,993 held-out 6v6 matches it never saw in training.
  • Live model (in-match, reads game state): 70.58% holdout accuracy, climbing from ~59% in the opening minutes to over 80% by the late-mid game.
  • The Baseline model takes only draft info: the 12 hero IDs of a 6v6 lineup. It knows nothing about souls, levels, or items. The Live model adds the current match state.
  • ~7 percentage points above 50/50 on draft alone is meaningful but modest: Deadlock outcomes are dominated by post-draft play — lane execution, mechanics, fights, and macro. The draft tilts the odds; it doesn't decide the game.

Training data

  • 6v6 matches (training): 1.66M. Public Deadlock matches sourced via the community deadlock-api, used to fit the models.
  • Held-out test set: 206,993. The most recent matches, held back from training, used to compute the headline accuracy and BCE numbers below.
  • Heroes covered: 39. The full current roster. Each hero gets a learned embedding.

The split is chronological, not random: the model trains on older matches and is evaluated on the newest ones, so the reported accuracy reflects predicting matches from after the training window — the honest setting, not a leak-prone random shuffle. Models are retrained as the meta moves and hot-swapped into production from cloud storage with no redeploy.
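A chronological split of this kind can be sketched as follows. This is a minimal illustration, not the production pipeline: the `Match` record and its field names (`match_id`, `start_time`, `team0_won`) are assumptions made for the example.

```python
from typing import NamedTuple


class Match(NamedTuple):
    # Hypothetical record layout; the real pipeline's fields are not documented here.
    match_id: int
    start_time: int  # Unix seconds
    team0_won: bool


def chronological_split(matches, holdout_size):
    """Return (train, holdout), where holdout is the newest `holdout_size` matches."""
    if not 0 < holdout_size < len(matches):
        raise ValueError("holdout_size must be between 1 and len(matches) - 1")
    ordered = sorted(matches, key=lambda m: m.start_time)
    return ordered[:-holdout_size], ordered[-holdout_size:]


# Ten toy matches, one minute apart.
matches = [Match(i, 1_700_000_000 + 60 * i, i % 2 == 0) for i in range(10)]
train, holdout = chronological_split(matches, holdout_size=3)

# Every training match predates every held-out match, so nothing from the
# evaluation window can leak into training.
assert max(m.start_time for m in train) < min(m.start_time for m in holdout)
```

A random shuffle would instead mix matches from the same patch and time window into both sets, which tends to flatter the reported accuracy.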

Model architecture

A Transformer encoder over the 12 hero slots of a 6v6 lineup. Each slot is an embedding of (hero_id + team_side); the encoder output is pooled into a single sigmoid unit that gives the win probability. Architecture: 256-d embeddings, 8 attention heads, 4 layers, 1024-d MLP.
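To make the input shape concrete, here is a small sketch of the hyperparameters listed above and of how a draft might be turned into 12 (hero_id, team_side) tokens. It covers only the configuration and the input encoding, not the attention layers themselves. The names `EncoderConfig` and `encode_draft`, the 0/1 side encoding, and the hero IDs in the example are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EncoderConfig:
    # Values taken from the architecture description above.
    num_heroes: int = 39
    num_slots: int = 12
    d_model: int = 256
    num_heads: int = 8
    num_layers: int = 4
    d_mlp: int = 1024

    @property
    def head_dim(self) -> int:
        # Each attention head works on an equal slice of the embedding.
        return self.d_model // self.num_heads


def encode_draft(team0, team1, team_size=6):
    """Turn two 6-hero lineups into 12 (hero_id, team_side) tokens."""
    if len(team0) != team_size or len(team1) != team_size:
        raise ValueError(f"each team needs exactly {team_size} heroes")
    # Side 0 / side 1 is an assumed encoding for the two teams.
    return [(hero, 0) for hero in team0] + [(hero, 1) for hero in team1]


config = EncoderConfig()
# Illustrative hero IDs only; real Deadlock hero IDs differ.
tokens = encode_draft([1, 2, 3, 4, 5, 6], [7, 8, 9, 10, 11, 12])
print(config.head_dim, len(tokens))  # 32 12
```

With 256-d embeddings and 8 heads, each head attends over 32 dimensions, and the encoder sees a fixed-length sequence of 12 tokens per match.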

  • Baseline (draft): Hero + team embedding only. Predicts the outcome from the 6v6 draft before the game starts. Powers the draft predictor, counters, and synergy pages.
  • Live (in-match): Fuses the draft with live match state, so its prediction sharpens as the game develops. This is where the real accuracy lives.

Both models are exported to ONNX and run on CPU. New checkpoints are published to cloud storage; the live site polls for them and hot-swaps them in, so hero data and predictions update without a redeploy.
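The poll-and-swap pattern can be sketched roughly as below. This is an assumption-laden illustration, not the site's actual code: `HotSwapModel`, `fetch_latest_version`, and `load_model` are hypothetical names, and the stub loader stands in for downloading a checkpoint and building an inference session.

```python
import threading


class HotSwapModel:
    """Holds the active model and replaces it when a newer version appears."""

    def __init__(self, fetch_latest_version, load_model):
        self._fetch = fetch_latest_version  # hypothetical: returns a version string
        self._load = load_model             # hypothetical: version -> callable model
        self._lock = threading.Lock()
        self._version = None
        self._model = None

    def poll_once(self):
        """Check for a new version; return True if a swap happened."""
        latest = self._fetch()
        if latest == self._version:
            return False
        new_model = self._load(latest)  # load outside the lock so requests are not blocked
        with self._lock:
            self._model, self._version = new_model, latest
        return True

    def predict(self, features):
        with self._lock:
            model = self._model  # grab a reference, then release the lock
        if model is None:
            raise RuntimeError("no model loaded yet")
        return model(features)


# Stub usage: the "model" just reports which version produced the answer.
holder = HotSwapModel(lambda: "v1", lambda version: (lambda features: version))
holder.poll_once()
print(holder.predict([1, 2, 3]))  # v1
```

In a real service `poll_once` would run on a timer in a background thread; in-flight requests keep using the old model reference until they finish.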

Current production metrics

Model     Accuracy  BCE (lower = better)  Holdout size  Input
Baseline  56.94%    0.678                 206,993       Draft only (12 heroes)
Live      70.58%    0.545                 206,993       Draft + match state

How to read these: a classifier that always guesses 50/50 gets 50.00% accuracy and BCE 0.693. The Baseline sits ~7 pp above chance on the draft alone; the Live model reaches 70.6% by also reading the game in progress. Both are measured on the most-recent held-out matches, never seen during training.
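The 0.693 baseline is just ln 2, the binary cross-entropy of predicting 0.5 for every match. A minimal sketch of both metrics, using made-up labels and probabilities rather than real model output:

```python
import math


def binary_cross_entropy(probs, labels, eps=1e-12):
    """Mean negative log-likelihood of the true outcomes under the predictions."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1 - eps)  # clamp to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(labels)


def accuracy(probs, labels, threshold=0.5):
    """Fraction of matches where the favored side (prob >= threshold) won."""
    hits = sum((p >= threshold) == bool(y) for p, y in zip(probs, labels))
    return hits / len(labels)


labels = [1, 0, 1, 1, 0, 0]              # toy outcomes, 1 = team 0 won
coin_flip = [0.5] * len(labels)          # always predict 50/50
sharper = [0.7, 0.3, 0.6, 0.8, 0.4, 0.2] # toy predictions leaning the right way

print(round(binary_cross_entropy(coin_flip, labels), 3))  # 0.693
print(binary_cross_entropy(sharper, labels) < binary_cross_entropy(coin_flip, labels))  # True
```

BCE rewards calibrated confidence as well as picking the right side, which is why it is reported next to accuracy: two models with the same accuracy can differ in BCE if one is over- or under-confident.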

Live accuracy over the course of a match

The Live model's accuracy rises as a match develops and more state is known — measured on the held-out set, bucketed by game time:

  • Opening minutes: 58.9%
  • Early-mid game: 69.1%
  • Mid game: 77.4%
  • Late-mid game: 83.7%

Note the honest tail: very long games (the rare matches that run deep into the late game) regress back toward ~70%, because games that stay close that long genuinely are close. We don't hide that.
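Bucketed accuracy of this kind can be computed as sketched below. The bucket boundaries in minutes are hypothetical (the exact cut-offs are not stated here), and the sample rows are toy data.

```python
from bisect import bisect_right
from collections import defaultdict

# Hypothetical bucket boundaries in minutes; the real cut-offs may differ.
BUCKET_EDGES_MIN = [10, 20, 30, 40]
BUCKET_NAMES = ["opening minutes", "early-mid game", "mid game",
                "late-mid game", "very long"]


def accuracy_by_bucket(samples):
    """samples: iterable of (game_minute, predicted_prob_team0, team0_won)."""
    hits = defaultdict(int)
    counts = defaultdict(int)
    for minute, prob, team0_won in samples:
        # bisect_right maps a minute to the index of the bucket it falls in.
        name = BUCKET_NAMES[bisect_right(BUCKET_EDGES_MIN, minute)]
        counts[name] += 1
        hits[name] += int((prob >= 0.5) == bool(team0_won))
    return {name: hits[name] / counts[name]
            for name in BUCKET_NAMES if counts[name]}


toy_samples = [
    (3, 0.55, True), (4, 0.60, False),   # opening: one hit, one miss
    (15, 0.70, True),                    # early-mid: hit
    (25, 0.20, False),                   # mid: hit
    (35, 0.90, True),                    # late-mid: hit
    (50, 0.60, False),                   # very long: miss
]
by_bucket = accuracy_by_bucket(toy_samples)
print(by_bucket["opening minutes"])  # 0.5
```

Reporting the tail bucket separately is what exposes the regression on very long games instead of averaging it away.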

How the counters, synergy & tier stats are computed

The counters, synergy, and tier list pages are direct aggregates over recent public matches, not model output:

  • Counters = a hero's win rate against each opponent on the enemy team (head-to-head and complementary: if A beats B 56% of the time, B beats A 44%).
  • Synergy = the win rate of two heroes on the same team (symmetric: the duo's joint win rate).
  • Every pairing shows its sample size. Pairings with fewer than 100 matches are dropped as too noisy, so the numbers you see are backed by hundreds to tens of thousands of games.
  • Snapshots carry a generated_at timestamp and refresh as new matches land.
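The aggregation described in the list above can be sketched like this. It is an illustration under assumptions: the match tuple layout and the function name `pair_stats` are hypothetical, and the toy data uses two heroes per team to keep it short.

```python
from collections import defaultdict
from itertools import combinations

MIN_MATCHES = 100  # pairings with fewer matches are dropped as too noisy


def pair_stats(matches, min_matches=MIN_MATCHES):
    """matches: iterable of (team0_heroes, team1_heroes, team0_won).

    Returns (counters, synergy), each mapping a hero pair to (win_rate, sample_size).
    """
    counter = defaultdict(lambda: [0, 0])  # (hero, opponent) -> [wins, games]
    synergy = defaultdict(lambda: [0, 0])  # sorted (hero_a, hero_b) -> [wins, games]
    for team0, team1, team0_won in matches:
        for won, own, enemy in ((team0_won, team0, team1),
                                (not team0_won, team1, team0)):
            for hero in own:
                for opponent in enemy:
                    record = counter[(hero, opponent)]
                    record[0] += int(won)
                    record[1] += 1
            for pair in combinations(sorted(own), 2):
                record = synergy[pair]
                record[0] += int(won)
                record[1] += 1

    def finalize(table):
        return {key: (wins / games, games)
                for key, (wins, games) in table.items() if games >= min_matches}

    return finalize(counter), finalize(synergy)


# Toy data: heroes 1 and 2 beat heroes 3 and 4 in 60 of 100 matches.
toy = [((1, 2), (3, 4), True)] * 60 + [((1, 2), (3, 4), False)] * 40
counters, synergy = pair_stats(toy)
print(counters[(1, 3)], counters[(3, 1)])  # (0.6, 100) (0.4, 100)
```

Because each match is counted once from each side, the two directions of a head-to-head always sum to 100%, while a duo's synergy entry is stored once under its sorted pair.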

Known limitations

  • Draft-only blindness. The Baseline model knows nothing that happens after the draft: soul leads, lane outcomes, fights, builds, mechanics. A "57% favored" lineup can still lose if it gets outplayed.
  • Young game, shifting meta. Deadlock is new and patches are frequent. Win rates and the model can lag a big balance change until enough post-patch matches accumulate and we retrain.
  • New heroes have less signal. A freshly added hero has few matches; its predictions sit near 50% until data builds up.
  • Long games are genuinely uncertain. As shown above, the Live model is less accurate on matches that run very long. That is expected, because those games really are close.
  • Not financial advice. This is an analytical / educational tool. Win rates here are not betting recommendations, and we do not endorse or facilitate sports betting.

How is this different from a stats tracker?

  • Most Deadlock tools are stat trackers: they show you what already happened in historical aggregates.
  • Batru is a trained model: give it a 6v6 draft and it predicts the outcome, with a probability you can read against the held-out accuracy above — plus a Live model that updates mid-match.
  • The head-to-head and duo pages are built from the same match data pipeline (as direct aggregates, not model output), with every pairing backed by its real sample size and refreshed as the meta moves.