Methodology - How Batru works


TL;DR

Standard model — 55.19% accuracy on 561,855 held-out ranked matches. Calibration is excellent (ECE = 0.0037).
Plus model (role-aware, knows position 1-5) — 55.28% accuracy.
Pro model (player-aware) — 54.52% accuracy on professional matches.
The model takes only draft info: 10 hero IDs (+ roles or player IDs for the advanced models). It knows nothing about in-game gold, levels, or items.
~5 percentage points above 50/50 may sound small, but Dota outcomes are dominated by post-draft variance: individual mechanics, lane execution, fights, macro decisions. The draft typically accounts for ~7-12 pp of expected win rate.

Training data

Public ranked matches

~19M

Steam Web API + OpenDota /publicMatches (rank-filtered). Used to train the Standard and Plus models.

Held-out test set

561,855

Most-recent matches held back from training, used to compute the headline accuracy / BCE / calibration numbers below.

Pro matches

~18,400

OpenDota proMatches + Stratz, used to train the Pro model with player identity awareness.

Data refreshes daily from Steam Web API, OpenDota, and Stratz. Each new model checkpoint is evaluated against the held-out set and is promoted to production only if it beats the current champion on BCE without regressing on accuracy. The promotion log is public as part of our eval history.
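The promotion rule above can be sketched as a simple gate. This is a minimal illustration, not the production code: the `EvalResult` fields, the `should_promote` name, and the `accuracy_tolerance` parameter are hypothetical, and the two candidate checkpoints are made-up numbers chosen only to exercise both branches.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalResult:
    """Held-out metrics for one checkpoint (hypothetical field names)."""
    bce: float       # binary cross-entropy, lower is better
    accuracy: float  # fraction of matches predicted correctly, higher is better


def should_promote(candidate: EvalResult, champion: EvalResult,
                   accuracy_tolerance: float = 0.0) -> bool:
    """Promote only if BCE strictly improves and accuracy does not regress."""
    beats_bce = candidate.bce < champion.bce
    keeps_accuracy = candidate.accuracy >= champion.accuracy - accuracy_tolerance
    return beats_bce and keeps_accuracy


# Champion uses the published Standard metrics; the candidates are invented.
champion = EvalResult(bce=0.6850, accuracy=0.5519)
better = EvalResult(bce=0.6845, accuracy=0.5521)
regressed = EvalResult(bce=0.6845, accuracy=0.5490)

print(should_promote(better, champion))     # True
print(should_promote(regressed, champion))  # False
```

Gating on BCE first rewards better-calibrated probabilities, while the accuracy check guards against a checkpoint that improves BCE by hedging toward 50% on matches it previously called correctly.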

Model architecture

A small Transformer encoder over the 10 hero slots. Each slot is an embedding of (hero_id + team_side); the encoder produces a CLS token that goes through an MLP into a single sigmoid for Radiant win probability.
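A rough PyTorch sketch of this layout is shown below. It assumes that "hero_id + team_side" means the sum of two learned embeddings and that no positional encoding is used, so slots within a team are order-invariant. The class name `DraftEncoder` and every size (`num_heroes`, `d_model`, `nhead`, `num_layers`) are placeholders, not the production hyperparameters.

```python
import torch
import torch.nn as nn


class DraftEncoder(nn.Module):
    """Sketch of a Standard-style model: 10 hero slots -> P(Radiant wins)."""

    def __init__(self, num_heroes: int = 256, d_model: int = 64,
                 nhead: int = 4, num_layers: int = 2):
        super().__init__()
        self.hero_emb = nn.Embedding(num_heroes, d_model)
        self.side_emb = nn.Embedding(2, d_model)  # 0 = Radiant, 1 = Dire
        self.cls = nn.Parameter(torch.zeros(1, 1, d_model))  # learned CLS token
        layer = nn.TransformerEncoderLayer(
            d_model, nhead, dim_feedforward=4 * d_model, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
        self.head = nn.Sequential(
            nn.Linear(d_model, d_model), nn.ReLU(), nn.Linear(d_model, 1))

    def forward(self, hero_ids: torch.Tensor, sides: torch.Tensor) -> torch.Tensor:
        # hero_ids and sides are integer tensors of shape (batch, 10).
        slots = self.hero_emb(hero_ids) + self.side_emb(sides)
        cls = self.cls.expand(hero_ids.size(0), -1, -1)
        encoded = self.encoder(torch.cat([cls, slots], dim=1))  # (batch, 11, d_model)
        logit = self.head(encoded[:, 0])  # read out the CLS position
        return torch.sigmoid(logit).squeeze(-1)  # (batch,) Radiant win probability


model = DraftEncoder().eval()
hero_ids = torch.randint(0, 256, (3, 10))               # random placeholder drafts
sides = torch.tensor([[0] * 5 + [1] * 5]).repeat(3, 1)  # 5 Radiant slots, 5 Dire slots
with torch.no_grad():
    probs = model(hero_ids, sides)
print(probs.shape)
```

The untrained model here outputs values near 0.5; the point is only the data flow from slot embeddings through the CLS token to a single probability.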

Standard

Hero + team embedding only. Trained on ranked. Best general-purpose model.

Plus (role-aware)

Adds position 1-5 to each slot as a residual feature. Initialized from the Standard model's weights; learns the role delta on top.

Pro (player-aware)

Adds player ID per slot. Pre-trained on public ranked, fine-tuned on pro matches with player identity.

All three models are exported to ONNX and run on CPU. Inference for a full BP lookahead tree (~100 candidate evaluations) takes under 200 ms.
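To give a sense of where ~100 evaluations come from, here is a minimal two-ply lookahead sketch: for each candidate pick, assume the opponent answers with its best reply, and keep the pick with the best worst case. The actual tree depth, ban handling, and how partial drafts are scored are not described here, so `two_ply_pick` and the `toy_predict` stand-in (which treats a hero ID as its strength) are illustrative assumptions only; a real setup would call the exported model instead.

```python
import math
from typing import Callable, List, Sequence, Tuple

# A predictor maps (radiant_heroes, dire_heroes) to P(Radiant wins).
Predictor = Callable[[Sequence[int], Sequence[int]], float]


def toy_predict(radiant: Sequence[int], dire: Sequence[int]) -> float:
    """Toy stand-in for the real model: higher hero IDs count as stronger."""
    diff = sum(radiant) - sum(dire)
    return 1.0 / (1.0 + math.exp(-diff / 50.0))


def two_ply_pick(radiant: Sequence[int], dire: Sequence[int],
                 pool: Sequence[int], predict: Predictor) -> Tuple[int, float, int]:
    """Choose Radiant's pick that maximizes win probability after Dire's best reply."""
    best_hero, best_value, evaluations = -1, -1.0, 0
    for hero in pool:
        worst_case = 1.0
        for reply in pool:
            if reply == hero:
                continue  # a hero cannot be picked twice
            p = predict(list(radiant) + [hero], list(dire) + [reply])
            evaluations += 1
            worst_case = min(worst_case, p)  # Dire minimizes Radiant's chances
        if worst_case > best_value:
            best_hero, best_value = hero, worst_case
    return best_hero, best_value, evaluations


pool: List[int] = list(range(1, 11))  # 10 hypothetical candidate heroes
hero, value, evaluations = two_ply_pick([], [], pool, toy_predict)
print(hero, evaluations)  # 10 90
```

With 10 candidates and 9 possible replies each, the tree costs 90 model calls, which is the same order of magnitude as the figure quoted above.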

Latest production metrics

Model	Accuracy	BCE (lower = better)	Brier	ECE (calibration)	Holdout size
Standard	55.19%	0.685	0.246	0.0037	561,855
Plus	55.28%	0.685	0.246	0.0134	3,345 pro
Pro	54.52%	0.685	0.246	0.0255	3,345 pro

How to read these: A classifier that always predicts 50/50 scores 50.00% accuracy, BCE 0.693 (ln 2), and Brier 0.250. Our models sit ~5 pp above chance on accuracy and below both naive baselines: modest but meaningful. The standout number is ECE 0.0037 on the Standard model: across all the predictions where the model says roughly 60%, the favored side wins roughly 60% of the time, and the average gap between predicted and observed win rate is under half a percentage point. That's the property serious users care about.
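For readers who want to reproduce these metrics on their own predictions, the following sketch computes all four from a list of predicted Radiant win probabilities and 0/1 outcomes. The ECE binning scheme is an assumption (10 equal-width bins over the predicted probability); the binning used for the published numbers is not specified here, and the `evaluate` function name is hypothetical.

```python
import math
from typing import Dict, List, Sequence


def evaluate(probs: Sequence[float], outcomes: Sequence[int],
             n_bins: int = 10) -> Dict[str, float]:
    """Accuracy, BCE, Brier score, and ECE for binary win predictions."""
    n = len(probs)
    eps = 1e-12  # keeps log() finite for predictions of exactly 0 or 1
    pairs = list(zip(probs, outcomes))

    accuracy = sum((p >= 0.5) == bool(y) for p, y in pairs) / n
    bce = -sum(y * math.log(max(p, eps)) + (1 - y) * math.log(max(1 - p, eps))
               for p, y in pairs) / n
    brier = sum((p - y) ** 2 for p, y in pairs) / n

    # ECE: bin by predicted probability, then take the size-weighted average
    # of |mean prediction - observed win rate| over the non-empty bins.
    bins: List[List[tuple]] = [[] for _ in range(n_bins)]
    for p, y in pairs:
        bins[min(int(p * n_bins), n_bins - 1)].append((p, y))
    ece = 0.0
    for b in bins:
        if b:
            mean_pred = sum(p for p, _ in b) / len(b)
            win_rate = sum(y for _, y in b) / len(b)
            ece += len(b) / n * abs(mean_pred - win_rate)

    return {"accuracy": accuracy, "bce": bce, "brier": brier, "ece": ece}


# The always-50/50 baseline on a balanced toy sample.
baseline = evaluate([0.5, 0.5, 0.5, 0.5], [1, 0, 1, 0])
print(round(baseline["bce"], 3), baseline["brier"])  # 0.693 0.25
```

Running it on the constant 50/50 predictor recovers the naive baselines quoted above, which is a quick sanity check before scoring real predictions.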

Live accuracy proof

Want to see the model checked against real matches you can verify yourself? The /live page shows the model's predictions for the most recent finished pro matches, with the actual result alongside. Predictions are computed pre-game from the draft only; the model has not seen the result.

No cherry-picking. The list is the raw OpenDota proMatches feed in reverse-chronological order. If we're wrong, you'll see it.

Known limitations

Draft-only signal. The model knows nothing about what happens after the draft: gold leads, lane outcomes, fight execution, item choices, individual mechanics. A "55% favored" team can absolutely lose if it gets outplayed in lane or throws a fight.
New heroes have less signal. Heroes added in the current patch have fewer matches in training; their predictions tend toward 50% until enough data accumulates.
Meta shifts hurt calibration. A patch that flips an S-tier hero to F-tier can degrade accuracy for a few weeks until retraining catches up. We retrain on each major patch.
Pro player IDs are sparse. The Pro model needs games-with-this-player data to give a meaningful prediction; brand-new pros or stand-ins fall back to "Unknown" embeddings.
Not financial advice. This is an analytical / educational tool. Win rates published here are not recommendations to bet, and we do not endorse or facilitate sports betting.

How is this different from Dotabuff / Stratz / D2PT?

Dotabuff / Stratz / D2PT are excellent stats sites: they show you what happened. Their hero/winrate tables are historical aggregates.
This site is a model: it predicts the outcome of a draft you give it, with calibrated probabilities and a Lookahead Decision Tree on the BP simulator that no other public Dota tool offers.
Three trained-from-scratch models (Standard / Plus / Pro), each retrained per patch, with the eval history and methodology fully public.