What the model predicts
For every starting pitcher on a slate, the model outputs a single number: the expected number of strikeouts they'll record in their start. From that one number we derive everything else — the probability the line hits over or under, the edge versus what the book is offering, and whether to bet.
The model is a gradient-boosted tree (XGBoost regressor) trained on 15,486 pitcher-game samples spanning 2023–2026 Statcast data. Cross-validated MAE is 1.82 K (i.e. on average the prediction is within ~2 strikeouts of the actual K count). RMSE is 2.28.
The 56 features it looks at
Recency-weighted stats are the heart of the model — last-3, last-5, and last-10 starts plus full-season aggregates — with opponent and game-context features layered on top.
That works out to 26 distinct concepts, most computed at multiple recency windows, for 56 input variables.
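The fan-out from concepts to windows is the mechanical part. A minimal sketch of the pattern — the window labels match the windows described above, but the function and feature names are illustrative, not the pipeline's actual feature builder:

```python
import statistics

# Recency windows: last 3 / 5 / 10 starts, plus the full season (None = all).
WINDOWS = {"l3": 3, "l5": 5, "l10": 10, "season": None}

def windowed_features(name: str, per_start_values: list[float]) -> dict[str, float]:
    """Fan one base stat out across recency windows.

    per_start_values is most-recent-first, one value per start
    (e.g. strikeouts recorded in each start)."""
    feats = {}
    for label, n in WINDOWS.items():
        window = per_start_values if n is None else per_start_values[:n]
        feats[f"{name}_{label}"] = statistics.mean(window)
    return feats
```

One concept (say, K per start) becomes four inputs this way; roughly 26 concepts expanded like this, some at fewer windows, yields the 56-variable input vector.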
From a K prediction to a bet probability
A predicted strikeout count alone isn't enough — we need to know P(K > line). Strikeouts per game don't follow a normal bell curve; they're over-dispersed count data, so the right distribution is a Negative Binomial.
For each pitcher we treat the prediction as the mean of a NegBinom with overdispersion ratio 1.4 (variance / mean = 1.4, fit on the 15k training games). For a line like 5.5 K, we integrate the tail to get the over probability; one minus that is the under probability.
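The mean-plus-dispersion parameterization maps onto the textbook Negative Binomial like this — a stdlib-only sketch of the tail integration (the production code path may differ, e.g. it could use `scipy.stats.nbinom` directly):

```python
import math

def over_probability(pred_mean: float, line: float, dispersion: float = 1.4) -> float:
    """P(K > line) for a Negative Binomial with mean = pred_mean and
    variance = dispersion * pred_mean.

    Standard (n, p) parameterization: mean = n(1-p)/p, var = n(1-p)/p^2,
    so p = mean / variance = 1 / dispersion and n = mean * p / (1 - p)."""
    p = 1.0 / dispersion
    n = pred_mean * p / (1.0 - p)
    # For a half-point line like 5.5, "over" means K >= 6, i.e. K > floor(line),
    # so sum the pmf for k = 0..floor(line) and take the complement.
    cdf = 0.0
    for k in range(int(line) + 1):
        # log pmf(k) = log C(k + n - 1, k) + n*log(p) + k*log(1 - p)
        log_pmf = (math.lgamma(k + n) - math.lgamma(n) - math.lgamma(k + 1)
                   + n * math.log(p) + k * math.log(1.0 - p))
        cdf += math.exp(log_pmf)
    return 1.0 - cdf

p_over = over_probability(6.2, 5.5)   # pitcher projected for 6.2 K vs a 5.5 line
p_under = 1.0 - p_over
```

With dispersion fixed at 1.4, `p = 1/1.4 ≈ 0.714` for every pitcher; only `n` moves with the predicted mean.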
The same math runs in your dashboard's prediction table, the bet generator, and the grading logic — single source of truth.
Bias correction (calibration)
Raw XGBoost predictions are biased systematically by predicted-K bucket — the model overshoots on high-K starters more than on low-K ones. Rather than apply a flat shift, we use a piecewise per-bucket correction measured on graded predictions. Latest refit 2026-05-04 on 378 graded games:
| Predicted K bucket | Correction |
| --- | --- |
| pred < 4 K | subtract 0.16 |
| 4 ≤ pred < 5 | add 0.33 (under-prediction) |
| 5 ≤ pred < 6 | subtract 0.54 |
| 6 ≤ pred < 7 | subtract 0.01 (basically already calibrated) |
| pred ≥ 7 K | subtract 1.62 (still the largest overshoot) |
The refit drove overall residual bias from −0.17 K to +0.01 K. Realized post-4/19 MAE is 1.74 K vs the 1.82 training-time CV baseline.
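In code the correction is just a bucket lookup — a sketch using the values from the table above (the structure is illustrative; the production constants reportedly live in `config.py`):

```python
# Per-bucket bias corrections from the 2026-05-04 refit:
# (bucket lower bound, bucket upper bound, correction to add).
BIAS_BUCKETS = [
    (0.0, 4.0, -0.16),
    (4.0, 5.0, +0.33),
    (5.0, 6.0, -0.54),
    (6.0, 7.0, -0.01),
    (7.0, float("inf"), -1.62),
]

def correct_prediction(raw_pred: float) -> float:
    """Apply the piecewise bias correction to a raw XGBoost K prediction."""
    for lo, hi, delta in BIAS_BUCKETS:
        if lo <= raw_pred < hi:
            return raw_pred + delta
    return raw_pred
```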
How a bet gets selected
For every (pitcher × line × side × book) combination the model checks seven gates in order. The bet only enters the candidate pool if all seven pass:
- Direction — prediction must point the right way (pred > line for OVER, pred < line for UNDER).
- Margin — prediction must be at least 0.5 K off the line.
- Low-K UNDER ban — no UNDER if predicted < 4.0 K (model unreliable).
- High-K OVER ban — no OVER if predicted > 9.0 K (model overshoots).
- Calibration gate — bias-corrected model probability must clear `0.65` for UNDER, `0.78` for OVER. Asymmetric because OVER is the leakier side.
- OVER cushion — for OVER bets, the bias-corrected prediction must exceed the line by at least 1.5 K.
- Suspect-edge cap — reject any candidate with edge ≥ 20%. Fat edges almost always mean the line moved on news the model doesn't see (injury, scratch, weather).
Candidates that survive are ranked by edge. The top 4 are taken (cap on main-line bets per day).
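A condensed sketch of the seven gates plus the ranking step, using the thresholds stated above (function and field names are illustrative, not the pipeline's actual identifiers):

```python
def passes_gates(pred: float, prob: float, edge: float, line: float, side: str) -> bool:
    """pred is the bias-corrected K prediction; prob the calibrated probability
    of this side hitting; edge the model-vs-book edge as a fraction."""
    if side == "OVER":
        if pred <= line:
            return False            # 1. direction
        if pred - line < 0.5:
            return False            # 2. margin
        if pred > 9.0:
            return False            # 4. high-K OVER ban
        if prob < 0.78:
            return False            # 5. calibration gate (OVER side)
        if pred - line < 1.5:
            return False            # 6. OVER cushion
    else:  # UNDER
        if pred >= line:
            return False            # 1. direction
        if line - pred < 0.5:
            return False            # 2. margin
        if pred < 4.0:
            return False            # 3. low-K UNDER ban
        if prob < 0.65:
            return False            # 5. calibration gate (UNDER side)
    if edge >= 0.20:
        return False                # 7. suspect-edge cap
    return True

def select_bets(candidates: list[dict], max_bets: int = 4) -> list[dict]:
    """Keep survivors, rank by edge, take the daily top 4."""
    pool = [c for c in candidates if passes_gates(**c)]
    return sorted(pool, key=lambda c: c["edge"], reverse=True)[:max_bets]
```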
Bet sizing & exposure
Sizing is intentionally conservative. Each bet is sized at $8 base unit (or 5% of bankroll, whichever is smaller). Total daily exposure is capped at 20% of bankroll — even a five-bet day can't put more than 1/5th of the roll at risk. No martingale, no scaling-up after losses.
The 20% cap is enforced after bet selection; if total wager would exceed it, lower-edge bets get dropped first.
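Sizing and the cap together are a few lines — a sketch under the assumption that the candidate list arrives already sorted by edge (names are illustrative):

```python
def size_bets(ranked: list[dict], bankroll: float,
              base_unit: float = 8.0, unit_frac: float = 0.05,
              daily_cap: float = 0.20) -> list[dict]:
    """Flat-stake sizing with a daily exposure cap.

    `ranked` is the candidate list sorted by edge, best first, so when
    the 20% cap binds, the lowest-edge bets fall out first."""
    stake = min(base_unit, unit_frac * bankroll)   # $8 or 5% of roll
    budget = daily_cap * bankroll                  # 20% daily exposure cap
    sized = []
    for bet in ranked:
        if (len(sized) + 1) * stake > budget:
            break  # the next bet would breach the daily cap
        sized.append({**bet, "stake": stake})
    return sized
```

On a $100 bankroll this gives a $5 stake and a $20 budget — exactly four bets before the cap binds, which lines up with the daily top-4 cap on main-line bets.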
Alt-line preference for UNDERs
Post-reset data showed a strong pattern: high-line UNDERs (6.5+ K) outperform low-line UNDERs by a wide margin (UNDER 6.5 = +47% ROI vs UNDER 4.5 = +6%). Books overprice high-line UNDERs more reliably. The pipeline now scans the FanDuel/DraftKings alt-line snapshot for UNDERs at 5.5+ K and prefers them when available.
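The line-selection step can be sketched as a small helper. This assumes the pipeline simply takes the highest qualifying alt line when one improves on the main line — the actual selection rule (and any odds filtering) may be more involved:

```python
def pick_under_line(main_line: float, alt_lines: list[float],
                    min_alt: float = 5.5) -> float:
    """Prefer the highest available alt-line UNDER at 5.5+ K; fall back to
    the main line when no qualifying alt exists. Edge and probability gates
    still run downstream — this only chooses which line to evaluate."""
    eligible = [line for line in alt_lines if line >= min_alt]
    if eligible and max(eligible) > main_line:
        return max(eligible)
    return main_line
```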
What's NOT being bet right now
- Alt-overs — the 5.5 K alt-over feature was −25% ROI on 10 bets and is currently disabled (`ALT_OVER_MAX_BETS=0`). Re-enable after the OVER calibration is audited.
- Plus-odds OVERs — currently allowed but flagged in the OVER audit (−$24 net since reset). Tightening planned.
- Parlays — generated and tracked separately, but the live model is over-confident on combinations (45% hit rate vs 58% predicted). Hit-rate panel only; no dollar P&L feeds into the headline ROI.
Performance to date (since 2026-04-02 reset)
All numbers are straight bets only — the same set tracked on the Dashboard tab. Parlay and flex-play P&L is excluded from the headline. Snapshot updated when the page renders; reflects the live Supabase state.
| Segment | Record · P&L · ROI |
| --- | --- |
| Lifetime | 65–45 · +$84.89 · +12.3% ROI |
| UNDER bets · the bread & butter | 50–27 (n=77) · +$101.22 · +20.2% ROI |
| OVER bets · in calibration audit | 15–18 (n=33) · −$16.33 · −8.7% ROI |
| UNDER 6.5+ K · high-line bucket | 25–8 (n=33) · +$85.02 · +40.1% ROI |
| UNDER 5.5–6 · the main line | 19–15 (n=34) · +$12.69 · +5.6% ROI |
| UNDER < 5.5 · low-line | 6–4 (n=10) · +$3.51 · +5.6% ROI |
| Edge ≥ 15% | 41–30 (n=71) · +15.6% ROI |
| Edge 10–15% | 16–10 (n=26) · +9.6% ROI |
| Edge 7–10% · the danger zone | 5–4 (n=9) · −14.7% ROI |
| Edge ≥ 20% | excluded · suspect-edge cap kicks in |
The pattern: high-line UNDERs are where almost all of the model's edge lives. Low-edge bets (7–10%) are net losers. The OVER side is in audit — it's bleeding ~$16 since reset and won't be re-enabled until cushion + plus-odds rules tighten.
Project timeline & calibration history
- 2026-03-14 — first predictions written (spring-training mode, manual line entry).
- 2026-03-27 — Next.js dashboard scaffolded; first live bets attempted (3 OVERs, all losses — first sign that OVER side was leaky).
- 2026-04-02 — bankroll reset for regular season; all subsequent ROI is measured from this date.
- 2026-04-05 — first calibration work: `tune_overdispersion.py` fit the NegBinom dispersion ratio, `analyze_predicted_k_bias.py` measured per-bucket residual.
- 2026-04-10 — XGBoost retrained on full 2023–2026 Statcast (15,486 pitcher-games). Current production model. CV MAE 1.82.
- 2026-04-19 — replaced flat 0.35 K bias correction with piecewise per-bucket version. Strategy Diagnostic doc.
- 2026-04-20 — added asymmetric calibration gate (0.72 OVER vs 0.70 UNDER) and 1.5-K OVER cushion rule. Bet Review doc.
- 2026-04-25 — System Audit consolidated overlapping pipelines; `config.py` became the single source of truth for calibration constants (previously scattered across 8 files).
- 2026-04-29 — UNDER gate lowered to 0.65 (was 0.70), OVER gate raised to 0.78, suspect-edge cap added (≥ 20% rejected), alt-overs disabled, high-line alt-UNDER preference added. OVER Audit doc.
- 2026-05-02 — pipeline migrated to Homebrew Python 3.12 after numpy 2.x compatibility broke 3.9-only install.
- 2026-05-04 — bias buckets refit on 378 graded games (vs 221 in the 4/19 fit). Drove residual bias from −0.17 K to +0.01 K. Multi-run schedule deployed (10 AM, 2 PM, 5 PM local).
Roughly 7 weeks of iteration since spring-training pilot, 5 weeks since the regular-season reset.
How the model stays current during the day
The pipeline runs three times daily on Jake's Mac via the LaunchAgent — at 10 AM, 2 PM, and 5 PM local. Each run pulls the latest probable pitchers from MLB, refreshes odds from FanDuel/DraftKings/etc., re-grades any games that have finished, and adds new bets if any new edges cleared the gates. So a bullpen-day or opener announcement that lands at 11 AM gets picked up at the 2 PM run; a 4 PM scratch is caught at 5 PM.
Inside each run, the model already sees announced openers because their `avg_innings_pitched` feature is low — the prediction reflects the short outing automatically. The multi-run schedule just makes sure we re-pull when MLB updates the listed pitcher.
The narrow remaining blind spots
Things the model still can't react to in real time: in-game pitcher hot streaks (pulled early or late), weather changes between the 5 PM run and first pitch, and any line move that happens on news the books have but we don't. The suspect-edge cap (≥ 20%) exists for that last case — when a line moves much further than expected, it's almost always news the model is blind to, so we drop the bet rather than chase it.