What the model predicts
For every starting pitcher on a slate, the model outputs a single number: the expected number of strikeouts they'll record in their start. Everything else is derived from that one number — the probability of the over and the under for a given line, the edge versus the book's price, and whether to bet.
The model is a gradient-boosted tree (an XGBoost regressor) trained on 15,802 pitcher-game samples spanning 2023–2026 Statcast, refreshed nightly. Cross-validated MAE is 1.85 K. More importantly, holdout MAE on the most recent graded games is 0.56 K — the new model (deployed 2026-05-10) predicts the May 2026 K environment with far lower error than its predecessor, which was trained before the May regime shift.
The CV number measures generalization across the full multi-year training set. The holdout number measures predictive accuracy on the actual production outcomes you're betting on. The latter is what determines your ROI.
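The edge calculation itself isn't spelled out on this page; one common definition is the model's probability minus the book's vig-inclusive implied probability, sketched below (the −335 odds and the 82% model probability are illustrative inputs, and function names are not the pipeline's):

```python
def implied_prob(american_odds: int) -> float:
    """Vig-inclusive implied probability of one side from American odds."""
    if american_odds < 0:
        return -american_odds / (-american_odds + 100)
    return 100 / (american_odds + 100)

def edge(model_prob: float, american_odds: int) -> float:
    """Model probability minus the book's implied probability."""
    return model_prob - implied_prob(american_odds)

# A -335 favorite implies about 77.0%; a model probability of 82%
# would therefore be roughly a 5% edge.
book_implied = implied_prob(-335)
bet_edge = edge(0.82, -335)
```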
The 56 features it looks at
Recency-weighted stats are the heart of the model — last-3, last-5, and last-10 starts plus full-season aggregates — layered with opponent and game-context features.
There are 26 distinct concepts; most are computed at multiple recency windows, which expands them into the 56 input variables.
From a K prediction to a bet probability
A predicted strikeout count alone isn't enough — we need to know P(K > line). Strikeouts per game don't follow a normal bell curve; they're over-dispersed count data, so the right distribution is a Negative Binomial.
For each pitcher we treat the prediction as the mean of a NegBinom with overdispersion ratio 1.4 (variance / mean = 1.4, fit on the 15k training games). For a line like 5.5 K, we integrate the tail to get the over probability; one minus that is the under probability.
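A minimal sketch of that tail calculation, moment-matching the Negative Binomial to the predicted mean and the 1.4 dispersion ratio (pure stdlib; the 6.2-K prediction and 5.5 line are illustrative inputs):

```python
from math import lgamma, exp, log, floor

def negbinom_over_prob(mean_k: float, line: float, ratio: float = 1.4) -> float:
    """P(K > line) under a Negative Binomial with variance = ratio * mean.

    Moment matching: var = mu + mu^2/r = ratio * mu  =>  r = mu / (ratio - 1),
    success probability p = r / (r + mu). Sums the pmf up to the line, then
    takes one minus that for the tail.
    """
    mu = mean_k
    r = mu / (ratio - 1.0)        # dispersion parameter
    p = r / (r + mu)
    cdf = 0.0
    for k in range(floor(line) + 1):   # accumulate P(K <= floor(line))
        log_pmf = (lgamma(k + r) - lgamma(r) - lgamma(k + 1)
                   + r * log(p) + k * log(1.0 - p))
        cdf += exp(log_pmf)
    return 1.0 - cdf

over = negbinom_over_prob(6.2, 5.5)   # P(K >= 6) for a 6.2-K prediction
under = 1.0 - over
```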
The same math runs in your dashboard's prediction table, the bet generator, and the grading logic — single source of truth.
Calibration
The probability column on the Dashboard tab is the model's calibrated hit probability, not its raw output. After the 2026-05-10 retrain, the calibration system was simplified — only one active layer, plus a placeholder for a second.
Layer 1 — Bias correction on the K prediction (currently zeroed). Before the retrain, the old XGBoost model systematically over-predicted high-K starters and under-predicted the rest, so each predicted-K bucket needed a small shift to align with reality. The retrained model has near-zero systematic bias on the holdout (signed error = −0.008 K), so all bucket corrections are currently 0.0. The piecewise structure stays in place as a hook for future refits — if the model drifts, we'll repopulate the buckets.
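The piecewise hook is essentially a bucket lookup. The bucket edges below are hypothetical (the real edges aren't listed on this page); the zero shifts reflect the current post-retrain state:

```python
# (low, high, shift) on predicted K. Edges are illustrative; shifts are
# all 0.0 because the retrained model shows no systematic bias.
BIAS_BUCKETS = [
    (0.0, 4.5, 0.0),
    (4.5, 6.5, 0.0),
    (6.5, 12.0, 0.0),
]

def bias_correct(pred_k: float) -> float:
    """Apply the per-bucket shift for the bucket containing pred_k."""
    for lo, hi, shift in BIAS_BUCKETS:
        if lo <= pred_k < hi:
            return pred_k + shift
    return pred_k   # out-of-range predictions pass through unchanged
```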
Layer 2 — Platt scaling on the bet probability. The raw NegBinom probabilities the model emits aren't quite calibrated to realized hit rates — when the model says "75%", reality might be 70%. Platt scaling is a logistic-regression layer that maps raw → calibrated. Side-specific: OVER and UNDER need different corrections.
| UNDER mapping | raw 0.65 → cal 0.66 · raw 0.75 → cal 0.79 · raw 0.85 → cal 0.91 |
| OVER mapping | raw 0.65 → cal 0.69 · raw 0.78 → cal 0.84 · raw 0.85 → cal 0.91 |
| Brier score | 0.125 raw → 0.061 calibrated on UNDER (skill +75%); 0.094 → 0.043 on OVER (skill +83%) |
Refit 2026-05-10 on 99 graded bets after the model retrain. Constants: PLATT_UNDER = (2.97, −0.47), PLATT_OVER = (2.90, +0.25). Sample size on OVER side is small (n=24); plan to refit Platt again after another ~50 graded predictions accumulate.
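The exact parameterization behind PLATT_UNDER and PLATT_OVER isn't reproduced here, so the sketch below is a generic Platt-style fit: a two-parameter logistic regression of hit/miss on the logit of the raw probability, trained by gradient descent on synthetic overconfident data rather than the real graded bets:

```python
import math
import random

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def fit_platt(raw_probs, outcomes, lr=0.5, epochs=2000):
    """Fit cal = sigmoid(a * logit(raw) + b) by gradient descent on log-loss."""
    xs = [math.log(p / (1.0 - p)) for p in raw_probs]
    a, b, n = 1.0, 0.0, len(xs)
    for _ in range(epochs):
        ga = gb = 0.0
        for x, y in zip(xs, outcomes):
            err = sigmoid(a * x + b) - y   # gradient of the log-loss
            ga += err * x
            gb += err
        a -= lr * ga / n
        b -= lr * gb / n
    return a, b

def calibrate(raw: float, a: float, b: float) -> float:
    return sigmoid(a * math.log(raw / (1.0 - raw)) + b)

# Synthetic check: raw probabilities are 5 points overconfident,
# so the fitted map should pull them down toward reality.
random.seed(0)
raws = [random.uniform(0.55, 0.9) for _ in range(1500)]
hits = [1 if random.random() < r - 0.05 else 0 for r in raws]
a, b = fit_platt(raws, hits)
```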
How a bet gets selected
For every (pitcher × line × side × book) combination the model checks seven gates in order. The bet only enters the candidate pool if all seven pass:
- Direction — prediction must point the right way (pred > line for OVER, pred < line for UNDER).
- Margin — prediction must be at least 0.5 K off the line.
- Low-K UNDER ban — no UNDER if predicted < 4.0 K (model unreliable).
- High-K OVER ban — no OVER if predicted > 9.0 K (model overshoots).
- Calibration gate — Platt-calibrated model probability must clear 0.59 for UNDER, 0.63 for OVER. Asymmetric because OVER is the leakier side. (Equivalent to the old raw 0.65/0.78 gates after the 2026-05-07 calibration layer was added.)
- OVER cushion — for OVER bets, the bias-corrected prediction must exceed the line by at least 1.5 K.
- Suspect-edge cap — reject any candidate with edge ≥ 20%. Fat edges almost always mean the line moved on news the model doesn't see (injury, scratch, weather).
Candidates that survive are ranked by edge. The top 4 are taken (cap on main-line bets per day).
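The seven gates above can be sketched as one pass/fail function. Constants come from the list; the flat signature is a simplification of what the real pipeline passes around:

```python
GATES = {
    "margin_k": 0.5,                          # min distance from the line
    "under_min_pred": 4.0,                    # low-K UNDER ban
    "over_max_pred": 9.0,                     # high-K OVER ban
    "prob_gate": {"UNDER": 0.59, "OVER": 0.63},
    "over_cushion_k": 1.5,
    "suspect_edge": 0.20,
}

def passes_gates(pred_k: float, line: float, side: str,
                 cal_prob: float, edge: float) -> bool:
    """Check the seven gates in order; a candidate must pass all of them."""
    if side == "OVER" and pred_k <= line:                # 1. direction
        return False
    if side == "UNDER" and pred_k >= line:
        return False
    if abs(pred_k - line) < GATES["margin_k"]:           # 2. margin
        return False
    if side == "UNDER" and pred_k < GATES["under_min_pred"]:  # 3. low-K UNDER ban
        return False
    if side == "OVER" and pred_k > GATES["over_max_pred"]:    # 4. high-K OVER ban
        return False
    if cal_prob < GATES["prob_gate"][side]:              # 5. calibration gate
        return False
    if side == "OVER" and pred_k - line < GATES["over_cushion_k"]:  # 6. OVER cushion
        return False
    if edge >= GATES["suspect_edge"]:                    # 7. suspect-edge cap
        return False
    return True
```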
Bet sizing & exposure
Sizing is tiered by confidence, scaling with bankroll:
| HIGH confidence | 7.5% of bankroll · ~$13 at current roll · grows as you do |
| MEDIUM confidence | 3% of bankroll · ~$5 at current roll · sized small (track record around break-even) |
| LOW confidence | skipped — too few samples to justify the risk |
| Daily exposure cap | 25% of bankroll · safety rail across the day's bets |
| Edge floor | 10% minimum |
| Edge ceiling | 20% (suspect-edge cap; treats fat edges as line-move risk) |
No martingale, no scaling-up after losses. Bets compound as bankroll grows because sizing is bankroll-fraction, not flat-dollar. At $400 bankroll a HIGH bet becomes ~$30; at $1000 it becomes ~$75.
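A minimal sketch of the tier sizing and the daily exposure rail (the function name and signature are illustrative, not the pipeline's):

```python
TIER_FRACTION = {"HIGH": 0.075, "MEDIUM": 0.03}   # LOW is skipped entirely
DAILY_CAP = 0.25                                   # max fraction of bankroll at risk per day

def stake(bankroll: float, tier: str, exposed_today: float) -> float:
    """Bankroll-fraction stake; 0 if the tier is skipped or the daily cap is hit."""
    frac = TIER_FRACTION.get(tier)
    if frac is None:
        return 0.0
    bet = bankroll * frac
    if exposed_today + bet > bankroll * DAILY_CAP:
        return 0.0
    return round(bet, 2)
```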
Parlay promotion (added 2026-05-10)
Short-odds singles have terrible unit economics — a $13 bet at −335 returns just $3.88 on a hit. When the slate has multiple short-odds high-confidence bets, the pipeline now combines them into 2-leg parlays automatically:
| Trigger | any bet at odds ≤ −200 with calibrated probability ≥ 70% |
| Pairing | highest-edge eligible legs paired greedily; max 2 parlays per day |
| Joint probability floor | parlay must have ≥ 55% combined probability and positive EV to ship |
| Wager | HIGH-tier sizing per parlay (single bankroll-fraction unit) |
| Fate of legs | parlayed legs are not placed as singles — they live only inside the parlay |
Concrete example: two singles at −335 and −455 wager $13 each ($26 total) for a max combined return of $6.74. As a parlay, $13 wagers for ~$7.59 if both hit. Same upside, half the capital exposure, marginally better EV. Parlays land in data/parlay_log.jsonl and surface on the Parlays panel.
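The parlay arithmetic in that example is just decimal-odds multiplication:

```python
def decimal_odds(american: int) -> float:
    """Convert American odds to decimal (stake-inclusive) odds."""
    return 1 + (100 / -american if american < 0 else american / 100)

def parlay_profit(stake: float, legs_american: list[int]) -> float:
    """Profit if every leg of the parlay hits: stake * (product of decimals - 1)."""
    dec = 1.0
    for a in legs_american:
        dec *= decimal_odds(a)
    return stake * (dec - 1)

combined = parlay_profit(13, [-335, -455])                 # ~$7.59 if both hit
singles = (13 * (decimal_odds(-335) - 1)
           + 13 * (decimal_odds(-455) - 1))                # ~$6.74 for $26 staked
```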
Alt-line preference for UNDERs
Post-reset data showed a strong pattern: high-line UNDERs (6.5+ K) outperform low-line UNDERs by a wide margin (UNDER 6.5 = +23.5% ROI vs UNDER 5.5 = +5.6%). Books overprice high-line UNDERs more reliably. The pipeline scans the FanDuel/DraftKings alt-line snapshot for UNDERs at 5.5+ K and prefers them when available.
The model's self-improvement loop (added 2026-05-10)
A second LaunchAgent fires every night at 3:00 AM local (after all games settle). It runs retrain.py which:
- Pulls fresh Statcast through the previous day (incremental, ~30 sec when cached).
- Rebuilds the training matrix using all 4 seasons of data (15,800+ pitcher-games).
- Applies recency weights — yesterday's games count 3× a 1-year-old game, ~9× a 2-year-old game.
- Trains XGBoost with 5-fold time-series CV.
- Validates against the most recent ~50 graded production predictions.
- Deploys the new model only if it beats the previous holdout MAE.
- Old model backed up to xgb_k_model.prev.joblib for one-line rollback.
From here on the model adapts autonomously to whatever K environment the league is in this week. No more "trained April 10" staleness. Manual retrains via python3 retrain.py are still possible — same script.
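The exact decay form isn't stated, but an exponential in sample age is consistent with the ratios given (3× at one year, ~9× at two). A sketch of that assumption:

```python
def recency_weight(age_days: float) -> float:
    """Sample weight that decays 3x per year of age, so ~9x over two years."""
    return 3.0 ** (-age_days / 365.0)

# yesterday ~1.0, one year old ~0.333, two years old ~0.111
```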
What's NOT being bet right now
- Alt-overs — the 5.5 K alt-over feature was −25% ROI on 10 bets and is currently disabled (ALT_OVER_MAX_BETS=0). Re-enable after OVER calibration is audited.
- Plus-odds OVERs — currently allowed but flagged in the OVER audit. Tightening planned.
Performance to date (since 2026-04-02 reset)
All numbers are straight bets only — the same set tracked on the Dashboard tab. Parlay and flex-play P&L is excluded from the headline. Refreshed on this page render.
| Lifetime | 74–54 (n=128) · +$72.33 · +8.8% ROI |
| UNDER bets · the bread & butter | 58–36 (n=94) · +$84.40 · +13.6% ROI |
| OVER bets · in calibration audit | 16–18 (n=34) · −$12.07 · −6.2% ROI |
| UNDER 6.5+ K · high-line bucket | 29–13 (n=42) · +$65.10 · +23.5% ROI |
| UNDER 5.5–6 · the main line | 23–19 (n=42) · +$15.79 · +5.6% ROI |
| UNDER < 5.5 · low-line | 6–4 (n=10) · +$3.51 · +5.6% ROI |
| Edge ≥ 15% | 43–32 (n=75) · +14.4% ROI |
| Edge 10–15% | 23–16 (n=39) · +5.1% ROI |
| Edge 7–10% · the danger zone | 5–5 (n=10) · −28.5% ROI |
| HIGH confidence | 49–33 (n=82) · +17.4% ROI · carries the lifetime profit |
| MEDIUM confidence | 20–19 (n=39) · −9.0% ROI · being phased toward smaller stakes |
The pattern: high-line UNDERs at HIGH confidence are where the edge lives. The May 2026 regime shift caused a drawdown from a peak of $208 (4/27) down to $172 (5/10) — diagnosed as systematic under-prediction (model trained on April-style K rates, May had higher K rates). Fixed via the 2026-05-10 retrain on fresh data + recency weighting.
Project timeline & calibration history
- 2026-03-14 — first predictions written (spring-training mode, manual line entry).
- 2026-03-27 — Next.js dashboard scaffolded; first live bets attempted (3 OVERs, all losses — first sign that OVER side was leaky).
- 2026-04-02 — bankroll reset for regular season; all subsequent ROI is measured from this date.
- 2026-04-05 — first calibration work: tune_overdispersion.py fit the NegBinom dispersion ratio, analyze_predicted_k_bias.py measured per-bucket residual.
- 2026-04-10 — XGBoost retrained on full 2023–2026 Statcast (15,486 pitcher-games); production model until the 2026-05-10 retrain. CV MAE 1.82.
- 2026-04-19 — replaced flat 0.35 K bias correction with piecewise per-bucket version. Strategy Diagnostic doc.
- 2026-04-20 — added asymmetric calibration gate (0.72 OVER vs 0.70 UNDER) and 1.5-K OVER cushion rule. Bet Review doc.
- 2026-04-25 — System Audit consolidated overlapping pipelines; config.py became the single source of truth for calibration constants (was scattered across 8 files).
- 2026-04-29 — UNDER gate down to 0.65 (was 0.70), OVER gate up to 0.78, suspect-edge cap added (≥ 20% rejected), alt-overs disabled, high-line alt-UNDER preference added. OVER Audit doc.
- 2026-05-02 — pipeline migrated to Homebrew Python 3.12 after numpy 2.x compatibility broke 3.9-only install.
- 2026-05-04 — bias buckets refit on 378 graded games (vs 221 in the 4/19 fit). Drove residual bias from −0.17 K to +0.01 K. Multi-run schedule deployed (10 AM, 2 PM, 5 PM local).
- 2026-05-07 — tier-based bet sizing: HIGH 7.5%, MEDIUM 3% of bankroll (was flat $8 across tiers). LOW skipped. Edge floor raised to 10% (from 9%). Daily exposure cap raised 20% → 25% so the upsized HIGH bets actually fit. Removed flat unit cap so bets compound with bankroll. Backtest on the 119-bet sample said this would have grown $100 → ~$284 vs actual $196.
- 2026-05-07 — Platt-scaling probability calibration deployed (Layer 2). Side-specific fit on 113 graded bets. Gate values dropped 0.65/0.78 → 0.59/0.63 to operate on the calibrated scale. Brier score 0.241 → 0.232.
- 2026-05-09 — diagnosed May regime shift: model trained on April-style K rates was systematically under-predicting May K counts by 1.35 K on average. Drawdown from $208 peak to $172 traced directly to UNDER bets failing because the actual K counts were running higher than the (April-tuned) predictions.
- 2026-05-10 — retrained XGBoost on fresh 2026 regular-season data with recency weighting (recent games count 3× a 1-year-old game). Holdout MAE crashed from 2.17 to 0.56, bias from −1.35 K to ~zero. Restored 11 previously-zeroed features (career_k_per_9, is_home, pitcher_k_rate_vs_opp, etc.) — the new model uses 49 of 56 features as real signals. Bias-correction layer zeroed out (no longer needed). Platt scaling refit on the new model's outputs (Brier 0.061 UNDER / 0.043 OVER).
- 2026-05-10 — nightly auto-retrain LaunchAgent deployed (3:00 AM local) — model now adapts autonomously to seasonal regime shifts; the "trained April 10" staleness problem can't recur.
- 2026-05-10 — parlay promotion logic added to daily pipeline. Short-odds high-confidence bets (odds ≤ −200, prob ≥ 70%) now combine into 2-leg parlays automatically — better unit economics than singles where the payouts are tiny.
Roughly 8 weeks of iteration since spring-training pilot, 5+ weeks since the regular-season reset.
How the model stays current during the day
The pipeline runs three times daily on Jake's Mac via the LaunchAgent — at 10 AM, 2 PM, and 5 PM local. Each run pulls the latest probable pitchers from MLB, refreshes odds from FanDuel/DraftKings/etc., re-grades any games that have finished, and adds new bets if any new edges cleared the gates. So a bullpen-day or opener announcement that lands at 11 AM gets picked up at the 2 PM run; a 4 PM scratch is caught at 5 PM.
Inside each run, the model already sees announced openers because their avg_innings_pitched feature is low — the prediction reflects the short outing automatically. The multi-run schedule just makes sure we re-pull when MLB updates the listed pitcher.
The narrow remaining blind spots
Things the model still can't react to in real time: in-game pitcher hot streaks (pulled early or late), weather changes between the 5 PM run and first pitch, and any line move that happens on news the books have but we don't. The suspect-edge cap (≥ 20%) exists for that last case — when a line moves much further than expected, it's almost always news the model is blind to, so we drop the bet rather than chase it.