What the model predicts
For every starting pitcher on a slate, the model outputs a single number: the expected number of strikeouts they'll record in their start. From that one number we derive everything else — the probability the line hits over or under, the edge versus what the book is offering, and whether to bet.
The model is a gradient-boosted tree (XGBoost regressor) trained on 15,486 pitcher-game samples spanning 2023–2026 Statcast data. Cross-validated MAE is 1.82 K (i.e. on average the prediction is within ~2 strikeouts of the actual K count). RMSE is 2.28.
The 56 features it looks at
Recency-weighted stats are the heart of the model — last-3, last-5, and last-10 starts plus full-season aggregates — with opponent and game-context features layered on top.
That works out to 26 distinct concepts, most computed at multiple recency windows, for 56 input variables.
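The fan-out from concepts to windows is the mechanical part. A minimal sketch of the pattern — the window labels match the windows described above, but the function and feature names are illustrative, not the pipeline's actual feature builder:

```python
import statistics

# Recency windows: last 3 / 5 / 10 starts, plus the full season (None = all).
WINDOWS = {"l3": 3, "l5": 5, "l10": 10, "season": None}

def windowed_features(name: str, per_start_values: list[float]) -> dict[str, float]:
    """Fan one base stat out across recency windows.

    per_start_values is most-recent-first, one value per start
    (e.g. strikeouts recorded in each start)."""
    feats = {}
    for label, n in WINDOWS.items():
        window = per_start_values if n is None else per_start_values[:n]
        feats[f"{name}_{label}"] = statistics.mean(window)
    return feats
```

One concept (say, K per start) becomes four inputs this way; roughly 26 concepts expanded like this, some at fewer windows, yields the 56-variable input vector.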
From a K prediction to a bet probability
A predicted strikeout count alone isn't enough — we need to know P(K > line). Strikeouts per game don't follow a normal bell curve; they're over-dispersed count data, so the right distribution is a Negative Binomial.
For each pitcher we treat the prediction as the mean of a NegBinom with overdispersion ratio 1.4 (variance / mean = 1.4, fit on the 15k training games). For a line like 5.5 K, we integrate the tail to get the over probability; one minus that is the under probability.
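The mean-plus-dispersion parameterization maps onto the textbook Negative Binomial like this — a stdlib-only sketch of the tail integration (the production code path may differ, e.g. it could use `scipy.stats.nbinom` directly):

```python
import math

def over_probability(pred_mean: float, line: float, dispersion: float = 1.4) -> float:
    """P(K > line) for a Negative Binomial with mean = pred_mean and
    variance = dispersion * pred_mean.

    Standard (n, p) parameterization: mean = n(1-p)/p, var = n(1-p)/p^2,
    so p = mean / variance = 1 / dispersion and n = mean * p / (1 - p)."""
    p = 1.0 / dispersion
    n = pred_mean * p / (1.0 - p)
    # For a half-point line like 5.5, "over" means K >= 6, i.e. K > floor(line),
    # so sum the pmf for k = 0..floor(line) and take the complement.
    cdf = 0.0
    for k in range(int(line) + 1):
        # log pmf(k) = log C(k + n - 1, k) + n*log(p) + k*log(1 - p)
        log_pmf = (math.lgamma(k + n) - math.lgamma(n) - math.lgamma(k + 1)
                   + n * math.log(p) + k * math.log(1.0 - p))
        cdf += math.exp(log_pmf)
    return 1.0 - cdf

p_over = over_probability(6.2, 5.5)   # pitcher projected for 6.2 K vs a 5.5 line
p_under = 1.0 - p_over
```

With dispersion fixed at 1.4, `p = 1/1.4 ≈ 0.714` for every pitcher; only `n` moves with the predicted mean.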
The same math runs in your dashboard's prediction table, the bet generator, and the grading logic — single source of truth.
Bias correction (calibration)
Raw XGBoost predictions are biased systematically by predicted-K bucket — the model overshoots on high-K starters more than on low-K ones. Rather than apply a flat shift, we use a piecewise per-bucket correction measured on graded predictions. Latest refit 2026-05-04 on 378 graded games:
| Predicted K bucket | Correction |
| --- | --- |
| pred < 4 K | subtract 0.16 |
| 4 ≤ pred < 5 | add 0.33 (under-prediction) |
| 5 ≤ pred < 6 | subtract 0.54 |
| 6 ≤ pred < 7 | subtract 0.01 (basically already calibrated) |
| pred ≥ 7 K | subtract 1.62 (still the largest overshoot) |
The refit drove overall residual bias from −0.17 K to +0.01 K. Realized post-4/19 MAE is 1.74 K vs the 1.82 training-time CV baseline.
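In code the correction is just a bucket lookup — a sketch using the values from the table above (the structure is illustrative; the production constants reportedly live in `config.py`):

```python
# Per-bucket bias corrections from the 2026-05-04 refit:
# (bucket lower bound, bucket upper bound, correction to add).
BIAS_BUCKETS = [
    (0.0, 4.0, -0.16),
    (4.0, 5.0, +0.33),
    (5.0, 6.0, -0.54),
    (6.0, 7.0, -0.01),
    (7.0, float("inf"), -1.62),
]

def correct_prediction(raw_pred: float) -> float:
    """Apply the piecewise bias correction to a raw XGBoost K prediction."""
    for lo, hi, delta in BIAS_BUCKETS:
        if lo <= raw_pred < hi:
            return raw_pred + delta
    return raw_pred
```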
How a bet gets selected
For every (pitcher × line × side × book) combination the model checks seven gates in order. The bet only enters the candidate pool if all seven pass:
- Direction — prediction must point the right way (pred > line for OVER, pred < line for UNDER).
- Margin — prediction must be at least 0.5 K off the line.
- Low-K UNDER ban — no UNDER if predicted < 4.0 K (model unreliable).
- High-K OVER ban — no OVER if predicted > 9.0 K (model overshoots).
- Calibration gate — bias-corrected model probability must clear `0.65` for UNDER, `0.78` for OVER. Asymmetric because OVER is the leakier side.
- OVER cushion — for OVER bets, the bias-corrected prediction must exceed the line by at least 1.5 K.
- Suspect-edge cap — reject any candidate with edge ≥ 20%. Fat edges almost always mean the line moved on news the model doesn't see (injury, scratch, weather).
Candidates that survive are ranked by edge. The top 4 are taken (cap on main-line bets per day).
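A condensed sketch of the seven gates plus the ranking step, using the thresholds stated above (function and field names are illustrative, not the pipeline's actual identifiers):

```python
def passes_gates(pred: float, prob: float, edge: float, line: float, side: str) -> bool:
    """pred is the bias-corrected K prediction; prob the calibrated probability
    of this side hitting; edge the model-vs-book edge as a fraction."""
    if side == "OVER":
        if pred <= line:
            return False            # 1. direction
        if pred - line < 0.5:
            return False            # 2. margin
        if pred > 9.0:
            return False            # 4. high-K OVER ban
        if prob < 0.78:
            return False            # 5. calibration gate (OVER side)
        if pred - line < 1.5:
            return False            # 6. OVER cushion
    else:  # UNDER
        if pred >= line:
            return False            # 1. direction
        if line - pred < 0.5:
            return False            # 2. margin
        if pred < 4.0:
            return False            # 3. low-K UNDER ban
        if prob < 0.65:
            return False            # 5. calibration gate (UNDER side)
    if edge >= 0.20:
        return False                # 7. suspect-edge cap
    return True

def select_bets(candidates: list[dict], max_bets: int = 4) -> list[dict]:
    """Keep survivors, rank by edge, take the daily top 4."""
    pool = [c for c in candidates if passes_gates(**c)]
    return sorted(pool, key=lambda c: c["edge"], reverse=True)[:max_bets]
```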
Bet sizing & exposure
Sizing is intentionally conservative. Each bet is sized at $8 base unit (or 5% of bankroll, whichever is smaller). Total daily exposure is capped at 20% of bankroll — even a five-bet day can't put more than 1/5th of the roll at risk. No martingale, no scaling-up after losses.
The 20% cap is enforced after bet selection; if total wager would exceed it, lower-edge bets get dropped first.
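Sizing and the cap together are a few lines — a sketch under the assumption that the candidate list arrives already sorted by edge (names are illustrative):

```python
def size_bets(ranked: list[dict], bankroll: float,
              base_unit: float = 8.0, unit_frac: float = 0.05,
              daily_cap: float = 0.20) -> list[dict]:
    """Flat-stake sizing with a daily exposure cap.

    `ranked` is the candidate list sorted by edge, best first, so when
    the 20% cap binds, the lowest-edge bets fall out first."""
    stake = min(base_unit, unit_frac * bankroll)   # $8 or 5% of roll
    budget = daily_cap * bankroll                  # 20% daily exposure cap
    sized = []
    for bet in ranked:
        if (len(sized) + 1) * stake > budget:
            break  # the next bet would breach the daily cap
        sized.append({**bet, "stake": stake})
    return sized
```

On a $100 bankroll this gives a $5 stake and a $20 budget — exactly four bets before the cap binds, which lines up with the daily top-4 cap on main-line bets.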
Alt-line preference for UNDERs
Post-reset data showed a strong pattern: high-line UNDERs (6.5+ K) outperform low-line UNDERs by a wide margin (UNDER 6.5 = +47% ROI vs UNDER 4.5 = +6%). Books overprice high-line UNDERs more reliably. The pipeline now scans the FanDuel/DraftKings alt-line snapshot for UNDERs at 5.5+ K and prefers them when available.
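The line-selection step can be sketched as a small helper. This assumes the pipeline simply takes the highest qualifying alt line when one improves on the main line — the actual selection rule (and any odds filtering) may be more involved:

```python
def pick_under_line(main_line: float, alt_lines: list[float],
                    min_alt: float = 5.5) -> float:
    """Prefer the highest available alt-line UNDER at 5.5+ K; fall back to
    the main line when no qualifying alt exists. Edge and probability gates
    still run downstream — this only chooses which line to evaluate."""
    eligible = [line for line in alt_lines if line >= min_alt]
    if eligible and max(eligible) > main_line:
        return max(eligible)
    return main_line
```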
What's NOT being bet right now
- Alt-overs — the 5.5 K alt-over feature was −25% ROI on 10 bets and is currently disabled (`ALT_OVER_MAX_BETS=0`). Re-enable after the OVER calibration is audited.
- Plus-odds OVERs — currently allowed but flagged in the OVER audit (−$24 net since reset). Tightening planned.
- Parlays — generated and tracked separately, but the live model is over-confident on combinations (45% hit rate vs 58% predicted). Hit-rate panel only; no dollar P&L feeds into the headline ROI.
Performance to date (since 2026-04-02 reset)
All numbers are straight bets only — the same set tracked on the Dashboard tab. Parlay and flex-play P&L is excluded from the headline. Snapshot updated when the page renders; reflects the live Supabase state.
| Segment | Record · P&L · ROI |
| --- | --- |
| Lifetime | 65–45 · +$84.89 · +12.3% ROI |
| UNDER bets · the bread & butter | 50–27 (n=77) · +$101.22 · +20.2% ROI |
| OVER bets · in calibration audit | 15–18 (n=33) · −$16.33 · −8.7% ROI |
| UNDER 6.5+ K · high-line bucket | 25–8 (n=33) · +$85.02 · +40.1% ROI |
| UNDER 5.5–6 · the main line | 19–15 (n=34) · +$12.69 · +5.6% ROI |
| UNDER < 5.5 · low-line | 6–4 (n=10) · +$3.51 · +5.6% ROI |
| Edge ≥ 15% | 41–30 (n=71) · +15.6% ROI |
| Edge 10–15% | 16–10 (n=26) · +9.6% ROI |
| Edge 7–10% · the danger zone | 5–4 (n=9) · −14.7% ROI |
| Edge ≥ 20% | excluded · suspect-edge cap kicks in |
The pattern: high-line UNDERs are where almost all of the model's edge lives. Low-edge bets (7–10%) are net losers. The OVER side is in audit — it's bleeding ~$16 since reset and won't be re-enabled until cushion + plus-odds rules tighten.
Project timeline & calibration history
- 2026-03-14 — first predictions written (spring-training mode, manual line entry).
- 2026-03-27 — Next.js dashboard scaffolded; first live bets attempted (3 OVERs, all losses — first sign that OVER side was leaky).
- 2026-04-02 — bankroll reset for regular season; all subsequent ROI is measured from this date.
- 2026-04-05 — first calibration work: `tune_overdispersion.py` fit the NegBinom dispersion ratio, `analyze_predicted_k_bias.py` measured per-bucket residual.
- 2026-04-10 — XGBoost retrained on full 2023–2026 Statcast (15,486 pitcher-games). Current production model. CV MAE 1.82.
- 2026-04-19 — replaced flat 0.35 K bias correction with piecewise per-bucket version. Strategy Diagnostic doc.
- 2026-04-20 — added asymmetric calibration gate (0.72 OVER vs 0.70 UNDER) and 1.5-K OVER cushion rule. Bet Review doc.
- 2026-04-25 — System Audit consolidated overlapping pipelines; `config.py` became the single source of truth for calibration constants (previously scattered across 8 files).
- 2026-04-29 — UNDER gate lowered to 0.65 (was 0.70), OVER gate raised to 0.78, suspect-edge cap added (≥ 20% rejected), alt-overs disabled, high-line alt-UNDER preference added. OVER Audit doc.
- 2026-05-02 — pipeline migrated to Homebrew Python 3.12 after numpy 2.x compatibility broke 3.9-only install.
- 2026-05-04 — bias buckets refit on 378 graded games (vs 221 in the 4/19 fit). Drove residual bias from −0.17 K to +0.01 K. Multi-run schedule deployed (10 AM, 2 PM, 5 PM local).
Roughly 7 weeks of iteration since spring-training pilot, 5 weeks since the regular-season reset.
How the model stays current during the day
The pipeline runs three times daily on Jake's Mac via the LaunchAgent — at 10 AM, 2 PM, and 5 PM local. Each run pulls the latest probable pitchers from MLB, refreshes odds from FanDuel/DraftKings/etc., re-grades any games that have finished, and adds new bets if any new edges cleared the gates. So a bullpen-day or opener announcement that lands at 11 AM gets picked up at the 2 PM run; a 4 PM scratch is caught at 5 PM.
Inside each run, the model already sees announced openers because their `avg_innings_pitched` feature is low — the prediction reflects the short outing automatically. The multi-run schedule just makes sure we re-pull when MLB updates the listed pitcher.
The narrow remaining blind spots
Things the model still can't react to in real time: in-game pitcher hot streaks (pulled early or late), weather changes between the 5 PM run and first pitch, and any line move that happens on news the books have but we don't. The suspect-edge cap (≥ 20%) exists for that last case — when a line moves much further than expected, it's almost always news the model is blind to, so we drop the bet rather than chase it.