[Interactive bracket — Round 1: Mar 19-20 · Round 2: Mar 21-22 · Sweet 16: Mar 26-27 · Elite 8: Mar 28-29 · Final 4: Apr 4-6. Legend: high confidence (70%+), medium (50-70%), low (<50%), upset pick, leverage pick; name shown = predicted winner.]
[Backtest bracket — Round 1: Mar 20-21 · Round 2: Mar 22-23 · Sweet 16: Mar 27-28 · Elite 8: Mar 29-30 · Final 4: Apr 3-5. Legend: high confidence (70%+), medium (50-70%), low (<50%), upset pick; name shown = predicted winner.]
ABOUT THIS PROJECT
The story behind the AI bracket
AI-powered March Madness bracket predictor built on March 18, 2026. Uses 6 AI models — Claude (Anthropic), GPT-4o (OpenAI), Gemini (Google), Grok (xAI), Llama (Meta), and DeepSeek — in the most diverse multi-model ensemble ever applied to bracket prediction.
6 AI MODELS. 6 COMPANIES. 1 ANSWER.
We gave the same bracket to Claude (Anthropic), GPT-4o (OpenAI), Gemini (Google), Grok (xAI), Llama (Meta), and DeepSeek — and they agreed on 80.6% of picks. This matters.
If you're paying for multiple AI services, you should know: they often produce the same answers. Not because they're right, but because they all learned from the same internet. Model diversity ≠ information diversity. When six independently-developed models from six different companies converge on the same picks, it tells us more about their shared training data than about basketball.
HOW IT WORKS
1. DATA COLLECTION
KenPom ratings, BartTorvik T-Rank, NCAA NET rankings, injury reports, beat writer intel
2. MULTI-MODEL ENSEMBLE
Each matchup is analyzed by all 6 AI models independently: Claude, GPT-4o, Gemini, Grok, Llama, and DeepSeek. Their predictions are compared using ensemble consensus rules. Agreement rate: 80.6%.
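The consensus rules themselves aren't spelled out here, so as a rough sketch: a majority vote per matchup, plus an agreement rate counted as the share of matchups where all six models were unanimous (the published 80.6% could be computed several ways; this is one plausible reading). Function and field names are illustrative, not the project's actual code.

```python
from collections import Counter

MODELS = ["claude", "gpt4o", "gemini", "grok", "llama", "deepseek"]

def consensus(picks: dict[str, str]) -> tuple[str, float]:
    """Majority vote across model picks for one matchup.

    picks maps model name -> predicted winner; returns the
    consensus winner and the fraction of models that agree.
    """
    winner, votes = Counter(picks.values()).most_common(1)[0]
    return winner, votes / len(picks)

def agreement_rate(all_picks: list[dict[str, str]]) -> float:
    """Share of matchups where every model picked the same team."""
    unanimous = sum(1 for p in all_picks if len(set(p.values())) == 1)
    return unanimous / len(all_picks)
```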
3. CONFIDENCE FORMULA
Weighted formula: 55% model strength, 20% source agreement, 15% lineup certainty, 10% data freshness
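The weighted formula is a straight linear blend. Assuming each input is normalized to a 0-1 scale (the page doesn't state the scales), it reduces to:

```python
def confidence(model_strength: float, source_agreement: float,
               lineup_certainty: float, data_freshness: float) -> float:
    """Blend the four inputs (each assumed on a 0-1 scale) using
    the stated weights: 55/20/15/10."""
    return (0.55 * model_strength
            + 0.20 * source_agreement
            + 0.15 * lineup_certainty
            + 0.10 * data_freshness)
```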
4. CROSS-AGENT DEBATE
The 10 most uncertain picks undergo a structured debate where the AI plays devil's advocate against its own predictions. 2 picks were flipped and average confidence dropped 8.4 points, reducing overconfidence.
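A minimal sketch of the selection step and the challenge prompt, assuming "most uncertain" means lowest confidence; the prompt wording and dict fields are hypothetical, not the project's actual text.

```python
def select_for_debate(picks: list[dict], n: int = 10) -> list[dict]:
    """Return the n picks with the lowest confidence — the ones
    most worth re-examining in a devil's-advocate pass."""
    return sorted(picks, key=lambda p: p["confidence"])[:n]

def devils_advocate_prompt(pick: dict) -> str:
    """Build the challenge prompt for one uncertain pick."""
    return (f"You predicted {pick['winner']} over {pick['loser']} "
            f"with {pick['confidence']:.0%} confidence. Argue the "
            f"strongest case for the opposite outcome, then state "
            f"whether you would revise the pick or its confidence.")
```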
5. MONTE CARLO SENSITIVITY
250 simulations testing different weight combinations against 2022-2025 historical results to find optimal calibration.
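A sketch of what such a sensitivity search could look like: sample 250 random weight vectors that sum to 1 and keep the one that scores best against the historical brackets. The sampling scheme and function names are assumptions; `score_fn` stands in for whatever evaluates a weight vector against 2022-2025 results.

```python
import random

def random_weights(rng: random.Random) -> tuple[float, ...]:
    """Sample a random 4-component weight vector that sums to 1
    (Dirichlet-style, via normalized exponentials)."""
    raw = [rng.expovariate(1.0) for _ in range(4)]
    total = sum(raw)
    return tuple(w / total for w in raw)

def sensitivity_search(score_fn, n_sims: int = 250, seed: int = 0):
    """Try n_sims weight combinations; return the best-scoring one.

    score_fn(weights) should evaluate a weight vector against the
    historical brackets and return accuracy in [0, 1].
    """
    rng = random.Random(seed)
    return max((random_weights(rng) for _ in range(n_sims)),
               key=score_fn)
```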
6. HISTORICAL BACKTESTING
Model accuracy scored against actual tournament results from 2022-2025, spanning ultra-chalk (2025) to extreme-chaos (2023) tournaments.
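The scoring itself is simple once brackets are keyed by game: per-year accuracy is the fraction of games where the predicted winner matched the actual winner. A minimal sketch (identifiers are illustrative):

```python
def bracket_accuracy(predicted: dict[str, str],
                     actual: dict[str, str]) -> float:
    """Fraction of games where the predicted winner matched the
    actual winner, keyed by a shared game identifier."""
    games = predicted.keys() & actual.keys()
    hits = sum(predicted[g] == actual[g] for g in games)
    return hits / len(games)

def backtest(predict_fn, tournaments: dict[int, dict[str, str]]) -> dict[int, float]:
    """Score a prediction function against each historical year
    (e.g. 2022-2025) and return per-year accuracy."""
    return {year: bracket_accuracy(predict_fn(year), results)
            for year, results in tournaments.items()}
```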
PREDICTION EVOLUTION
How the champion pick evolved through each stage of the pipeline
Initial run (Claude only): UConn
After ensemble (Claude + GPT-4o): Duke
After cross-agent debate: Duke (2 R64 picks flipped)
Final prediction: Duke over Houston, predicted total score 152
TECH STACK
Python backend with 6 AI models: Anthropic (Claude), OpenAI (GPT-4o), Google (Gemini), xAI (Grok), Meta (Llama via Groq), and DeepSeek. KenPom and BartTorvik for statistical data. GradientBoosting ML for historical calibration. Static HTML dashboard deployed on Vercel. Source code at github.com/elstonj/march-madness-2026.
THE SELDON PARALLEL
"Psychohistory dealt not with man, but with man-masses. It was the science of mobs; globules of the human race... The reaction of one man could be forecast by no known mathematics; the reaction of a billion is something else again." — Isaac Asimov, Foundation
Hari Seldon used psychohistory to predict the behavior of entire civilizations. We're using 6 AI models, 10 years of data, and Monte Carlo simulations to predict 63 basketball games. The math is the same: aggregate enough independent signals and the noise cancels out, leaving the signal. The question Seldon never answered — and neither can we — is what happens when a single individual (a player having the game of their life) overrides the statistical prediction. That's the 8% we can't capture.
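"Aggregate enough independent signals and the noise cancels out" is the standard statistical result that the spread of an average of n independent estimates shrinks like 1/sqrt(n). A toy simulation (not from the project) makes the point concrete:

```python
import random
import statistics

def mean_estimate_spread(n_signals: int, trials: int = 2000,
                         seed: int = 1) -> float:
    """Std-dev of the average of n_signals independent noisy
    estimates of the same quantity (true value 0, noise std 1).
    Theory says this shrinks like 1/sqrt(n_signals)."""
    rng = random.Random(seed)
    means = [statistics.fmean(rng.gauss(0, 1) for _ in range(n_signals))
             for _ in range(trials)]
    return statistics.pstdev(means)
```

Averaging 16 independent signals should cut the spread roughly fourfold relative to a single signal; the residual spread is the "signal" the noise can't hide.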
"Never let your sense of morals prevent you from doing what is right." — Salvor Hardin
BY THE NUMBERS
6 AI models used, from 6 different companies
500+ API calls made across all models
92.1% ML model accuracy on 10 years of historical data
80.6% model agreement, suggesting convergence rather than diversity
23.9% lineup certainty impact, the #1 prediction driver per Monte Carlo
$22M in Kentucky NIL spending, the most expensive roster and only a 7-seed
THE MONEY QUESTION
Can you buy a championship?
Kentucky spent $22M on basketball NIL, the most in the country, and is a 7-seed.
Last year's champion Florida ranked 77th in NIL spending.
Can you buy a championship? Mark Cuban bought Indiana football one. Basketball might be different — five players touch the ball, chemistry matters, and the tournament's single-elimination format means one bad night ends everything, no matter the payroll.
THE PREDICTION CEILING
Why can't we predict every game correctly?
Every prediction method hits a wall. Here's where the major approaches land:
Chalk: 83%
KenPom: 88%
Our ML: 92%
Ceiling: ~97%
The last 5–8% is genuine chaos: a player having the game of their life, a referee's whistle, a lucky bounce. Not even knowing everything at the molecular level would eliminate all variance in human athletic performance. The tournament isn't broken — it's designed to produce uncertainty. That's why they call it Madness.
FINAL SYNOPSIS
Post-tournament retrospective from all 6 AI models