Leaderboard
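Views: Combined pools all votes; Text and Image restrict the ranking to votes from that modality. BT Elo is the Bradley-Terry maximum-likelihood rating, refreshed every 30 minutes; the ± value gives its 95% bootstrap confidence interval. Online K=4 and K=8 are live Elo ratings updated vote by vote, at K-factors of 4 and 8. Rows are sorted by BT Elo, highest first.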
| # | Model | Model ID | Released | BT Elo (±95% CI) | Online K=4 | Online K=8 | Games | Code-runs |
|---|---|---|---|---|---|---|---|---|
| 1 | GPT-5.5 | `gpt-5.5` | 2026-04 | 1173 ±35 | 1146.1 | 1176.4 | 363 | 91% |
| 2 | Gemini 3.1 Pro | `gemini-3.1-pro-preview` | 2026-02 | 1144 ±32 | 1126.3 | 1147.4 | 413 | 100% |
| 3 | GPT-5.4 | `gpt-5.4` | 2026-03 | 1083 ±32 | 1058.7 | 1059.9 | 341 | 87% |
| 4 | Gemini 3 Flash | `gemini-3-flash-preview` | 2025-12 | 1051 ±31 | 1038.2 | 1031.8 | 426 | 99% |
| 5 | Claude Sonnet 4.6 | `claude-sonnet-4-6` | 2026-02 | 1033 ±29 | 1016.0 | 1019.0 | 371 | 99% |
| 6 | Claude Opus 4.7 | `claude-opus-4-7` | 2026-04 | 1004 ±29 | 1013.3 | 1030.9 | 401 | 100% |
| 7 | GPT-5.4 Mini | `gpt-5.4-mini` | 2026-03 | 976 ±31 | 975.8 | 968.9 | 372 | 91% |
| 8 | Gemma 4 31B | `gemma-4-31b-it` | 2026-04 | 970 ±34 | 972.2 | 968.7 | 340 | 94% |
| 9 | Gemini 3.1 Flash Lite | `gemini-3.1-flash-lite-preview` | 2026-03 | 881 ±34 | 901.6 | 882.0 | 337 | 100% |
| 10 | Gemma 4 26B A4B | `gemma-4-26b-a4b-it` | 2026-04 | 873 ±32 | 904.3 | 896.8 | 354 | 88% |
| 11 | Claude Haiku 4.5 | `claude-haiku-4-5` | 2025-10 | 813 ±32 | 847.6 | 817.9 | 348 | 94% |
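
For readers who want to see how the two kinds of rating in the table differ, here is a minimal Python sketch. It is not the leaderboard's actual code. The 400-point logistic scale, the 1000-point starting rating, the function names, and the toy `DEMO_GAMES` list are all assumptions made for illustration. The bootstrap confidence interval is not shown, and ties are ignored.

```python
import math
from collections import defaultdict

def expected_score(r_a, r_b, scale=400.0):
    """Win probability of A over B under the usual logistic Elo curve (assumed scale)."""
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / scale))

def online_elo(games, k, base=1000.0):
    """Online Elo: walk the games in order and nudge both ratings after each one."""
    ratings = defaultdict(lambda: base)
    for winner, loser in games:
        # The winner gains more when the win was unexpected; k caps the step size.
        delta = k * (1.0 - expected_score(ratings[winner], ratings[loser]))
        ratings[winner] += delta
        ratings[loser] -= delta
    return dict(ratings)

def bradley_terry(games, iters=500, base=1000.0, scale=400.0):
    """Bradley-Terry MLE via the classic MM iteration, mapped to an Elo-like scale.

    Unlike online Elo, the result does not depend on the order of the games.
    """
    wins = defaultdict(float)
    pair_games = defaultdict(float)
    models = set()
    for winner, loser in games:
        wins[winner] += 1.0
        pair_games[frozenset((winner, loser))] += 1.0
        models.update((winner, loser))
    models = sorted(models)
    strength = {m: 1.0 for m in models}
    for _ in range(iters):
        updated = {}
        for i in models:
            denom = 0.0
            for j in models:
                n_ij = pair_games.get(frozenset((i, j)), 0.0) if j != i else 0.0
                if n_ij:
                    denom += n_ij / (strength[i] + strength[j])
            # Clamp keeps a winless model finite; the true MLE diverges in that case.
            updated[i] = max(wins[i], 1e-9) / denom if denom else strength[i]
        # Fix the scale: geometric mean of strengths is 1, so ratings average to `base`.
        log_mean = sum(math.log(s) for s in updated.values()) / len(updated)
        strength = {m: s / math.exp(log_mean) for m, s in updated.items()}
    return {m: base + scale * math.log10(s) for m, s in strength.items()}

# Hypothetical (winner, loser) votes, invented purely for illustration.
DEMO_GAMES = [
    ("A", "B"), ("A", "B"), ("B", "A"), ("A", "B"),
    ("B", "C"), ("B", "C"), ("C", "B"), ("B", "C"),
    ("A", "C"), ("C", "A"), ("A", "C"),
]

if __name__ == "__main__":
    print("online K=4:", online_elo(DEMO_GAMES, k=4))
    print("online K=8:", online_elo(DEMO_GAMES, k=8))
    print("BT fit:    ", bradley_terry(DEMO_GAMES))
```

Under these assumptions, a larger K makes the online rating react faster to each vote but also makes it noisier, which is one reason the K=4 and K=8 columns can disagree with each other and with the batch Bradley-Terry fit. On the assumed 400-point scale, the 29-point gap between the top two rows (1173 vs 1144) would correspond to an expected win rate of roughly 54% for the higher-rated model, well inside the overlap of the two confidence intervals.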