Leaderboard
Views: Combined pools all votes; Text and Image restrict the ratings to votes in that modality. BT Elo is a Bradley-Terry maximum-likelihood fit, refreshed every 30 minutes and shown with a 95% bootstrap confidence interval. The online K=4 and K=8 columns are live online Elo ratings computed with two K-factor settings.
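A minimal sketch of how online Elo columns like K=4 and K=8 are typically maintained. It assumes the standard logistic Elo update on a 400-point scale, a starting rating of 1000, and ties scored as 0.5. The leaderboard does not state its initial rating, scale, or tie rule, and the function names here are hypothetical.

```python
def expected_score(r_a: float, r_b: float, scale: float = 400.0) -> float:
    """Logistic expected score of A against B (400-point scale is an assumption)."""
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / scale))


def elo_update(r_a: float, r_b: float, score_a: float, k: float) -> tuple[float, float]:
    """Apply one online Elo update after a single vote.

    score_a is 1.0 if A won, 0.0 if B won, 0.5 for a tie (tie scoring is an assumption).
    The update is zero-sum: whatever A gains, B loses.
    """
    delta = k * (score_a - expected_score(r_a, r_b))
    return r_a + delta, r_b - delta


def replay(votes: list[tuple[str, str, float]], k: float,
           initial: float = 1000.0) -> dict[str, float]:
    """Replay votes in arrival order; each vote is (model_a, model_b, score_a).

    The initial rating of 1000 is an illustrative assumption.
    """
    ratings: dict[str, float] = {}
    for a, b, s in votes:
        ra = ratings.setdefault(a, initial)
        rb = ratings.setdefault(b, initial)
        ratings[a], ratings[b] = elo_update(ra, rb, s, k)
    return ratings
```

Replaying the same vote log with `k=4` and `k=8` yields two columns like those in the table: the larger K moves ratings further per vote, so it tracks recent results faster but is noisier. Unlike the Bradley-Terry fit, the online result depends on the order in which votes arrive.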
| # | Model | Released | BT Elo (±95% CI) | online K=4 | online K=8 | games | code-runs |
|---|---|---|---|---|---|---|---|
| 1 | Fable 5 (claude-fable-5) | 2026-06 | 1238 ±76 | 1090.7 | 1149.6 | 88 | – |
| 2 | GPT-5.5 (gpt-5.5) | 2026-04 | 1135 ±27 | 1135.7 | 1145.5 | 647 | 100% |
| 3 | Gemini 3.1 Pro (gemini-3.1-pro-preview) | 2026-02 | 1116 ±24 | 1118.5 | 1116.8 | 755 | 100% |
| 4 | Gemini 3.5 Flash (gemini-3.5-flash) | 2026-05 | 1114 ±27 | 1126.0 | 1127.6 | 625 | 100% |
| 5 | Claude Opus 4.8 (claude-opus-4-8) | 2026-05 | 1079 ±84 | 1015.5 | 1028.6 | 62 | – |
| 6 | GPT-5.4 (gpt-5.4) | 2026-03 | 1042 ±26 | 1047.7 | 1044.8 | 565 | 100% |
| 7 | Gemini 3 Flash (gemini-3-flash-preview) | 2025-12 | 1012 ±26 | 1034.1 | 1042.1 | 631 | 100% |
| 8 | Claude Sonnet 4.6 (claude-sonnet-4-6) | 2026-02 | 984 ±27 | 986.3 | 976.2 | 597 | 100% |
| 9 | Claude Opus 4.7 (claude-opus-4-7) | 2026-04 | 977 ±25 | 980.0 | 965.7 | 653 | 100% |
| 10 | Gemma 4 31B (gemma-4-31b-it) | 2026-04 | 922 ±30 | 940.6 | 933.1 | 518 | – |
| 11 | GPT-5.4 Mini (gpt-5.4-mini) | 2026-03 | 920 ±27 | 932.0 | 929.1 | 588 | 100% |
| 12 | Gemini 3.1 Flash Lite (gemini-3.1-flash-lite-preview) | 2026-03 | 850 ±31 | 905.5 | 906.9 | 427 | 100% |
| 13 | Gemma 4 26B A4B (gemma-4-26b-a4b-it) | 2026-04 | 829 ±34 | 873.6 | 854.3 | 451 | – |
| 14 | Claude Haiku 4.5 (claude-haiku-4-5) | 2025-10 | 772 ±32 | 827.0 | 804.0 | 445 | 100% |
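The BT Elo column can be approximated with a batch Bradley-Terry fit plus a percentile bootstrap. The sketch below uses Hunter's MM (Zermelo) iteration, converts strengths to the Elo scale via 400·log10, and centers ratings at a mean of 1000. The centering, the weak prior against a virtual opponent of strength 1 (added so a model with no wins in a resample still gets a finite rating), the tie handling, and all function names are assumptions, not the leaderboard's documented method.

```python
import math
import random
from collections import defaultdict

Vote = tuple[str, str, float]  # (model_a, model_b, score_a): 1.0 win, 0.0 loss, 0.5 tie


def fit_bt(votes: list[Vote], iters: int = 200, prior: float = 0.5) -> dict[str, float]:
    """Bradley-Terry MLE via MM updates, reported on an Elo-like scale centered at 1000."""
    wins: dict[str, float] = defaultdict(float)
    games: dict[tuple[str, str], float] = defaultdict(float)
    for a, b, s in votes:
        wins[a] += s            # ties count as half a win for each side (assumption)
        wins[b] += 1.0 - s
        games[(a, b)] += 1.0    # n_ij is symmetric, so record both directions
        games[(b, a)] += 1.0
    models = sorted(wins)
    opponents: dict[str, list[tuple[str, float]]] = defaultdict(list)
    for (i, j), n in games.items():
        opponents[i].append((j, n))
    p = {m: 1.0 for m in models}
    for _ in range(iters):
        new_p = {}
        for i in models:
            denom = sum(n / (p[i] + p[j]) for j, n in opponents[i])
            # Hypothetical weak prior: `prior` wins and `prior` losses against a
            # virtual opponent fixed at strength 1, which keeps p[i] > 0.
            denom += 2.0 * prior / (p[i] + 1.0)
            new_p[i] = (wins[i] + prior) / denom
        p = new_p
    ratings = {m: 400.0 * math.log10(p[m]) for m in models}
    shift = 1000.0 - sum(ratings.values()) / len(ratings)  # mean-1000 anchor (assumption)
    return {m: r + shift for m, r in ratings.items()}


def bootstrap_ci(votes: list[Vote], rounds: int = 200, seed: int = 0,
                 level: float = 0.95) -> dict[str, tuple[float, float]]:
    """Percentile bootstrap: resample votes with replacement and refit each time."""
    rng = random.Random(seed)
    samples: dict[str, list[float]] = defaultdict(list)
    for _ in range(rounds):
        resample = rng.choices(votes, k=len(votes))
        for m, r in fit_bt(resample).items():
            samples[m].append(r)
    tail = (1.0 - level) / 2.0
    ci = {}
    for m, rs in samples.items():
        rs.sort()
        lo = rs[int(tail * (len(rs) - 1))]
        hi = rs[int((1.0 - tail) * (len(rs) - 1))]
        ci[m] = (lo, hi)
    return ci
```

Reporting half of `hi - lo` gives a ± figure like the one in the table; whether the leaderboard derives its ± value exactly this way is not stated. Models with few games (62 and 88 above) show much wider intervals, which is what a bootstrap produces when each resample contains only a handful of their votes.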