3D Code Arena

Leaderboard

Views: Combined pools all votes; Text and Image filter to that modality. BT Elo is the Bradley-Terry maximum-likelihood estimate (refreshed every 30 minutes, shown with a 95% bootstrap confidence interval). K=4 and K=8 are live online Elo ratings at two K-factor settings.
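To make the difference between the rating columns concrete, here is a minimal sketch of both approaches. It is not the arena's actual implementation, which is not shown on this page. It assumes a base-10 logistic model with a 400-point scale, a starting rating of 1000, no ties, and a hypothetical list of (winner, loser) votes; the function names are illustrative only.

```python
import math

def expected_score(r_a, r_b, scale=400.0):
    """Probability that A beats B under a logistic Elo model (assumed scale)."""
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / scale))

def online_elo(games, k, initial=1000.0):
    """Process (winner, loser) pairs in order, updating ratings after each game."""
    ratings = {}
    for winner, loser in games:
        r_w = ratings.setdefault(winner, initial)
        r_l = ratings.setdefault(loser, initial)
        # The winner gains what the loser drops, so every update is zero-sum.
        delta = k * (1.0 - expected_score(r_w, r_l))
        ratings[winner] = r_w + delta
        ratings[loser] = r_l - delta
    return ratings

def bradley_terry_elo(games, iters=200, scale=400.0, center=1000.0):
    """Fit Bradley-Terry strengths with the classic MM iteration, then map
    them to an Elo-like scale. Assumes every model has at least one win and
    that the comparison graph is connected."""
    models = sorted({m for g in games for m in g})
    wins = {m: 0 for m in models}
    pair_counts = {}
    for w, l in games:
        if w == l:
            continue  # ignore degenerate self-comparisons
        wins[w] += 1
        key = frozenset((w, l))
        pair_counts[key] = pair_counts.get(key, 0) + 1
    p = {m: 1.0 for m in models}
    for _ in range(iters):
        new_p = {}
        for i in models:
            denom = 0.0
            for key, n in pair_counts.items():
                if i in key:
                    (j,) = key - {i}
                    denom += n / (p[i] + p[j])
            new_p[i] = wins[i] / denom
        # Fix the scale by normalizing the geometric mean of strengths to 1.
        log_mean = sum(math.log(v) for v in new_p.values()) / len(new_p)
        p = {m: v / math.exp(log_mean) for m, v in new_p.items()}
    return {m: center + scale * math.log10(p[m]) for m in models}

# Hypothetical votes, not taken from the leaderboard.
toy_games = ([("a", "b")] * 3 + [("b", "a")] + [("b", "c")] * 2
             + [("c", "b")] + [("a", "c")] * 2 + [("c", "a")])

print(online_elo(toy_games, k=4))
print(online_elo(toy_games, k=8))
print(bradley_terry_elo(toy_games))
```

The online ratings depend on the order in which votes arrive and move twice as far per game at K=8 as at K=4, whereas the Bradley-Terry fit uses all votes at once and is order-independent. That is one plausible reason the three rating columns can disagree for the same model.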

#   Model                                                  Released  BT Elo     online K=4  online K=8  games  code-runs
1   GPT-5.5 (gpt-5.5)                                      2026-04   1173 ±35       1146.1      1176.4    363        91%
2   Gemini 3.1 Pro (gemini-3.1-pro-preview)                2026-02   1144 ±32       1126.3      1147.4    413       100%
3   GPT-5.4 (gpt-5.4)                                      2026-03   1083 ±32       1058.7      1059.9    341        87%
4   Gemini 3 Flash (gemini-3-flash-preview)                2025-12   1051 ±31       1038.2      1031.8    426        99%
5   Claude Sonnet 4.6 (claude-sonnet-4-6)                  2026-02   1033 ±29       1016.0      1019.0    371        99%
6   Claude Opus 4.7 (claude-opus-4-7)                      2026-04   1004 ±29       1013.3      1030.9    401       100%
7   GPT-5.4 Mini (gpt-5.4-mini)                            2026-03    976 ±31        975.8       968.9    372        91%
8   Gemma 4 31B (gemma-4-31b-it)                           2026-04    970 ±34        972.2       968.7    340        94%
9   Gemini 3.1 Flash Lite (gemini-3.1-flash-lite-preview)  2026-03    881 ±34        901.6       882.0    337       100%
10  Gemma 4 26B A4B (gemma-4-26b-a4b-it)                   2026-04    873 ±32        904.3       896.8    354        88%
11  Claude Haiku 4.5 (claude-haiku-4-5)                    2025-10    813 ±32        847.6       817.9    348        94%
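
When reading the BT Elo values and their ± intervals, it can help to check whether two models' intervals overlap and what head-to-head win rate a rating gap would imply. The sketch below does both. It assumes the conventional 400-point, base-10 Elo scale, which this page does not confirm, and the helper names are illustrative. Overlapping intervals are only a rough signal, not a formal significance test.

```python
def implied_win_probability(elo_a, elo_b, scale=400.0):
    """Win probability for A over B implied by a rating gap (assumed scale)."""
    return 1.0 / (1.0 + 10.0 ** ((elo_b - elo_a) / scale))

def intervals_overlap(elo_a, half_a, elo_b, half_b):
    """True if [elo_a ± half_a] and [elo_b ± half_b] share any point."""
    return abs(elo_a - elo_b) <= half_a + half_b

# (BT Elo, interval half-width) copied from the leaderboard rows.
gpt_5_5 = (1173, 35)
gemini_3_1_pro = (1144, 32)
claude_haiku_4_5 = (813, 32)

# Ranks 1 and 2 are 29 points apart, well inside their combined half-widths.
print(intervals_overlap(*gpt_5_5, *gemini_3_1_pro))    # True
# Ranks 1 and 11 are 360 points apart, far outside them.
print(intervals_overlap(*gpt_5_5, *claude_haiku_4_5))  # False
# Under the assumed scale, a 29-point gap is only slightly better than a coin flip.
print(implied_win_probability(gpt_5_5[0], gemini_3_1_pro[0]))
```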