3DCodeBench

Leaderboard

Views: Combined pools all votes; Text and Image each restrict the ranking to votes from that modality. BT Elo is a Bradley-Terry maximum-likelihood estimate, refreshed every 30 minutes and shown with its 95% bootstrap confidence interval. The online K=4 and K=8 columns are live online Elo ratings computed with K-factors of 4 and 8.
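The online K=4 and K=8 columns are updated vote by vote. The sketch below shows a standard online Elo update. It assumes several things the page does not state: every model starts at 1000, the logistic scale is 400, and a tie scores 0.5. `online_elo` and `expected_score` are hypothetical names.

```python
from collections import defaultdict


def expected_score(r_a, r_b, scale=400.0):
    """Win probability of A over B on the standard Elo logistic curve."""
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / scale))


def online_elo(games, k, init=1000.0):
    """Replay votes in order and return the final ratings.

    games: iterable of (model_a, model_b, score_a) where score_a is 1.0 if A
    won, 0.0 if B won and 0.5 for a tie (tie handling is an assumption).
    Each vote shifts A by k * (score_a - expected) and B by the opposite
    amount, so the total rating pool is conserved.
    """
    ratings = defaultdict(lambda: init)  # assumed starting rating
    for a, b, score_a in games:
        delta = k * (score_a - expected_score(ratings[a], ratings[b]))
        ratings[a] += delta
        ratings[b] -= delta
    return dict(ratings)


if __name__ == "__main__":
    # Hypothetical vote log: "x" wins three of four votes against "y".
    votes = [("x", "y", 1.0), ("x", "y", 1.0), ("x", "y", 0.0), ("x", "y", 1.0)]
    for k in (4, 8):
        print(f"K={k}", online_elo(votes, k))
```

With K=4, a single vote moves a rating by less than 4 points. K=8 doubles that step, so the K=8 column reacts faster to recent votes but is noisier. Online ratings also depend on vote order, so they need not match the BT Elo fit.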

| # | Model | Model ID | Released | BT Elo | 95% CI | Online K=4 | Online K=8 | Games | Code-runs |
|---|---|---|---|---|---|---|---|---|---|
| 1 | Fable 5 | claude-fable-5 | 2026-06 | 1238 | ±76 | 1090.7 | 1149.6 | 88 | |
| 2 | GPT-5.5 | gpt-5.5 | 2026-04 | 1135 | ±27 | 1135.7 | 1145.5 | 647 | 100% |
| 3 | Gemini 3.1 Pro | gemini-3.1-pro-preview | 2026-02 | 1116 | ±24 | 1118.5 | 1116.8 | 755 | 100% |
| 4 | Gemini 3.5 Flash | gemini-3.5-flash | 2026-05 | 1114 | ±27 | 1126.0 | 1127.6 | 625 | 100% |
| 5 | Claude Opus 4.8 | claude-opus-4-8 | 2026-05 | 1079 | ±84 | 1015.5 | 1028.6 | 62 | |
| 6 | GPT-5.4 | gpt-5.4 | 2026-03 | 1042 | ±26 | 1047.7 | 1044.8 | 565 | 100% |
| 7 | Gemini 3 Flash | gemini-3-flash-preview | 2025-12 | 1012 | ±26 | 1034.1 | 1042.1 | 631 | 100% |
| 8 | Claude Sonnet 4.6 | claude-sonnet-4-6 | 2026-02 | 984 | ±27 | 986.3 | 976.2 | 597 | 100% |
| 9 | Claude Opus 4.7 | claude-opus-4-7 | 2026-04 | 977 | ±25 | 980.0 | 965.7 | 653 | 100% |
| 10 | Gemma 4 31B | gemma-4-31b-it | 2026-04 | 922 | ±30 | 940.6 | 933.1 | 518 | |
| 11 | GPT-5.4 Mini | gpt-5.4-mini | 2026-03 | 920 | ±27 | 932.0 | 929.1 | 588 | 100% |
| 12 | Gemini 3.1 Flash Lite | gemini-3.1-flash-lite-preview | 2026-03 | 850 | ±31 | 905.5 | 906.9 | 427 | 100% |
| 13 | Gemma 4 26B A4B | gemma-4-26b-a4b-it | 2026-04 | 829 | ±34 | 873.6 | 854.3 | 451 | |
| 14 | Claude Haiku 4.5 | claude-haiku-4-5 | 2025-10 | 772 | ±32 | 827.0 | 804.0 | 445 | 100% |
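The BT Elo column refits all votes at once instead of replaying them in order. The sketch below shows one way to do this. It uses Hunter's minorization-maximization (MM) iteration for the Bradley-Terry model, maps the result onto an Elo-style scale, and adds a percentile bootstrap for the 95% interval. The page does not confirm any of the following assumptions:

- ratings are centred on 1000 with a 400-point scale;
- a tie counts as half a win;
- a few virtual tied games per pair regularise the fit.

`fit_bt_elo` and `bootstrap_ci` are hypothetical names.

```python
import math
import random
from collections import defaultdict


def fit_bt_elo(games, iters=200, prior=0.1, scale=400.0, center=1000.0):
    """Fit Bradley-Terry strengths by MM iteration; return Elo-style ratings.

    games: list of (model_a, model_b, score_a) with score_a 1.0 (A won),
    0.0 (B won) or 0.5 (tie, split as half a win each; an assumption).
    prior: virtual tied games added to every observed pair so the fit stays
    finite when a model never wins or never loses (a regularisation assumption).
    """
    models = sorted({m for a, b, _ in games for m in (a, b)})
    wins = defaultdict(float)   # (fractional) wins per model
    pairs = defaultdict(float)  # games played per unordered pair
    for a, b, s in games:
        wins[a] += s
        wins[b] += 1.0 - s
        pairs[frozenset((a, b))] += 1.0
    for pair in list(pairs):
        for m in pair:
            wins[m] += prior / 2.0
        pairs[pair] += prior

    strength = {m: 1.0 for m in models}
    for _ in range(iters):
        new = {}
        for i in models:
            # MM update: p_i <- W_i / sum_j n_ij / (p_i + p_j)
            denom = sum(
                pairs.get(frozenset((i, j)), 0.0) / (strength[i] + strength[j])
                for j in models
                if j != i
            )
            new[i] = wins[i] / denom if denom > 0 else strength[i]
        # Divide by the geometric mean so strengths do not drift in scale.
        g = math.exp(sum(math.log(v) for v in new.values()) / len(new))
        strength = {m: v / g for m, v in new.items()}

    # Elo convention: a `scale`-point gap means 10:1 odds; mean rating = center.
    return {m: center + scale * math.log10(p) for m, p in strength.items()}


def bootstrap_ci(games, rounds=200, seed=0, **fit_kwargs):
    """Crude 95% percentile bootstrap: refit on votes resampled with replacement."""
    rng = random.Random(seed)
    samples = defaultdict(list)
    for _ in range(rounds):
        resampled = [rng.choice(games) for _ in games]
        for m, r in fit_bt_elo(resampled, **fit_kwargs).items():
            samples[m].append(r)
    ci = {}
    for m, vals in samples.items():
        vals.sort()
        ci[m] = (vals[int(0.025 * (len(vals) - 1))], vals[int(0.975 * (len(vals) - 1))])
    return ci


if __name__ == "__main__":
    # Hypothetical votes, not leaderboard data.
    votes = ([("a", "b", 1.0)] * 7 + [("a", "b", 0.0)] * 3
             + [("b", "c", 1.0)] * 6 + [("b", "c", 0.5)] * 2 + [("b", "c", 0.0)] * 2
             + [("a", "c", 1.0)] * 8 + [("a", "c", 0.0)] * 2)
    print(fit_bt_elo(votes))
    print(bootstrap_ci(votes, rounds=100))
```

Percentile intervals need not be symmetric. A single ± value could be, for example, half the interval width, but the page does not say how it is derived. The wide intervals for models with few games (±76 at 88 games, ±84 at 62 games) are what resampling would be expected to produce.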