new_benchmark_50

A new benchmark of olympiad-level problems that we gathered to test our own engine. The problems were selected to be compatible with the interpretation capabilities of the original AlphaGeometry and were obtained from the following sources:

  • IMO exams prior to 2000 and from 2024;

  • IMO shortlists from 2009 to 2022;

  • USA Math Olympiad from 1988 to 2023.

All problems are named as “YEAR_PROBLEM-NUMBER” for problems from the IMO exams, “YEAR_sl_PROBLEM-IDENTIFIER” for problems from the IMO shortlist, or “usamo_YEAR_PROBLEM-NUMBER” for problems from the USA Math Olympiad.
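The naming scheme above can be decoded mechanically. The following is a minimal sketch, not part of Newclid: the function name `parse_problem_name`, the `ProblemName` tuple, and the source labels are hypothetical, and the regular expressions assume the shapes seen in the table of this benchmark (a `p` prefix for exam problems, a `g` prefix with an optional letter suffix for shortlist problems).

```python
import re
from typing import NamedTuple, Optional


class ProblemName(NamedTuple):
    source: str      # hypothetical labels: "imo", "imo_shortlist", "usamo"
    year: int
    identifier: str  # e.g. "p2" or "g4a"


# Assumed patterns, inferred from names such as 1983_p2, 2018_sl_g4a, usamo_2023_p1.
_PATTERNS = [
    ("usamo", re.compile(r"usamo_(\d{4})_(p\d+)")),
    ("imo_shortlist", re.compile(r"(\d{4})_sl_(g\d+[a-z]?)")),
    ("imo", re.compile(r"(\d{4})_(p\d+)")),
]


def parse_problem_name(name: str) -> Optional[ProblemName]:
    """Return the decoded name, or None if it matches none of the three schemes."""
    for source, pattern in _PATTERNS:
        match = pattern.fullmatch(name)
        if match:
            return ProblemName(source, int(match.group(1)), match.group(2))
    return None


if __name__ == "__main__":
    for example in ("1983_p2", "2018_sl_g4a", "usamo_2023_p1"):
        print(example, "->", parse_problem_name(example))
```

Names that follow none of the three schemes return `None`, which makes it easy to spot entries that were mistyped.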

We aimed for 50 problems, but the only criterion for choosing a problem was whether it could be translated into the original formal language of AlphaGeometry. Under that criterion, the lists are ideally exhaustive in each time range for each olympiad, as long as there is no overlap with the imo_ag_30 benchmark. We sourced 48 problems. IMO shortlist problems G4 from 2018 and G7 from 2020 were each split into two problems to account for their multiple goals, as required by the limitations of the original AlphaGeometry, which gives 50 entries in total.
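The splitting of multi-goal problems can be sketched as follows. This is an illustration only, assuming that a problem statement is written as premises and a single goal separated by `?`; the helper `split_goals` and the placeholder strings `PREMISES`, `GOAL_1`, and `GOAL_2` are hypothetical and do not reproduce the actual benchmark statements.

```python
from string import ascii_lowercase
from typing import Dict, List


def split_goals(name: str, premises: str, goals: List[str]) -> Dict[str, str]:
    """Emit one single-goal statement per goal, suffixing the name with a, b, ...

    Assumes the "premises ? goal" layout; a problem with one goal keeps its name.
    """
    if not goals:
        raise ValueError("a problem needs at least one goal")
    if len(goals) > len(ascii_lowercase):
        raise ValueError("too many goals for single-letter suffixes")
    if len(goals) == 1:
        return {name: f"{premises} ? {goals[0]}"}
    return {
        f"{name}{letter}": f"{premises} ? {goal}"
        for letter, goal in zip(ascii_lowercase, goals)
    }


if __name__ == "__main__":
    # Placeholder premises and goals, not the real 2018 G4 statement.
    for key, statement in split_goals("2018_sl_g4", "PREMISES", ["GOAL_1", "GOAL_2"]).items():
        print(key, ":", statement)
```

With two goals this yields the entries `2018_sl_g4a` and `2018_sl_g4b`, matching the suffixes used in this benchmark.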

Newclid solved 17 of the 50 problems by itself. They are marked in the table below.

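| Problem Name  | Solved w/ original DDAR? | Solved w/ Newclid? |
|---------------|--------------------------|--------------------|
| 1983_p2       |                          |                    |
| 1995_p1       |                          | Yes                |
| 2024_p4       |                          | Yes                |
| 2009_sl_g3    |                          |                    |
| 2009_sl_g6    |                          |                    |
| 2010_sl_g1    |                          | Yes                |
| 2010_sl_g1    |                          |                    |
| 2010_sl_g2    |                          |                    |
| 2011_sl_g6    |                          | Yes                |
| 2012_sl_g2    |                          | Yes                |
| 2012_sl_g3    |                          |                    |
| 2012_sl_g4    |                          |                    |
| 2013_sl_g2    |                          |                    |
| 2013_sl_g4    |                          | Yes                |
| 2014_sl_g3    |                          |                    |
| 2015_sl_g1    |                          | Yes                |
| 2015_sl_g3    |                          |                    |
| 2015_sl_g5    |                          |                    |
| 2016_sl_g2    |                          |                    |
| 2016_sl_g4    |                          |                    |
| 2016_sl_g5    |                          |                    |
| 2016_sl_g6    |                          |                    |
| 2017_sl_g3    |                          |                    |
| 2017_sl_g4    |                          |                    |
| 2017_sl_g7    |                          | Yes                |
| 2018_sl_g2    |                          |                    |
| 2018_sl_g4a   |                          | Yes                |
| 2018_sl_g4b   |                          | Yes                |
| 2018_sl_g5    |                          |                    |
| 2018_sl_g7    |                          | Yes                |
| 2019_sl_g1    |                          | Yes                |
| 2019_sl_g2    |                          |                    |
| 2019_sl_g7    |                          |                    |
| 2020_sl_g7a   |                          |                    |
| 2020_sl_g7b   |                          |                    |
| 2020_sl_g8    |                          |                    |
| 2021_sl_g1    |                          | Yes                |
| 2021_sl_g4    |                          |                    |
| 2022_sl_g2    |                          |                    |
| 2022_sl_g3    |                          |                    |
| usamo_1988_p4 |                          | Yes                |
| usamo_1990_p5 |                          | Yes                |
| usamo_1997_p2 |                          |                    |
| usamo_1999_p6 |                          |                    |
| usamo_2001_p2 |                          |                    |
| usamo_2005_p3 |                          |                    |
| usamo_2008_p2 |                          |                    |
| usamo_2012_p5 |                          |                    |
| usamo_2013_p1 |                          |                    |
| usamo_2014_p5 |                          | Yes                |
| usamo_2023_p1 |                          | Yes                |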