LLMs are still bad at math:

For the experiment, the authors, representing a diversity of mathematical fields, each contributed one test question that arose from research they had in the works but had not yet published. They also determined the answers; these solutions are encrypted online and will be released on Feb. 13.

“The goal here is to understand the limits: how far can A.I. go beyond its training data and the existing solutions it finds online?” said Dr. Kolda, who is one of few mathematicians to be elected a member of the National Academy of Engineering.

The team conducted preliminary tests on OpenAI’s ChatGPT-5.2 Pro and Google’s Gemini 3.0 Deep Think. When given one shot to produce the answer, the authors wrote, “the best publicly available A.I. systems struggle to answer many of our problems.”