LLMs are bad at solving math problems not in their training data

LLMs still struggle with research-level math problems whose solutions have not been published:

For the experiment, the authors — representing a diversity of mathematical fields — each contributed one test question that arose from research they had in the works but had not yet published. They also determined the answers; these solutions are encrypted online and will be released on Feb. 13.

“The goal here is to understand the limits — how far can A.I. go beyond its training data and the existing solutions it finds online?” said Dr. Kolda, who is one of few mathematicians to be elected a member of the National Academy of Engineering.

The team conducted preliminary tests on OpenAI’s ChatGPT-5.2 Pro and Google’s Gemini 3.0 Deep Think. When given one shot to produce the answer, the authors wrote, “the best publicly available A.I. systems struggle to answer many of our problems.”
