Mathematician Daniel Litt’s 2025 bet, which put the chance of AI matching top human mathematicians by 2030 at just 25%, now seems overly conservative. He recently conceded on his blog, “I now expect to lose this bet,” a sign of how sharply perceptions of artificial intelligence’s capabilities in his field have shifted. The sentiment is echoed across the mathematical community, which has been astonished by the swift advance in AI’s problem-solving and proof-generation abilities.

Just a couple of years ago, AI tools were largely ineffective even for high school-level mathematics. Today, they are tackling problems that were previously the preserve of research mathematicians. Litt, who is at the University of Toronto, notes that this progress is unfolding faster than many predicted. Jeremy Avigad of Carnegie Mellon University voiced a widespread concern in a recent essay, stating, “We are running out of places to hide.” He added, “We have to face up to the fact that AI will soon be able to prove theorems better than we can.”

This reaction stems not from a single breakthrough but from the steady mathematical progress of AI systems. Last year, OpenAI and Google DeepMind achieved remarkable success at the International Mathematical Olympiad, a prestigious competition for high school students. The feat was particularly striking because many experts had previously deemed such tasks beyond the scope of AI. More recently, in January, mathematicians began using similar AI tools to address long-standing problems posed by the renowned Hungarian mathematician Paul Erdős.

AI Tackles Complex Research and Verification Tasks

In parallel developments, AI has begun to engage with more intricate mathematical challenges. These advancements include solving actual research problems and the automated verification of cutting-edge proofs – tasks that traditionally demand considerable effort from teams of mathematicians.

In February, Nikhil Srivastava at the University of California, Berkeley, and his colleagues launched the First Proof project to establish a more grounded benchmark for assessing AI’s mathematical aptitude. The initial phase of the project involved ten problems that researchers had encountered in their professional work. The problems spanned diverse mathematical fields and were selected to represent typical research-level difficulty: neither trivial nor exceptionally hard.

Early Successes and the Evolving Role of AI

Following the public release of these problems, a stream of solutions emerged. AI models developed by tech companies, including OpenAI and Google DeepMind, were among those that attempted the First Proof challenges. OpenAI reported successfully answering half of the problems, based on expert feedback. Google DeepMind reported a score of 6 out of 10, as confirmed by mathematicians consulted on each specific problem.

“Things have changed so fast,” commented Thang Luong from Google DeepMind. “For us, now AI has really become a serious collaborator, either to produce serious research work or, in the case of First Proof, it can also actually propose a solution by itself.”

Google’s AI mathematics tool, Aletheia, employs a computationally demanding variant of its Gemini AI chatbot. This system is combined with a verification algorithm designed to identify weaknesses in potential solutions. Aletheia can then iteratively refine its approach until a satisfactory answer is reached. Google has not said publicly how many iterations Aletheia needed to solve these problems, which makes its performance hard to assess precisely. Even so, the mathematical community remains impressed.
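The propose, check and refine cycle described above can be illustrated with a minimal Python sketch. It is not Google’s implementation, whose details have not been published: the function names (`refine_until_verified`, `make_toy_problem`) are hypothetical, and a toy search for an integer square root stands in for both the model that proposes answers and the verifier that critiques them. The sketch also counts how many rounds were needed, the figure Google has not disclosed for Aletheia.

```python
from typing import Callable, Optional, Tuple


def refine_until_verified(
    propose: Callable[[Optional[int], Optional[str]], int],
    verify: Callable[[int], Optional[str]],
    max_rounds: int = 50,
) -> Tuple[Optional[int], int]:
    """Run a propose/verify loop; return (accepted answer or None, rounds used)."""
    candidate: Optional[int] = None
    feedback: Optional[str] = None
    for round_number in range(1, max_rounds + 1):
        # The proposer sees its previous attempt and the verifier's complaint.
        candidate = propose(candidate, feedback)
        # The verifier returns None when it finds no flaw in the candidate.
        feedback = verify(candidate)
        if feedback is None:
            return candidate, round_number
    return None, max_rounds


def make_toy_problem(target: int):
    """Hypothetical stand-in task: find the integer whose square equals target."""
    low, high = 0, target

    def propose(previous: Optional[int], feedback: Optional[str]) -> int:
        nonlocal low, high
        # Use the verifier's feedback to narrow the search range.
        if feedback == "too small":
            low = previous + 1
        elif feedback == "too large":
            high = previous - 1
        return (low + high) // 2

    def verify(candidate: int) -> Optional[str]:
        if candidate * candidate < target:
            return "too small"
        if candidate * candidate > target:
            return "too large"
        return None

    return propose, verify


propose, verify = make_toy_problem(144)
answer, rounds = refine_until_verified(propose, verify)
print(answer, rounds)
```

In this toy setting the verifier is exact, so an accepted answer is always correct. For real proofs a verifier can miss flaws, which is one reason the number of rounds and the reliability of the checker both matter when judging such a system.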

Not all solutions were met with universal agreement. For instance, on problem 8, which sits within a specialized area of geometry, only five of the seven experts consulted agreed that the AI’s proposed solution was correct. Ivan Smith at the University of Cambridge, who was not part of Google’s effort, observed that the AI’s approach to this problem appeared logical and made constructive progress. “If this was a PhD student coming back with their thoughts, it would be encouraging and would build confidence that the result was actually true,” he said.

The Challenge of Verifying AI-Generated Proofs

This situation underscores a significant challenge: the difficulty of verifying AI-generated proofs. It is conceivable that AI could produce proofs faster than humans can check them. The question then arises: if an AI proves a theorem but no human has validated the proof, can the theorem be considered proven?

AI is also contributing to solving this verification bottleneck. The technology is rapidly improving in its ability to translate human-readable proofs, presented in natural language, into a format suitable for computer verification. This process is known as formalization.
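As a small illustration of what formalization looks like, the informal statement “the sum of two even natural numbers is even” might be written as follows in Lean 4, one of the proof assistants used for this kind of work. This is a hand-written toy sketch, not output from any of the AI tools discussed here; the theorem name `even_add_even` is chosen for illustration, and evenness is spelled out directly as “equal to 2 times some number” rather than taken from a library.

```lean
-- Informal statement: "the sum of two even natural numbers is even."
-- "n is even" is written out as: there exists k with n = 2 * k.
theorem even_add_even (a b : Nat)
    (ha : ∃ k, a = 2 * k) (hb : ∃ k, b = 2 * k) :
    ∃ k, a + b = 2 * k :=
  match ha, hb with
  -- Unpack the witnesses m and n, then exhibit m + n as the new witness.
  | ⟨m, hm⟩, ⟨n, hn⟩ => ⟨m + n, by rw [hm, hn, Nat.mul_add]⟩
```

Once a statement and its proof are in this form, the computer checks every step mechanically, so no human needs to take the argument on trust.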

Automated Formalization of Complex Proofs

The AI company Math, Inc. recently surprised mathematicians by announcing that its tool, Gauss, had successfully formalized and verified an award-winning proof. The proof concerns how densely spheres can be packed into a given space. It is the work for which Maryna Viazovska received the Fields Medal in 2022, an award often described as the mathematics equivalent of the Nobel Prize.

The formalization effort for Viazovska’s work commenced in late 2024 with a small, independent group of mathematicians who aimed to manually translate the proof into computer code. They initially focused on Viazovska’s solution for sphere packing in eight dimensions. While they were making steady progress, Math, Inc., which later provided assistance to this research team, announced it had already produced a complete proof. The company subsequently presented a version of the result for 24 dimensions.

Bhavik Mehta at Imperial College London and his colleagues had developed an initial framework for formalizing Viazovska’s work, including essential mathematical definitions. Mehta emphasized that without these foundational elements, the AI would not have been able to complete its proofs. Chris Birkbeck from the University of East Anglia, also part of the team, described their contribution metaphorically: “We had made all the pieces, but we hadn’t written the instruction manual that explains how to put them together.”

A New Era of Mathematical Practice

The complete proof generated by Gauss comprised approximately 200,000 lines of code. This substantial volume represents about 10% of all existing formalized mathematics. Johan Commelin at Utrecht University in the Netherlands noted that while this code is likely ten times longer than what a human mathematician would produce for the same task, it remains a remarkable achievement. “This is a big deal,” he stated. “This is Fields medal-winning work, and it’s being auto-formalised.”

Commelin suggests that similar formalization efforts should now be feasible across many other mathematical fields, potentially revolutionizing the practice of mathematics. He envisions a future where sophisticated tools automatically formalize new research and mathematical papers, concurrently identifying errors. “This will have huge implications for, say, peer review and refereeing work,” he added.

As AI takes on a growing share of mathematical work, some mathematicians, like Avigad, are raising concerns about the potential damage to the human capacity for mathematical innovation. Anna Marie Bohmann at Vanderbilt University in Tennessee observes that while AI-driven problem-solving might yield concrete proofs, it can also eliminate the crucial “learning opportunity.” She points out, “Struggling to create and formulate new ideas and to solve new problems is one of the main ways in which both students and mathematics professionals consolidate their knowledge.”

Tony Feng, a member of the Aletheia team at Google DeepMind, is similarly cautious. He admits, “A lot of times I feel like I should be doing my own homework and going through the process of building my own intuition.”

Mehta acknowledges that even the process of formalizing proofs can yield valuable insights. He and his colleagues now face the task of analyzing the 200,000-line AI proof to identify potentially useful components for future projects.

Despite these evolving dynamics, mathematicians maintain optimism about their continued relevance in an increasingly machine-assisted future. Commelin draws a parallel with the historical shift in mathematics, where manual calculations, once a significant aspect of a mathematician’s role, are now automated. “I think similar things will happen here, where we radically change what we’re doing, but 10 or 20 years from now, we will still recognise what we’re doing as mathematics, in a new style.”