In March 2016, Google DeepMind’s artificial intelligence system, AlphaGo, stunned the world. Facing Lee Sedol, a champion of the ancient Chinese board game of Go, the AI secured a decisive victory in a five-game match. This event, broadcast live to millions, was widely recognized as a pivotal moment in the advancement of artificial intelligence.

Chris Maddison, now a professor of artificial intelligence at the University of Toronto, was then a master’s student who played an integral role in launching the project. The genesis of AlphaGo traces back to an inquiry from Ilya Sutskever, who would later co-found OpenAI.

Asked by Alex Wilkins how AlphaGo was first conceived, Maddison recounted Sutskever’s reasoning: “Chris, do you believe an expert player can identify the optimal move on a Go board within half a second?” Sutskever argued that if this were true, a neural network could learn to replicate that decision-making process. The rationale was that half a second roughly corresponds to a single processing cycle in the human visual cortex, and results from ImageNet, a major AI image-recognition competition, had already shown that neural networks were good at approximating tasks a person can perform within that timeframe.

Convinced by this argument, Maddison joined Google Brain as an intern in the summer of 2014.

Developing AlphaGo

Upon joining, Maddison became part of a small team at DeepMind, comprising Aja Huang and David Silver, who had already begun exploring Go. His primary responsibility was to spearhead the development of the neural networks. Maddison described this period as “a dream,” despite numerous initial approaches proving unsuccessful.

Frustrated by the lack of progress, Maddison opted for a straightforward strategy: training a neural network on a vast collection of expert games to predict an expert’s likely next move. This seemingly simple approach proved to be the foundational breakthrough for the project.

By the close of that summer, a match was arranged against Thore Graepel, a DeepMind researcher who considered himself a capable Go player. Maddison’s networks emerged victorious. This success signaled to DeepMind the significant potential of the endeavor, prompting increased resource allocation and the formation of a dedicated, larger team.

The Challenge of Lee Sedol

Maddison recalled the prevailing sentiment regarding the challenge of defeating Lee Sedol: “I remember in the summer of 2014, we practically had Lee Sedol’s portrait on our desk next to us.” While not a Go player himself, Maddison collaborated closely with Aja Huang. With each iteration of network development, as Maddison assessed incremental improvements, he would ask Huang about their proximity to Lee Sedol’s skill level. Huang’s response underscored the immense gap, emphasizing, “Chris, you don’t understand. Lee Sedol is one stone from God.”

Departure from the AlphaGo Team

David Silver wanted Maddison to stay involved and guide the project to its next stage. Reflecting on this, Maddison admitted that declining was “maybe one of the stupider decisions I made.” He felt compelled to prioritize his PhD, identifying himself as “an academic at heart.” He returned to his doctoral research, offering only informal consultations thereafter. Maddison noted with a degree of pride that it took considerable time for the team to surpass his neural networks. Ultimately, the AI that competed against Lee Sedol was the culmination of extensive engineering efforts and a large, collective team.

The Atmosphere in Seoul

Maddison described the experience of being in Seoul during AlphaGo’s victory as profoundly moving and intense, marked by a palpable sense of anxiety. He explained, “You go in confident, but you never know. It’s like a sports game.” Despite statistical advantages, unpredictable outcomes are always possible.

He recounted looking out the hotel window towards a major city intersection in Seoul, where a large screen, like those in Times Square, was broadcasting the match. Seeing the crowds lining the sidewalks, engrossed in the game, brought home the event’s impact. He had heard reports of hundreds of millions in China watching the first game, and at that moment in Seoul it felt as though “we’ve really stopped East Asia in its tracks.”

AlphaGo’s Broader Significance for AI

The landscape of artificial intelligence has undergone significant transformations, particularly with the rise of large language models (LLMs), which differ considerably from AlphaGo in certain aspects. However, Maddison highlighted an enduring, underlying technological thread connecting them.

The initial phase of AlphaGo’s algorithm involved training a neural network to predict the subsequent move. Contemporary LLMs begin with a “pretraining” stage, focused on predicting the next word from an extensive corpus of internet-based human text.

In AlphaGo’s second stage, the networks, which had compressed the human corpus, were further refined through reinforcement learning. This process aimed to align the system’s behavior with the objective of winning games. An expert’s move reflects an intent to win, but it also reflects other factors, such as misjudgments or incomplete understanding, so predicting expert moves is not the same as playing to win. Aligning the system with the ultimate goal, which in AlphaGo’s case was winning, was therefore crucial.

Similarly, after the pretraining phase of LLMs, the networks require alignment with desired user outcomes. This is achieved through a series of reinforcement learning steps that steer the networks toward specific goals.

In essence, the fundamental principles have remained remarkably consistent.
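
The two-stage recipe described above can be illustrated with a deliberately tiny sketch. It is not AlphaGo’s code or any LLM’s training procedure: the "policy" is just three numbers (one per candidate move) rather than a deep network, the expert move counts and win probabilities are invented for illustration, and the second stage uses the exact expected policy gradient instead of sampled self-play games. The function names `pretrain` and `refine_with_reward` are hypothetical. What it shows is the point Maddison makes: imitation reproduces the experts’ preferences, including their misjudgments, while the reward step moves the policy toward what actually wins.

```python
import math


def softmax(logits):
    """Turn raw scores into a probability distribution over moves."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def pretrain(expert_counts, steps=2000, lr=0.5):
    """Stage 1: imitate experts by minimizing cross-entropy on their move choices."""
    n = sum(expert_counts)
    target = [c / n for c in expert_counts]
    logits = [0.0] * len(expert_counts)
    for _ in range(steps):
        probs = softmax(logits)
        # Gradient of cross-entropy with respect to the logits is (probs - target).
        logits = [l - lr * (p - t) for l, p, t in zip(logits, probs, target)]
    return logits


def refine_with_reward(logits, win_prob, steps=5000, lr=0.5):
    """Stage 2: gradient ascent on the expected win rate (exact policy gradient)."""
    logits = list(logits)
    for _ in range(steps):
        probs = softmax(logits)
        expected = sum(p * r for p, r in zip(probs, win_prob))
        # d(expected win rate)/d(logit_k) = probs[k] * (win_prob[k] - expected)
        logits = [l + lr * p * (r - expected)
                  for l, p, r in zip(logits, probs, win_prob)]
    return logits


# Assumed toy data: experts favor move 0, but move 1 actually wins most often.
EXPERT_COUNTS = [6, 3, 1]
WIN_PROB = [0.4, 0.7, 0.2]

pretrained_logits = pretrain(EXPERT_COUNTS)
imitation_policy = softmax(pretrained_logits)
refined_policy = softmax(refine_with_reward(pretrained_logits, WIN_PROB))

print("after imitation:", [round(p, 3) for p in imitation_policy])
print("after reward refinement:", [round(p, 3) for p in refined_policy])
```

In this toy setup the imitation policy ends up preferring move 0, as the experts did, and the reward-refined policy shifts its preference to move 1. The same shape, next-step prediction first and reward-driven adjustment second, is what the article describes for both AlphaGo and LLMs, at vastly larger scale.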

AI’s Trajectory and Success Factors

Maddison suggested that this has considerable implications for where AI development is likely to succeed. When aiming for progress on significant challenges, the primary bottlenecks are data availability for pretraining and the presence of suitable reward signals for post-training. Without these fundamental components, algorithmic sophistication alone is insufficient to achieve breakthroughs.

Sympathy for Lee Sedol

Reflecting on Lee Sedol, Maddison described him as an “idol” and an “unachievable milestone” during the summer of 2014. Witnessing him in person during the matches, observing his stress and anxiety, and his evident realization of AlphaGo’s formidable strength, created a “very stressful” environment. Maddison expressed a desire not to place anyone in such a difficult position.

Lee Sedol’s post-match apology to humanity, in which he said, “This is my failing, not yours,” struck Maddison as tragic.

A customary practice in Go involves reviewing the match with one’s opponent after its conclusion, exploring variations together. This ritual was impossible for Lee Sedol because his opponent was not human. Instead, he enlisted friends for the review, a substitute that fell short of the true experience. Maddison found this aspect “heartbreaking.”

He expressed his discomfort with the pervasive “man-versus-machine” narrative surrounding the match. Maddison emphasized that AlphaGo was the product of a collaborative human effort, a “tribe building an artifact that could achieve excellence in a human game.” It represented the culmination of their collective dedication.

The Enduring Role of Humans in an AI-Dominated World

Maddison proposed that there is inherent value in deepening our understanding of Go, and in AI helping us appreciate the game’s beauty. He distinguished between goals and purposes, noting that while Go’s goal is to win, its purpose extends to enjoyment. The proliferation of AI has not diminished the appeal of board games; chess, for instance, remains a vibrant industry, with humans still valuing the strategic depth and accomplishment inherent in human play.