This week, we share one of our favourite AI features from the past year: an exploration by our friend Stephen Marche in The New Yorker that delves into the surprising origins and impact of the transformer architecture. Aidan Gomez, now the CEO and co-founder of Cohere, a Radical Ventures portfolio company, was a college intern at Google when he co-authored the groundbreaking paper “Attention Is All You Need” that introduced the transformer. This technology has since become the backbone of modern linguistic AI (and Generative AI more generally), revolutionizing how machines process and generate language.
In the spring of 2017, in a room on the second floor of Google’s Building 1965, a college intern named Aidan Gomez stretched out, exhausted. It was three in the morning, and Gomez and Ashish Vaswani, a scientist focused on natural language processing, were working on their team’s contribution to the Neural Information Processing Systems conference, the biggest annual meeting in the field of artificial intelligence. Along with the rest of their eight-person group at Google, they had been pushing flat out for twelve weeks, sometimes sleeping in the office, on couches by a curtain that had a neuron-like pattern. They were nearing the finish line, but Gomez didn’t have the energy to go out to a bar and celebrate. He couldn’t have even if he’d wanted to: he was only twenty, too young to drink in the United States.
“This is going to be a huge deal,” Vaswani said.
“It’s just machine translation,” Gomez said, referring to the subfield of A.I.-driven translation software, at which their paper was aimed. “Isn’t this just what research is?”
“No, this is bigger,” Vaswani replied.
Today, Gomez, who is now in his late twenties, has become the C.E.O. of Cohere, an artificial intelligence company valued at five and a half billion dollars. The transformer—the “T” in ChatGPT—sits at the core of what may be the most revolutionary technology of the twenty-first century. PricewaterhouseCoopers has estimated that A.I. could add $15.7 trillion to global G.D.P. by 2030—a substantial share of it contributed by transformer-based applications. That figure only gestures toward some huge but unknown impact. Other consequences seem even more murkily vast: some tech prophets propose apocalyptic scenarios that could almost be taken right from the movies. What’s mainly certain, right now, is that linguistic A.I. is changing the relationship between human beings and language. In an age of machine-generated text, terms like “writing,” “understanding,” “meaning,” and “thinking” need to be reconsidered.
If transformer-based A.I. were more familiar and complicated—if, say, it involved many components analogous to the systems and subsystems in our own brains—then the richness of its behavior might be less surprising. As it is, however, it generates nonhuman language in a way that challenges our intuitions and vocabularies. If you ask a large language model to write a sentence “silkily and smoothly,” it will produce a silky and smooth piece of writing; it registers what “silkily” and “smoothly” mean, and can both define and perform them. A neural network that can write about Japanese punk bands must on some level “understand” that a band can break up and reform under a different name; similarly, it must grasp the nuances of the idea of an Australian sitcom in order to make one up. But this is a different kind of “understanding” from the kind we know.
The researchers behind the transformer have different ways of reckoning with its capabilities. “I think that even talking about ‘understanding’ is something we are not prepared to do,” Vaswani told me. “We have only started to define what it means to understand these models.”
Read the full article here.
AI News This Week
- Fei-Fei Li says understanding how the world works is the next step for AI (The Economist)
Radical Ventures Scientific Partner Fei-Fei Li argues that spatial intelligence is crucial for AI’s evolution beyond language models. Li describes how the ImageNet database of 15M labeled images created by her team in 2007 helped spark the modern AI boom. As CEO and co-founder of Radical Ventures portfolio company World Labs, Li is spearheading the shift from large language models to “large world models” that can understand and interact in 3D space by using a combination of text, image, video, and robotic sensor data.
- A revolution in how robots learn (The New Yorker)
Robotics is approaching its “ChatGPT moment” as the field shifts from scripted movements to learned behaviors. Drawing parallels with infant motor development, researchers note that robots now acquire more natural, adaptive movements through AI rather than explicit programming. This breakthrough addresses the longstanding “Moravec paradox,” in which seemingly simple human tasks like grasping objects have traditionally challenged robots. Recent advances suggest robots may soon master everyday tasks once considered “AI-proof,” such as making coffee or folding clothes.
- AI can now create a replica of your personality (MIT Technology Review)
Researchers have developed AI simulation agents that can accurately replicate human personalities from two-hour conversational interviews. The study, led by Stanford PhD student Joon Sung Park, tested 1,000 diverse participants and achieved 85% accuracy in mimicking human responses across personality tests and social surveys. These simulation agents differ from current tool-based agents by focusing on replicating human behaviour rather than completing tasks.
- They taught AI to sing, and it was beautiful (The New York Times)
Holly Herndon and Mat Dryhurst are the artists behind a new AI-powered exhibit at London’s Serpentine Gallery. “The Call” explores ways in which AI can enhance vocal artistry. The exhibition centers on choral AI models trained through partnerships with 15 UK ensembles using specially composed hymns and exercises. Drawing parallels between ancient choral traditions and modern AI systems, the project builds on Serpentine’s decade-long exploration of AI in art with works by artists like Cécile B Evans and Refik Anadol.
- Research: Scaling Laws for Precision (Harvard University, Stanford University, MIT, Databricks, Carnegie Mellon University)
Researchers introduce “precision-aware” scaling laws that predict how model performance changes with the numeric precision used for training and inference. The study reveals that low-precision post-training quantization becomes increasingly harmful as models train on more data, potentially making additional pretraining counterproductive. The research suggests optimal training precision lies between 7 and 8 bits, challenging assumptions about both high (16-bit) and very low (sub-4-bit) precision training. Validated on models up to 1.7B parameters trained on up to 26B tokens, these findings offer insights for optimizing large language model training and deployment through precision adjustments.
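The intuition behind a precision-aware scaling law can be sketched with a Chinchilla-style loss curve in which lower training precision shrinks a model's "effective" parameter count. Note that the constants below (A, B, E, alpha, beta, gamma) are illustrative placeholders chosen for the sketch, not fitted values from the paper, and the exact functional form here is a simplified assumption rather than the study's reported law.

```python
import math

def effective_params(n_params: float, precision_bits: float, gamma: float = 2.0) -> float:
    """Treat low-precision weights as behaving like a smaller model.
    gamma is an illustrative sensitivity constant (an assumption, not a fitted value)."""
    return n_params * (1.0 - math.exp(-precision_bits / gamma))

def loss(n_params: float, n_tokens: float, precision_bits: float,
         A: float = 400.0, alpha: float = 0.34,
         B: float = 1500.0, beta: float = 0.28,
         E: float = 1.7, gamma: float = 2.0) -> float:
    """Chinchilla-style loss with a precision-dependent effective parameter count.
    All constants are hypothetical placeholders for illustration."""
    n_eff = effective_params(n_params, precision_bits, gamma)
    return A / n_eff**alpha + B / n_tokens**beta + E

# Compare training a 1B-parameter model on 26B tokens at different precisions.
for bits in (4, 8, 16):
    print(f"{bits:>2}-bit training -> predicted loss {loss(1e9, 26e9, bits):.3f}")
```

Because the exponential saturates, moving from 8-bit to 16-bit training improves the predicted loss far less than moving from 4-bit to 8-bit does, which qualitatively matches the finding that the sweet spot sits in the 7-to-8-bit range.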
Radical Reads is edited by Leah Morris (Senior Director, Velocity Program, Radical Ventures).