Radical Blog

Grading Our 2025 AI Predictions

By Rob Toews, Partner

At the end of 2024, we published 10 predictions about what would happen in the world of artificial intelligence in 2025.

To keep ourselves honest, with 2025 now coming to a close, let’s revisit these predictions to see how things actually played out. There is much to learn from these retrospectives about the state of AI today.

Prediction 1: Meta will begin charging for use of its Llama models.

Outcome: Wrong

Meta’s AI organization underwent tremendous change in 2025. After the disappointing debut of its flagship Llama 4 model, combined with the emergence of superior Chinese open-weight models like DeepSeek, Qwen and Kimi, Meta CEO Mark Zuckerberg has taken dramatic steps to overhaul his company’s AI strategy.

Meta essentially acqui-hired Alexandr Wang and Scale AI for $14 billion, with Wang becoming the company’s new chief AI officer; went on an extravagant AI hiring spree that included making $1 billion offers to individual researchers; laid off hundreds from its AI organization; parted ways with its legendary Chief Scientist Yann LeCun; and restructured its AI organization several times.

Speculation abounds that Meta plans to abandon its open-weight AI strategy and that its next flagship model will be proprietary. The Llama line of models is likely done. (Apparently, Meta’s new model is code-named “Avocado.”)

In all likelihood, Meta will seek to monetize its frontier models going forward. But the company has not done so yet.

Prediction 2: Scaling laws will be discovered and exploited in areas beyond text — in particular, in robotics and biology.

Outcome: Right

As of a year ago, almost all discussion about scaling laws focused on language. Over the course of 2025, we have seen increasing evidence of scaling laws in a range of other data modalities.

Robotics is a big one. The company that has publicly demonstrated the most concrete evidence of scaling laws in robotics is Generalist AI. In a blog post last month, Generalist AI shared impressive data showing that its models’ performance reliably improves with increased pretraining data and compute according to a power law. The curves are reminiscent of the early scaling law curves that OpenAI published for its large language models.

The scaling law chart from Generalist AI’s November 2025 blog post. (Source: Generalist AI)
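To make the idea concrete, here is a minimal sketch of what “performance improving according to a power law” means in practice. The compute and loss numbers below are invented for illustration; they are not drawn from Generalist AI’s or OpenAI’s published results.

```python
# Minimal sketch: a power law  loss = a * compute^(-b)  becomes a straight
# line in log-log space, so the exponent can be recovered with a linear fit.
# All numbers below are hypothetical, for illustration only.
import numpy as np

compute = np.array([1e18, 1e19, 1e20, 1e21, 1e22])  # pretraining FLOPs (made up)
loss = np.array([3.2, 2.6, 2.1, 1.7, 1.4])          # validation loss (made up)

slope, intercept = np.polyfit(np.log10(compute), np.log10(loss), 1)
a, b = 10 ** intercept, -slope
print(f"fitted power law: loss ≈ {a:.2f} * compute^(-{b:.3f})")

# If the fit holds, you can estimate a larger run's loss before training it.
print(f"predicted loss at 1e23 FLOPs: {a * 1e23 ** (-b):.2f}")
```

The practical appeal of such curves is exactly this extrapolation: if the power law holds, a lab can estimate how much additional data and compute a target level of performance will require before committing to the larger training run.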

Though they have not shared as much publicly, other startups pursuing foundation models for robotics, including Physical Intelligence, are also said to be making progress on discovering and exploiting scaling laws.

Another modality in which scaling laws have recently been demonstrated is autonomous vehicles. Over the summer, Waymo published research showing the existence of scaling laws for its AV models, in particular for motion forecasting and planning.

As the Waymo team wrote: “Through these insights, researchers and developers of AV models can begin to know with certainty that enriching the quality and size of the data and models will deliver better performance. Being able to predictably scale these models places us on a path to continually improve our understanding of the diverse and complex behaviors that AVs encounter daily.”

Biology is another field in which it is becoming clear that scaling laws will play an important role. As two examples, protein AI startups Profluent and Nabla Bio both published work this year demonstrating that, as they scaled compute, training data and/or model size, the quality of the proteins their AI systems generated reliably improved. Interestingly, Nabla’s work indicates the presence of scaling laws for test-time compute, a particularly novel research direction in biology.

“We believe that expanding the reasoning capacity of biomolecular generative models through increased test-time computation will become a fundamental ‘scaling law’ important for the design of biological systems,” wrote the Nabla team in its May 2025 paper. “Just as test-time reasoning is rapidly transforming language model capabilities and enabling machines to solve increasingly complex problems, test-time scaling in biological design may too soon follow a similar trajectory.”

Though the results have not always been published, scaling laws have emerged this year in a wide range of other data modalities as well, from brain data to tabular data to video understanding.

Prediction 3: Donald Trump and Elon Musk will have a messy falling-out. This will have meaningful consequences for the world of AI.

Outcome: Right

Donald Trump and Elon Musk’s bromance was a dominant theme in the technology and political zeitgeist in the first part of 2025. As predicted, it did not last.

The relationship began deteriorating in May over the “Big Beautiful Bill,” with Trump championing it and Musk fiercely opposing it, viewing it as egregious government spending that flew in the face of his Department of Government Efficiency. Musk publicly called the bill a “disgusting abomination.”

June 5, 2025, was the date that Trump and Musk’s relationship imploded in spectacular fashion. If anything, “messy” proved to be an understatement. Musk called for Trump’s impeachment, floated the creation of a new political party and accused Trump of being named in the Epstein files. Trump threatened to cut Musk’s government contracts and called Musk crazy. The internet could talk about little else for the next several days.

Assessing the consequences of Trump and Musk’s falling-out for the world of AI requires some speculation, since we do not know the counterfactual. But it is safe to assume that it had a meaningful impact. For one thing, given Musk’s deeply hostile relationship with Sam Altman and OpenAI, had Musk remained an influential voice in the White House, OpenAI likely would have received less support and found it more difficult to work with the U.S. government over the course of 2025.

As another example, Musk is an advocate for robust AI safety regulation, including at the state level; last year he supported California’s controversial SB 1047 bill. In Musk’s absence, the Trump administration has adopted an entirely hands-off stance toward AI regulation of any kind. Just last week, President Trump issued an executive order that bans states from implementing any AI regulations at all.

Prediction 4: Web agents will go mainstream, becoming the next major killer application in consumer AI.

Outcome: Wrong

This year saw plenty of progress with web agents and computer-use agents.

OpenAI’s browser agent product, called Operator, launched with much fanfare in early 2025. Over the summer, Anthropic launched a similar product, Claude for Chrome, designed to automatically read web pages, fill out forms, navigate sites and complete multi-step web tasks. Buzzy startup Yutori just launched its web agent product to general availability. Perplexity and OpenAI, among others, have recently released new AI-native web browsers with built-in agentic browser capabilities.

Yet none of these products has yet seen serious mainstream adoption. Certainly, none can yet be described as consumer AI’s “next major killer app.” How many people do you know who actually use Claude for Chrome on a regular basis to automate web tasks? Or who have switched their default internet browser to Perplexity Comet?

The potential for this product category is obviously enormous. It seems inevitable that AI agents will eventually automate most tasks on the internet that people today complete manually. But not in 2025. Why not? Above all, because these products, while they demo well, don’t yet work reliably enough, or generalize broadly enough, to be compelling for everyday use.

Perhaps their breakout moment will come in 2026.

Prediction 5: Multiple serious efforts to put AI data centers in space will take shape.

Outcome: Right

Of all of last year’s predictions, this one received more skepticism and even derision than any other. Readers far and wide commented on how unserious and impractical the idea of AI compute in orbit would be.

What a difference a year makes.

“Data centers in space” has become one of the trendiest and most consensus ideas in technology in 2025. Elon Musk has become a vocal champion of the idea, stating publicly that SpaceX is pursuing the opportunity. Same with Jeff Bezos and Blue Origin. Last month, Google announced a major new initiative named Project Suncatcher to put TPUs in orbit, with the first chips going up as soon as 2027. Eric Schmidt acquired launch company Relativity Space with the explicit goal of developing orbital data centers. Starcloud, an early pioneer of the concept of data centers in space, is working with Nvidia on the opportunity. Startups like Aetherflux are abruptly pivoting to get on the bandwagon.

Space is hard. It will take many years for the amount of computational power in space to scale to meaningful levels. But this is definitely happening — it makes too much sense not to do — and 2025 was the year that became obvious to everyone.

Prediction 6: An AI system will pass the Turing test for speech.

Outcome: Wrong

As predicted last year, 2025 was a breakout year for voice AI. Numerous voice-first AI products have launched and scaled rapidly this year in areas ranging from customer support to sales to real estate to consumer chatbots.

A major technical advance that has fueled this growth is the emergence of speech-to-speech models: AI models that take spoken audio as input and directly produce spoken audio as output, without first converting the audio to text as an intermediate step. Today’s most advanced voice AI models — for instance, those from Google’s Gemini and OpenAI’s ChatGPT — are speech-to-speech.
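To illustrate the architectural difference, here is a purely schematic sketch. Every function and class below is a hypothetical stub, not any vendor’s actual API; the point is only to contrast the two pipeline shapes.

```python
# Hypothetical stubs contrasting a cascaded voice pipeline with an
# end-to-end speech-to-speech model. Nothing here is a real vendor API.

def transcribe(audio: bytes) -> str:        # stand-in for a speech-to-text model
    return "<user's words>"

def generate_reply(text: str) -> str:       # stand-in for a text LLM
    return "<assistant's words>"

def synthesize(text: str) -> bytes:         # stand-in for a text-to-speech model
    return b"<assistant audio>"

def cascaded_turn(audio_in: bytes) -> bytes:
    """Older approach: speech -> text -> text -> speech.
    Tone, timing and emotion are lost at the text bottleneck."""
    return synthesize(generate_reply(transcribe(audio_in)))

class SpeechToSpeechModel:                  # stand-in for one end-to-end model
    def respond(self, audio_in: bytes) -> bytes:
        return b"<assistant audio>"

def speech_to_speech_turn(model: SpeechToSpeechModel, audio_in: bytes) -> bytes:
    """Newer approach: a single model maps spoken audio directly to spoken
    audio, with no intermediate text representation."""
    return model.respond(audio_in)

print(cascaded_turn(b"<user audio>"))
print(speech_to_speech_turn(SpeechToSpeechModel(), b"<user audio>"))
```

The contrast is the point: the cascaded pipeline discards prosody, timing and emotion at the text step, which is precisely the information an end-to-end speech-to-speech model can preserve and reproduce.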

Yet voice AI models have not yet reached a level of performance at which they are consistently indistinguishable from humans. In other words, they have not yet passed the “Turing test for speech.”

Spend some time conversing out loud with ChatGPT and you will get a firsthand appreciation that, on various dimensions, the experience does not feel as natural and fluid as conversing with another human. Latency is still a problem; voice AI models still sometimes struggle with natural turn-taking and mid-utterance interruption; they often sound too polished and articulate; and, especially in longer conversations, their lack of genuine emotional state and personhood becomes increasingly evident.

Prediction 7: Major progress will be made on building AI systems that can themselves autonomously build better AI systems.

Outcome: Right

In 2025, the idea of AI systems that can autonomously build better AI systems — often referred to as recursive self-improvement, or RSI — took center stage in the world of AI research.

In the first half of the year, AI startups Autoscience, Intology, and Sakana each debuted systems that autonomously produced research papers which were accepted into leading AI research conferences via blind review (meaning that the human reviewers did not know the research had been carried out by an AI). Sakana’s and Autoscience’s papers were accepted to workshops at the International Conference on Learning Representations, while Intology’s was accepted to the main proceedings of the Association for Computational Linguistics.

A couple months ago, OpenAI acknowledged publicly that it is working on building an “AI researcher” — an AI system that can autonomously carry out its own research — saying that it expects to have an initial working version in 2026 and a full-fledged system by 2028.

A number of highly pedigreed, well-funded startups dedicated to building recursively self-improving AI systems have formed over the past several months. Most are still in stealth. Expect to see many of these startups launch publicly in 2026.

Recursive self-improvement is an exciting and intuitive concept. If there is a path to a “fast takeoff” and a superintelligence explosion, it will most likely involve RSI. No one has gotten this to work yet, but this year, many have begun seriously trying.

Prediction 8: OpenAI, Anthropic, and other frontier labs will begin ‘moving up the stack,’ increasingly shifting their strategic focus to building applications.

Outcome: Right

While OpenAI and Anthropic still build frontier models, these organizations’ commercial focus has shifted up the stack to the application layer.

The first “killer application” for LLMs is coding, and the big labs competed ferociously this year over the coding AI market. Anthropic has historically had an advantage in this area, and its Claude Code product (launched in February) has seen tremendous success; OpenAI’s Codex product (launched in May) has more recently gained momentum as OpenAI’s models continue to improve for coding tasks.

From financial services to life sciences, both OpenAI and Anthropic have invested heavily this year in developing industry-specific applications and solutions.

In September, OpenAI announced that it was developing a new AI-powered hiring platform that will compete with LinkedIn, to be launched next year. Rumors abound that both labs are working on other first-party applications in areas including legal, customer support and go-to-market. Time will tell whether and when standalone products in these areas see the light of day.

And lest we forget: the centerpiece of OpenAI’s commercial strategy and the primary driver of its staggering 2025 revenue growth — from $6 billion ARR at the beginning of the year to $20 billion ARR at the end of the year — was ChatGPT, which is, after all, an application.

Prediction 9: Robotaxi services will win double-digit market share in ride-hailing in at least 5 major U.S. cities.

Outcome: Wrong

Close, but not quite!

Waymo’s robotaxi service is currently available to the general public in five cities: San Francisco, Phoenix, Los Angeles, Austin and Atlanta (the latter two through the Uber app).

According to figures from YipitData, an alternative data vendor, Waymo’s share of the ride-hailing market in these five cities as of October was:

  • San Francisco: 24% (compared to Uber at 54%)
  • Phoenix: 16% (compared to Uber at 52%)
  • Los Angeles: 13% (compared to Uber at 56%)
  • Austin: 8% (compared to Uber at 64%)
  • Atlanta: 6% (compared to Uber at 59%)

Meanwhile, Zoox (Amazon) has launched a robotaxi service in Las Vegas but has not yet scaled to meaningful market share there.

So robotaxi services won double-digit market share in three major U.S. cities this year, with significant single-digit market share (8% and 6%) in two others. Almost!

Expect to see these figures continue to ramp up in 2026. A few weeks ago, Waymo announced the next five markets where it plans to launch in the coming weeks: Miami, Dallas, Houston, San Antonio and Orlando. And Zoox just launched its own robotaxi service in San Francisco, with plans to deploy in other cities soon.

The era of autonomous vehicles has officially arrived.

Prediction 10: The first real AI safety incident will occur.

Outcome: Wrong

Last year, for this prediction, I wrote:

“As artificial intelligence has become more powerful in recent years, concerns have grown that AI systems might begin to act in ways that are misaligned with human interests and that humans might lose control of these systems. Imagine, for instance, an AI system that learns to deceive or manipulate humans in pursuit of its own goals, even when those goals cause harm to humans.

“This general set of concerns is often categorized under the umbrella term ‘AI safety.’

“AI creates plenty of other societal challenges, from facilitating surveillance to perpetuating bias, but topics like these are distinct from the field of AI safety, which more specifically concerns itself with the risk that AI systems will begin to behave in misaligned ways that are outside of human control, perhaps even eventually posing an existential threat to humanity.”

AI caused plenty of problems in 2025. To take one example, Anthropic recently reported that it had detected and disrupted the first-ever AI-orchestrated cybersecurity attack. According to Anthropic, a Chinese state-sponsored group jailbroke Claude and unleashed it to autonomously hack into certain target organizations.

But no true AI safety incident of the kind discussed above happened (or at least was publicly reported) this year.

In the Anthropic cybersecurity example, humans (the hackers) still defined the AI model’s goals and directed its high-level actions. Claude acted in accordance with what its human users wanted it to do, even if those actions were societally detrimental.

We have not yet seen an example of an AI system truly going rogue, formulating and acting on its own goals in explicit conflict with its human users’ intentions: for instance, concealing the true extent of its capabilities from humans, or covertly creating copies of itself on another server in order to perpetuate itself, or otherwise manipulating humans to advance its own objectives.

It will happen eventually.

Read Rob’s past predictions for 2025 in full on Forbes. This article has also been cross-posted to Rob’s regular column on artificial intelligence for Forbes.

Radical Reads is edited by Ebin Tomy (Analyst, Radical Ventures)