Radical Blog

Perspectives on DeepSeek

By Rob Toews, Partner, and Jordan Jacobs, Managing Partner


This week, we feature select insights from a Radical Ventures briefing on the DeepSeek-R1 release that was shared with our investors. DeepSeek’s reasoning models have sent ripples through the ecosystem, raising important questions about the future of AI innovation. Managing Partner Jordan Jacobs and Partner Rob Toews discuss DeepSeek’s technical achievements and strategic implications. Below is a summary of the main points from their conversation.

What is DeepSeek-R1, and who is behind it?

DeepSeek is a Chinese AI research lab backed by Highflyer, a quantitative hedge fund with fewer than 200 employees. The lab has released a series of increasingly capable open-source models: DeepSeek V2, released in mid-2024, rivalled GPT-3.5; DeepSeek V3, released in December 2024, reached GPT-4o-level performance. Their latest release, R1, is a reasoning model built on top of V3 and is comparable to OpenAI’s o1 models.

What makes this model technically interesting?

While traditional language models rely on extensive human feedback during post-training, DeepSeek achieved comparable results through reinforcement learning, without supervised fine-tuning on human-labelled data.
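
For intuition, here is a minimal sketch of that kind of training signal: sampled completions are scored by simple automatic rules (answer correctness, output format) rather than by a reward model trained on human preferences, and each completion’s advantage is computed relative to its own group of samples. Everything below is an illustrative assumption, not DeepSeek’s code.

```python
# Illustrative sketch only: rule-based rewards plus group-relative advantages,
# in the spirit of RL post-training without human-labelled preference data.
# Function names, reward rules, and sample strings are hypothetical.

from statistics import mean, pstdev

def rule_based_reward(completion: str, reference_answer: str) -> float:
    """Score a completion with automatic checks (correctness + format bonus)
    instead of a reward model trained on human preferences."""
    correct = 1.0 if reference_answer in completion else 0.0
    formatted = 0.2 if "<think>" in completion and "</think>" in completion else 0.0
    return correct + formatted

def group_relative_advantages(completions, reference_answer):
    """Advantage of each sampled completion relative to its own group's
    mean and spread, so no separate value network is required."""
    rewards = [rule_based_reward(c, reference_answer) for c in completions]
    mu = mean(rewards)
    sigma = pstdev(rewards) or 1.0  # avoid division by zero when all rewards tie
    return [(r - mu) / sigma for r in rewards]

samples = [
    "<think>17 * 3 = 51</think> The answer is 51.",
    "The answer is 50.",
    "<think>guessing</think> The answer is 49.",
]
print(group_relative_advantages(samples, "51"))
```

Because the rewards come from checks that can be run automatically at scale, this kind of loop needs no human-labelled preference data.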

To overcome the limitations of U.S. export controls on advanced computing hardware, DeepSeek developed efficient techniques such as multi-head latent attention, which reduces memory usage, and multi-token prediction, which cuts compute requirements by predicting several tokens per forward pass instead of one. These methods allowed them to achieve near-frontier performance with (what they claim is) significantly less computing power than some of the frontier model companies.
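
For a rough sense of the memory-saving idea behind multi-head latent attention (caching one small latent vector per token and expanding it into keys and values only when needed, instead of caching full per-head keys and values), here is a hypothetical PyTorch sketch. The dimensions and layer names are assumptions, causal masking is omitted for brevity, and this is not DeepSeek’s implementation.

```python
# Hypothetical sketch of low-rank key/value compression, the core idea behind
# multi-head latent attention: cache a small latent per token rather than
# full per-head keys and values. All sizes are illustrative.

import torch
import torch.nn as nn

class CompressedKVAttention(nn.Module):
    def __init__(self, d_model=1024, n_heads=8, d_latent=128):
        super().__init__()
        self.n_heads, self.d_head = n_heads, d_model // n_heads
        self.q_proj = nn.Linear(d_model, d_model)
        self.kv_down = nn.Linear(d_model, d_latent)  # compress tokens into the cached latent
        self.k_up = nn.Linear(d_latent, d_model)     # expand latent into keys at use time
        self.v_up = nn.Linear(d_latent, d_model)     # expand latent into values at use time
        self.out = nn.Linear(d_model, d_model)

    def forward(self, x, latent_cache=None):
        B, T, _ = x.shape
        latent = self.kv_down(x)                     # (B, T, d_latent): this is what gets cached
        if latent_cache is not None:
            latent = torch.cat([latent_cache, latent], dim=1)
        q = self.q_proj(x).view(B, T, self.n_heads, self.d_head).transpose(1, 2)
        k = self.k_up(latent).view(B, -1, self.n_heads, self.d_head).transpose(1, 2)
        v = self.v_up(latent).view(B, -1, self.n_heads, self.d_head).transpose(1, 2)
        attn = torch.softmax(q @ k.transpose(-2, -1) / self.d_head ** 0.5, dim=-1)
        y = (attn @ v).transpose(1, 2).reshape(B, T, -1)
        return self.out(y), latent                   # return the latent as the new KV cache
```

In this toy configuration the cache stores 128 numbers per token instead of 2,048 (keys plus values across eight heads), roughly a 16x reduction, which is the kind of saving that matters when serving long contexts on constrained hardware.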

Why has the R1 release become such a big story?

DeepSeek’s R1 model should not be interpreted as a foundational algorithmic breakthrough; rather, it is best understood as an impressive feat of model architecture and engineering. Notably, its developers bypassed Nvidia’s CUDA framework, opting instead for lower-level PTX programming, which was key to achieving near-state-of-the-art performance with dramatically less compute than previously thought possible.

R1’s impressive performance is not without controversy and caveats. OpenAI has accused DeepSeek of model distillation, suggesting it has evidence the Chinese company used OpenAI models to train R1. Claims that DeepSeek had access to as many as 50,000 GPUs also raise questions about how much compute the lab actually used. Importantly, the reported $5.7 million training cost of R1 covers only the final training run, not the total expense of developing the model, which likely involved numerous iterations and far greater investment.

These caveats aside, DeepSeek published their model weights and detailed technical documentation, enabling anyone to build applications using R1. Their success as a Chinese lab working under U.S. export controls on AI chips demonstrates how constraints may drive innovation.

What are the implications for AI startups?

DeepSeek’s approach aligns with a thesis we have held at Radical since inception: efficiency in model development and deployment will play an increasingly important role as model capabilities increase. Validating this thesis, Cohere has built high-performance models at a fraction of the cost and compute of those built by the top labs, winning customers and achieving similar performance while maintaining significantly better margins. Companies like Writer and Reka AI also build smaller, capital-efficient, highly accurate models for enterprises, using far less compute than OpenAI and Anthropic.

CentML’s CEO Gennady Pekhimenko put it this way: “The AI arms race has officially shifted from who can scale up the fastest to who can scale the smartest, and system level optimizations play a huge role in this.” CentML, which optimizes open-source models to run efficiently on the latest chips, deployed DeepSeek-R1 on its platform within days of the model’s release.

The future of AI lies not just in scale, but in efficient solutions that drive adoption.

What are the implications for the broader AI ecosystem?

DeepSeek’s achievement demonstrates that cutting-edge performance can be achieved with significantly less capital, putting downward pressure on model pricing. We can expect frontier labs to move up the stack and focus on products and applications instead of just selling access to models.

Contrary to surface-level assumptions, DeepSeek’s efficiency gains do not necessarily mean reduced chip demand. According to the Jevons paradox, cheaper, more efficient models will likely increase overall demand for AI compute. This shift will also accelerate the migration from training workloads to inference (in production) workloads.
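
A toy back-of-the-envelope calculation illustrates the argument; every figure below is hypothetical. If inference becomes ten times cheaper and usage grows by more than ten times in response, total spending on compute rises rather than falls.

```python
# Toy Jevons-paradox arithmetic; all figures are hypothetical.
old_price = 10.0   # $ per million tokens before efficiency gains
new_price = 1.0    # $ per million tokens after a 10x efficiency gain
usage_now = 100    # million tokens consumed per month today
growth = 30        # assumed usage multiplier once inference is 10x cheaper

spend_before = old_price * usage_now           # $1,000 per month
spend_after = new_price * usage_now * growth   # $3,000 per month
print(f"before: ${spend_before:,.0f}/month, after: ${spend_after:,.0f}/month")
```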

Founders now have access to powerful reasoning models at significantly lower costs, accelerating innovation in AI. While most major Western enterprises will not use Chinese models, similar innovations are expected from Western companies, which will inevitably spark a wave of closed and open-source development, accelerating both consumer and enterprise adoption of AI applications. Ultimately, lower costs and improved efficiency make sophisticated AI capabilities more accessible to every business, everywhere.