Radical Blog

Perspectives on DeepSeek

By Rob Toews, Partner, and Jordan Jacobs, Managing Partner


This week, we feature select insights from a Radical Ventures briefing on the DeepSeek-R1 release that was shared with our investors. DeepSeek’s reasoning models have sent ripples through the ecosystem, raising important questions about the future of AI innovation. Managing Partner Jordan Jacobs and Partner Rob Toews discuss DeepSeek’s technical achievements and strategic implications. Below is a summary of the main points from their conversation.

What is DeepSeek-R1, and who is behind it?

DeepSeek is a Chinese AI research lab backed by Highflyer, a quantitative hedge fund with fewer than 200 employees. The lab has released a series of increasingly capable open-source models: DeepSeek V2, released in mid-2024, rivalled GPT-3.5; DeepSeek V3, released in December 2024, reached GPT-4o-level performance. Their latest release, R1, is a reasoning model built on top of V3 and is comparable to OpenAI’s o1 models.

What makes this model technically interesting?

While traditional language models rely on extensive human feedback during post-training, DeepSeek achieved comparable results through reinforcement learning, without supervised fine-tuning on human-labelled data.
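
For intuition, here is a minimal sketch of that kind of training signal: sampled completions are scored by simple automatic rules (answer correctness, output format) rather than by a reward model trained on human preferences, and each completion’s advantage is computed relative to its own group of samples. Everything below is an illustrative assumption, not DeepSeek’s code.

```python
# Illustrative sketch only: rule-based rewards plus group-relative advantages,
# in the spirit of RL post-training without human-labelled preference data.
# Function names, reward rules, and sample strings are hypothetical.

from statistics import mean, pstdev

def rule_based_reward(completion: str, reference_answer: str) -> float:
    """Score a completion with automatic checks (correctness + format bonus)
    instead of a reward model trained on human preferences."""
    correct = 1.0 if reference_answer in completion else 0.0
    formatted = 0.2 if "<think>" in completion and "</think>" in completion else 0.0
    return correct + formatted

def group_relative_advantages(completions, reference_answer):
    """Advantage of each sampled completion relative to its own group's
    mean and spread, so no separate value network is required."""
    rewards = [rule_based_reward(c, reference_answer) for c in completions]
    mu = mean(rewards)
    sigma = pstdev(rewards) or 1.0  # avoid division by zero when all rewards tie
    return [(r - mu) / sigma for r in rewards]

samples = [
    "<think>17 * 3 = 51</think> The answer is 51.",
    "The answer is 50.",
    "<think>guessing</think> The answer is 49.",
]
print(group_relative_advantages(samples, "51"))
```

Because the rewards come from checks that can be run automatically at scale, this kind of loop needs no human-labelled preference data.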

To overcome the limitations of U.S. export controls on advanced computing hardware, DeepSeek developed efficient techniques such as multi-head latent attention, which reduces memory usage, and multi-token prediction, which cuts compute requirements by predicting several tokens per forward pass instead of one. These methods allowed them to achieve near-frontier performance with (what they claim is) significantly less computing power than some of the frontier model companies.
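
For a rough sense of the memory-saving idea behind multi-head latent attention (caching one small latent vector per token and expanding it into keys and values only when needed, instead of caching full per-head keys and values), here is a hypothetical PyTorch sketch. The dimensions and layer names are assumptions, causal masking is omitted for brevity, and this is not DeepSeek’s implementation.

```python
# Hypothetical sketch of low-rank key/value compression, the core idea behind
# multi-head latent attention: cache a small latent per token rather than
# full per-head keys and values. All sizes are illustrative.

import torch
import torch.nn as nn

class CompressedKVAttention(nn.Module):
    def __init__(self, d_model=1024, n_heads=8, d_latent=128):
        super().__init__()
        self.n_heads, self.d_head = n_heads, d_model // n_heads
        self.q_proj = nn.Linear(d_model, d_model)
        self.kv_down = nn.Linear(d_model, d_latent)  # compress tokens into the cached latent
        self.k_up = nn.Linear(d_latent, d_model)     # expand latent into keys at use time
        self.v_up = nn.Linear(d_latent, d_model)     # expand latent into values at use time
        self.out = nn.Linear(d_model, d_model)

    def forward(self, x, latent_cache=None):
        B, T, _ = x.shape
        latent = self.kv_down(x)                     # (B, T, d_latent): this is what gets cached
        if latent_cache is not None:
            latent = torch.cat([latent_cache, latent], dim=1)
        q = self.q_proj(x).view(B, T, self.n_heads, self.d_head).transpose(1, 2)
        k = self.k_up(latent).view(B, -1, self.n_heads, self.d_head).transpose(1, 2)
        v = self.v_up(latent).view(B, -1, self.n_heads, self.d_head).transpose(1, 2)
        attn = torch.softmax(q @ k.transpose(-2, -1) / self.d_head ** 0.5, dim=-1)
        y = (attn @ v).transpose(1, 2).reshape(B, T, -1)
        return self.out(y), latent                   # return the latent as the new KV cache
```

In this toy configuration the cache stores 128 numbers per token instead of 2,048 (keys plus values across eight heads), roughly a 16x reduction, which is the kind of saving that matters when serving long contexts on constrained hardware.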

Why has the R1 release become such a big story?

DeepSeek’s R1 model should not be interpreted as a foundational algorithmic breakthrough; rather, it is best understood as an impressive feat of model architecture and engineering. Notably, its developers bypassed Nvidia’s CUDA framework, opting instead for lower-level PTX programming, which was key to achieving near-state-of-the-art performance with dramatically less compute than previously thought possible.

R1’s impressive performance is not without controversy and caveats. OpenAI has accused DeepSeek of model distillation, suggesting it has evidence the Chinese company used OpenAI models to train R1. Claims that DeepSeek had access to as many as 50,000 GPUs also raise questions about how much compute the lab actually used. Importantly, the reported $5.7 million training cost of R1 covers only the final training run, not the total expense of developing the model, which likely involved numerous iterations and far greater investment.

These caveats aside, DeepSeek published their model weights and detailed technical documentation, enabling anyone to build applications using R1. Their success as a Chinese lab working under U.S. export controls on AI chips demonstrates how constraints may drive innovation.

What are the implications for AI startups?

DeepSeek’s approach aligns with a thesis we have held at Radical since inception: efficiency in model development and deployment will play an increasingly important role as model capabilities increase. Validating this thesis, Cohere has built high-performance models at a fraction of the cost and compute of those built by the top labs, winning customers and achieving similar performance while maintaining significantly better margins. Companies like Writer and Reka AI also build smaller, capital-efficient, highly accurate models for enterprises, using far less compute than OpenAI and Anthropic.

CentML’s CEO Gennady Pekhimenko put it this way: “The AI arms race has officially shifted from who can scale up the fastest to who can scale the smartest, and system level optimizations play a huge role in this.” CentML, which optimizes open-source models to run efficiently on the latest chips, deployed DeepSeek-R1 on its platform within days of the model’s release.

The future of AI lies not just in scale, but in efficient solutions that drive adoption.

What are the implications for the broader AI ecosystem?

DeepSeek’s achievement demonstrates that cutting-edge performance can be achieved with significantly less capital, putting downward pressure on model pricing. We can expect frontier labs to move up the stack and focus on products and applications instead of just selling access to models.

Contrary to surface-level assumptions, DeepSeek’s efficiency gains do not necessarily mean reduced chip demand. According to the Jevons paradox, cheaper, more efficient models will likely increase overall demand for AI compute. This shift will also accelerate the migration from training workloads to inference (in production) workloads.
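
A toy back-of-the-envelope calculation illustrates the argument; every figure below is hypothetical. If inference becomes ten times cheaper and usage grows by more than ten times in response, total spending on compute rises rather than falls.

```python
# Toy Jevons-paradox arithmetic; all figures are hypothetical.
old_price = 10.0   # $ per million tokens before efficiency gains
new_price = 1.0    # $ per million tokens after a 10x efficiency gain
usage_now = 100    # million tokens consumed per month today
growth = 30        # assumed usage multiplier once inference is 10x cheaper

spend_before = old_price * usage_now           # $1,000 per month
spend_after = new_price * usage_now * growth   # $3,000 per month
print(f"before: ${spend_before:,.0f}/month, after: ${spend_after:,.0f}/month")
```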

Founders now have access to powerful reasoning models at significantly lower costs, accelerating innovation in AI. While most major Western enterprises will not use Chinese models, similar innovations are expected from Western companies, which will inevitably spark a wave of closed and open-source development, accelerating both consumer and enterprise adoption of AI applications. Ultimately, lower costs and improved efficiency make sophisticated AI capabilities more accessible to every business, everywhere.