Let’s get one thing straight: AI hardware is no longer just about training. The real action is shifting to inference—the moment when neural networks actually do something. Think of it as the difference between cramming for an exam and acing it in real time. From reasoning models to instant personal and coding assistants to machines using vision to make split-second decisions everywhere from fields to factories, inference is where AI meets the physical world.
As models evolve—think DeepSeek’s inference-intensive mixture-of-experts (MoE) architectures—the hardware is hitting a wall. GPUs? They’re inefficient and too power-hungry for this new era. CPUs? Too sluggish. MoE models, which dynamically route tasks to specialized subnetworks, are exploding in popularity because they are faster and more efficient. The catch? Traditional processors weren’t built for this. GPUs, designed to brute-force parallel computations, waste energy shuttling data between memory and compute units.
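The dynamic routing described above can be sketched in a few lines. This is a minimal, illustrative top-k gating sketch, not DeepSeek’s actual implementation; the function names and the toy linear experts are assumptions for demonstration:

```python
import numpy as np

def moe_forward(x, gate_w, experts, k=2):
    """Route one token embedding to its top-k experts (illustrative sketch).

    x: (d,) token embedding; gate_w: (d, n_experts) gating weights;
    experts: list of callables, each mapping a (d,) vector to a (d,) vector.
    """
    logits = x @ gate_w                  # gating score for each expert
    top_k = np.argsort(logits)[-k:]      # indices of the k best-scoring experts
    weights = np.exp(logits[top_k])
    weights /= weights.sum()             # softmax over only the chosen experts
    # Only k of n experts execute; this sparsity is why MoE inference
    # stresses fast routing rather than raw dense throughput.
    return sum(w * experts[i](x) for w, i in zip(weights, top_k))

# Toy demo: 4 experts, each a random linear map over an 8-dim embedding
rng = np.random.default_rng(0)
d, n = 8, 4
experts = [lambda x, W=rng.standard_normal((d, d)): W @ x for _ in range(n)]
gate_w = rng.standard_normal((d, n))
y = moe_forward(rng.standard_normal(d), gate_w, experts)
print(y.shape)  # (8,)
```

Note that for each token only two of the four experts run at all; scaled up, that per-token sparsity is exactly the irregular, memory-bound traffic pattern that dense-compute GPUs handle poorly.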
The age of inference demands purpose-built inference chips, and UntetherAI is one of the few semiconductor companies with chips on the market designed specifically for this new era. Instead of forcing data to crawl through the bottleneck between memory and processor, UntetherAI chips break the memory wall by placing compute directly alongside memory—a design called “at-memory compute.” For MoE models, which thrive on sparse, dynamic computation, this means blazing-fast routing between experts. For time-sensitive inference, it enables real-time processing of temporal data streams without melting your power budget. And because these chips are purpose-built for inference, they sidestep the bloat of legacy hardware.
This demand for inference hardware becomes even more acute as AI intersects with the real world. The future of AI is not solely the hyperscaler cloud; rather, it is at the edge—in your phone, your car, your factory robots, and in on-prem, regional, and sovereign datacenters. These environments do not have the luxury of infinite power or cooling. A tractor in a field can’t lug around a server rack, and powering AI for your enterprise should be about deploying models, not pouring new concrete and plumbing to cope with the power demands of GPUs. UntetherAI’s chips, which boast world-leading efficiency, are tailor-made for this reality.
Like every paradigm shift, this new age of inference is exposing the flaws in yesterday’s tech stack. DeepSeek’s innovations in MoE architectures are just the beginning. Just as phones and laptops demanded architectures and design points different from those in desktops and servers, what comes next—ubiquitous AI running in tight spaces at the edge, or more compute per rack in the data center—will hinge on hardware that’s fast, efficient, and ruthlessly focused on breaking through the emerging power wall.
Inference is not the future. It’s the now. And the now needs better silicon.
For more information on Untether AI, please visit their website.
AI News This Week
- Waabi and Volvo team up to build self-driving trucks at scale (TechCrunch)
Radical portfolio company Waabi has announced a strategic partnership with Volvo to jointly develop and deploy self-driving trucks at scale. The collaboration will integrate Waabi’s proprietary driver technology directly into Volvo’s production line in Virginia. Led by CEO Raquel Urtasun, the company plans to launch commercial pilots in Texas within months, targeting fully driverless operations between customer depots by late 2025. Urtasun emphasizes this is just the beginning, with plans to expand beyond trucking into robotaxis and warehouse robotics.
- Start with satellite images of the earth. Then add AI. (The Wall Street Journal)
New AI tools are democratizing access to satellite image analysis. Foundation models, powered by transformer technology, can now analyze satellite imagery to detect everything from illegal airstrips in the Amazon to urban development patterns. These models are fed data from constellations of low-earth orbit satellites for different use cases. Radical portfolio companies Pixxel, which is building hyperspectral earth imaging satellites, and Muon Space, whose “FireSat” satellites monitor wildfires, are both examples. While primarily used by government agencies for disaster response and urban monitoring, these tools allow non-experts to discover global patterns that previously required extensive resources and expertise.
- What’s next for smart glasses (MIT Technology Review)
Thanks to AI agent integration, smart glasses are poised for mainstream adoption. AI agents will provide contextual awareness and unprompted interactions through multimodal language models that process video, audio, and text simultaneously. Expected features include shopping reminders when passing stores, facial recognition, and interactive conversations based on surroundings. Some industry leaders view smart glasses as the ideal platform for AI deployment, with display-enabled models expected in 2025. Critical to the wider adoption of this technology is efficient edge AI computing, as it allows real-time processing of complex sensor data directly on the device, minimizing latency, preserving battery life, and safeguarding privacy.
- China’s cheap, open AI model DeepSeek thrills scientists (Nature)
The DeepSeek-R1 release is sparking excitement in the scientific community by democratizing access to advanced AI research capabilities. Released as an open-weight model, it allows researchers to study and build upon its algorithms, in stark contrast to the “black box” nature of leading commercial models. DeepSeek’s more cost-efficient architecture makes complex AI experimentation financially viable for research labs. The model particularly excels at scientific reasoning tasks, achieving results comparable to top models in chemistry and mathematics.
- Research: Diverse preference optimization (Meta/NYU/ETH Zürich)
Researchers have developed a novel training method called Diverse Preference Optimization (DivPO) to address the lack of diversity in language model outputs. The key innovation modifies how training pairs are selected during preference optimization. Instead of choosing the highest and lowest rewarded responses, DivPO selects responses based on both quality thresholds and diversity measures. The method has proven effective, achieving 45.6% more diverse outputs in structured tasks and a 74.6% increase in story diversity while maintaining output quality. This breakthrough is particularly significant for creative tasks and synthetic data generation, where varied responses are crucial for model training and improvement.
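The selection rule described above can be sketched concretely. This is a simplified illustration of the quality-threshold-plus-diversity idea, not the paper’s implementation; the percentile parameter and the toy reward and diversity functions in the demo are assumptions:

```python
def divpo_pair(responses, reward, diversity, rho=0.7):
    """Select a (chosen, rejected) preference pair, DivPO-style (sketch).

    Instead of pairing the highest- and lowest-rewarded responses, pick the
    MOST diverse response among those above a reward threshold and the
    LEAST diverse among those below it. rho sets the reward percentile
    used as the quality cutoff.
    """
    scored = sorted(responses, key=reward)   # ascending by reward
    cut = int(len(scored) * rho)
    high, low = scored[cut:], scored[:cut]
    chosen = max(high, key=diversity)        # high quality, most novel
    rejected = min(low, key=diversity)       # low quality, least novel
    return chosen, rejected

# Toy demo: reward = response length, diversity = unique-word count
pool = ["the cat sat", "a dog ran far away", "cats cats cats cats",
        "every bird sang a new song", "hello hello"]
reward = len
diversity = lambda s: len(set(s.split()))
print(divpo_pair(pool, reward, diversity))
```

Training on pairs built this way pushes the model toward responses that are both good and varied, which is why the method helps most in creative writing and synthetic data generation, where collapsing onto one high-reward phrasing is a real failure mode.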
Radical Reads is edited by Ebin Tomy (Analyst, Velocity Program, Radical Ventures).