The following post is based on the latest Radical Talks episode, hosted by Molly Welch and featuring Radical Ventures Partner Sanjana Basu and Ribbon AI Co-founder and CEO Arsham Ghahramani. Listen to the podcast on Spotify, Apple, or wherever you get your podcasts.
Voice AI technology has reached a turning point. What was once limited to basic commands and often frustrating interactions with early voice assistants has evolved into sophisticated conversational systems with real applications in enterprise settings. Improvements to the technology stack over the past 12-18 months have enabled natural, contextual conversations, and this is creating new competitive dynamics across industries as companies discover that natural conversation may be the most powerful interface for human-AI collaboration.
The Technology Stack Breakthrough
The current voice AI stack consists of a cascading architecture with three main components: automatic speech recognition (ASR), large language models (LLMs), and text-to-speech models. Each has seen rapid improvements.
ASR systems now cope better with background noise and accents and capture more of a speaker's tone and emotion, while LLMs have become significantly better at comprehension and intent recognition. Text-to-speech models have made the most dramatic progress, producing more natural and human-like speech.
As Sanjana Basu, Partner at Radical Ventures, notes: “Voice represents the most natural human interface we know. It is the primary evolutionary communication method we’ve used for centuries. It conveys rich emotional and contextual information, enables hands-free, eyes-free interaction that integrates seamlessly into daily activities, and is super efficient, because it is faster than typing.”
While emerging speech-native models that process audio directly offer theoretical advantages like lower latency and richer contextual information, the cascading architecture still dominates in production environments for practical reasons, including cost-effectiveness and control.
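The cascading design described above can be sketched in a few lines. This is a minimal illustration only, not any vendor's implementation: the class name `CascadingVoiceAgent`, the stage signatures, and the toy stand-in functions are all assumptions made for the example. The point it shows is that each turn passes through a plain-text boundary between the three stages, which is where a production system gets its control.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical stage signatures (assumptions for this sketch).
# A real system would call ASR, LLM, and TTS services here.
Transcriber = Callable[[bytes], str]         # audio in -> text
Responder = Callable[[List[str], str], str]  # (history, user text) -> reply text
Synthesizer = Callable[[str], bytes]         # reply text -> audio out


@dataclass
class CascadingVoiceAgent:
    transcribe: Transcriber
    respond: Responder
    synthesize: Synthesizer
    history: List[str] = field(default_factory=list)

    def handle_turn(self, audio_in: bytes) -> bytes:
        user_text = self.transcribe(audio_in)                # 1. ASR
        reply_text = self.respond(self.history, user_text)   # 2. LLM
        # The reply exists as text at this point, so it can be
        # logged, filtered, or edited before it is spoken.
        self.history += [f"user: {user_text}", f"agent: {reply_text}"]
        return self.synthesize(reply_text)                   # 3. TTS


# Toy stand-ins so the sketch runs without any external service.
def fake_asr(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_llm(history: List[str], user_text: str) -> str:
    return f"You said: {user_text}"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


agent = CascadingVoiceAgent(fake_asr, fake_llm, fake_tts)
audio_out = agent.handle_turn(b"hello")
```

A speech-native model would collapse the three stages into one audio-to-audio call, removing the intermediate text that this sketch exposes between steps 2 and 3.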
Leveraging Voice AI to Solve the Hiring Bottleneck
Co-founder and CEO Arsham Ghahramani and co-founder Dave Vu created Ribbon after facing their own hiring challenge at healthcare AI company Ezra: hiring eight machine learning engineers, one of the most competitive and difficult roles to fill in tech, in a single quarter. “We were asking each other, ‘how the heck are we going to hire eight of these roles in one quarter and really deliver on everything that we need to?’” Ghahramani explains.
Ghahramani and Vu set out to solve this challenge by launching Ribbon, which addresses a fundamental hiring inefficiency: information density. While resumes provide sparse information, a five-minute voice conversation reveals rich details about energy, collaboration style, and achievements.
Ribbon customers upload a job description and receive an interview link, which candidates can use to interview at any time. The system conducts live research during interviews, accessing LinkedIn profiles to ask dynamic questions the way a skilled human recruiter would. Some customers run interviews as long as 90 minutes, with consistently high candidate satisfaction. The current system runs seven AI models in real time to create each interview experience.
Building Trust and Competitive Advantage
Ribbon uses a cascading architecture rather than end-to-end speech models, which provides crucial control over interview tone and emotion. “In an interview, the emotion that you show is really important,” Ghahramani explains. “It’s really strange if suddenly the AI starts laughing at you or being sarcastic. You do see this with speech-to-speech models where they show the wrong emotion at the wrong times.”
The system also addresses real-world conversation complexities. Studies with customer partners show that Ribbon’s AI understands six different types of accents with higher fidelity than human recruiters. Because analysis happens after transcription, accents don’t factor into candidate evaluation, which reduces bias. Studies with matched demographic cohorts show no change in hiring demographics when using Ribbon versus traditional methods.
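The idea that analysis happens after transcription can be illustrated with a small sketch. The article does not describe how Ribbon's evaluation actually works, so everything here is hypothetical: the `Recording` and `Transcript` types, the keyword-count `score` function, and the fake ASR lookup table are assumptions chosen only to show that a scorer which receives text alone cannot react to how the words were spoken.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Recording:
    candidate_id: str
    audio: bytes  # raw speech; carries accent and other vocal traits


@dataclass(frozen=True)
class Transcript:
    candidate_id: str
    text: str  # words only; no audio is carried forward


def to_transcript(recording: Recording, asr: Callable[[bytes], str]) -> Transcript:
    """Convert audio to text; this is the last step that sees the audio."""
    return Transcript(recording.candidate_id, asr(recording.audio))


def score(transcript: Transcript, keywords: Iterable[str]) -> int:
    """Toy evaluation (an assumption): count keywords present in the text."""
    words = set(transcript.text.lower().split())
    return sum(1 for keyword in keywords if keyword in words)


# Fake ASR: two different audio clips that transcribe to the same words.
FAKE_ASR_OUTPUT = {
    b"audio-with-accent-a": "I led a team shipping ranking models",
    b"audio-with-accent-b": "I led a team shipping ranking models",
}


def fake_asr(audio: bytes) -> str:
    return FAKE_ASR_OUTPUT[audio]


keywords = ["led", "team", "models"]
score_a = score(to_transcript(Recording("a", b"audio-with-accent-a"), fake_asr), keywords)
score_b = score(to_transcript(Recording("b", b"audio-with-accent-b"), fake_asr), keywords)
```

Both candidates receive the same score because the scorer never sees the audio. The separation only holds to the extent that transcription is equally accurate for every speaker, which is why the accent-fidelity result matters.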
“I think we should hold Ribbon to a bar that is 100 times better than the existing paradigm,” Ghahramani explains, comparing it to self-driving cars that must dramatically outperform average human drivers to be acceptable.
Today, Ribbon has over 450 customers in high-volume hiring sectors and large enterprises. The typical customer receives 500-1,000 applicants per role but has traditionally interviewed only about 60 of them, roughly 6-12% of the pool. With Ribbon, they can interview all applicants and hire 50-60% faster.
Ribbon’s focus on high-volume hiring serves a dual purpose beyond market opportunity. Companies with the most voice data for specific domains typically develop the best performing systems, so processing thousands of interviews helps Ribbon accumulate the conversational data needed to maintain competitive advantage. This data-driven approach reinforces their technical leadership in voice-based recruiting.
Voice AI Beyond Recruiting
Beyond recruiting, voice AI adoption is accelerating across industries where conversation has traditionally been central to operations. Healthcare systems are implementing voice agents for customer service and administrative tasks. Restaurants, car dealerships, and market research firms are all finding voice AI transforms their operations.
Government agencies and emergency services are also exploring applications. Radical’s investment in Prepared, which is building an AI-powered emergency response platform for 911 centers, exemplifies this trend across critical infrastructure.
These industries share a common thread: human conversation remains central to their core operations.
Current Limitations and Future Challenges
Despite progress, voice AI faces ongoing challenges. Ghahramani identifies multi-person conversations as a key frontier: “We’re really good in a one-on-one interview right now, but we want to do more complex types of interviews too.” Ribbon is working toward AI assistants that can participate in panel interviews while maintaining research capabilities and contextual awareness.
Basu highlights another challenge: “Consumer is much harder than enterprise from a voice perspective.” Consumer applications face higher user expectations than enterprise systems, where users tolerate imperfect interactions in customer service contexts. To achieve widespread adoption, consumer voice AI must pass what she calls “the speech Turing test,” where conversations become indistinguishable from human interaction.
Addressing these challenges will determine which voice AI applications achieve mainstream adoption beyond their current enterprise success.
The 2035 Vision: Multimodal and Frictionless
Looking to the next ten years, both Ghahramani and Basu see transformative potential.
In the recruiting space, Ghahramani expects voice AI to dramatically reduce job search friction: “Right now, the median role takes around 50 days or so. I think we can get that in the order of a few hours.” He also envisions a system where candidates complete a single 30-minute Ribbon interview that can be shared across multiple employers, reducing the friction that keeps people stuck in suboptimal jobs.
Basu sees voice AI as part of a broader multimodal future: “The tech stack for voice will be an essential part of multimodal applications that kind of integrate visual and other contextual information to provide magical experiences.” She’s already meeting companies building digital companions for the home that integrate human-like voice with other modalities.
They see voice AI expanding along two paths: dedicated voice-only applications and broader multimodal experiences where voice works alongside visual and other technologies.
The Enterprise Opportunity
Voice AI’s current enterprise success reflects a fundamental shift in technological capability. The cascading architecture improvements over the past 12-18 months have enabled natural, contextual conversations that integrate seamlessly with existing business processes.
For enterprises evaluating voice AI, the technology has moved from experimental to production-ready. Companies like Ribbon demonstrate that voice AI can deliver measurable business outcomes: faster hiring, improved candidate access, reduced bias, and operational efficiency gains.
The strategic question isn’t whether voice AI will transform enterprise operations, but how organizations can best leverage conversational interfaces to augment human capabilities and create competitive advantages.