Radical Reads

Advancing Video Understanding with AI

By Aiden Lee, CTO at Twelve Labs

Source: Twelve Labs

Twelve Labs, a Radical Ventures portfolio company, is pioneering advanced multimodal video understanding, demonstrating substantial improvements across visual, audio, and text modalities. With its latest model, Marengo 2.7, the company has achieved state-of-the-art performance on complex video understanding tasks. Through an innovative multi-vector approach and a comprehensive evaluation framework, the model delivers high precision in detecting small objects and exceptional performance on general text-based search tasks. This week, we share an excerpt from the release covering the background of the technology.

Introduction to multi-vector video representation

Unlike text, where a single word embedding can effectively capture semantic meaning, video content is inherently more complex and multifaceted. A video clip simultaneously contains visual elements (objects, scenes, actions), temporal dynamics (motion, transitions), audio components (speech, ambient sounds, music), and often textual information (overlays, subtitles). Traditional single-vector approaches struggle to effectively compress all these diverse aspects into one representation without losing critical information. This complexity necessitates a more sophisticated approach to video understanding.

To address this complexity, Marengo 2.7 uses a unique multi-vector approach. Instead of compressing everything into a single vector, it creates separate vectors for different aspects of the video. One vector might capture what things look like (e.g., “a man in a black shirt”), another tracks movement (e.g., “waving his hand”), and another remembers what was said (e.g., “video foundation model is fun”). This approach helps the model better understand videos that contain many different types of information, leading to more accurate video analysis across all aspects – visual, motion, and audio.
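
Twelve Labs has not published Marengo 2.7’s internals, but the core idea can be illustrated in a few lines. The Python sketch below indexes one clip as three separate vectors (visual, motion, speech) and scores a query against each of them, returning the best-matching aspect. The embed function is a hash-based stand-in for a real encoder, and every name here is illustrative rather than part of any Twelve Labs API.

```python
import hashlib
import numpy as np

DIM = 512  # embedding dimension (illustrative)

def embed(text: str) -> np.ndarray:
    """Hash-based stand-in for a real text/video encoder."""
    seed = int(hashlib.md5(text.encode()).hexdigest(), 16) % (2**32)
    v = np.random.default_rng(seed).standard_normal(DIM)
    return v / np.linalg.norm(v)

# One clip, indexed as several vectors -- one per aspect of the content --
# instead of a single compressed embedding.
clip_vectors = {
    "visual": embed("a man in a black shirt"),         # appearance
    "motion": embed("waving his hand"),                # temporal dynamics
    "speech": embed("video foundation model is fun"),  # transcribed audio
}

def search(query: str, vectors: dict) -> tuple[str, float]:
    """Compare the query against every modality vector; keep the best match."""
    q = embed(query)
    scores = {name: float(q @ v) for name, v in vectors.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]

# With a real encoder a paraphrase would also land on the motion vector;
# the hash-based stand-in only matches exact strings.
print(search("waving his hand", clip_vectors))  # -> ('motion', ~1.0)
```

How the per-aspect scores are fused into a single ranked list is not specified in the excerpt; a weighted sum or a maximum over the per-vector similarities would be the natural choices.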

Evaluated on 60+ multimodal retrieval datasets

Existing benchmarks for video understanding models often rely on detailed, narrative-style descriptions that capture the main events in a video. However, this approach doesn’t reflect real-world usage patterns, where users typically make shorter, more ambiguous queries like “find the red car” or “show me the celebration scene.” Users also frequently search for peripheral details, background elements, or specific objects that may only appear briefly. Additionally, queries often combine multiple modalities – visual elements with audio cues, or text overlays with specific actions. This disconnect between benchmark evaluation and actual use cases necessitated a more comprehensive evaluation approach for Marengo 2.7.
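
The announcement does not detail the scoring harness, but retrieval benchmarks of this kind are commonly summarized with recall@k: the fraction of queries for which at least one relevant clip appears in the top k results. A minimal sketch with made-up clip IDs and short, real-world-style queries:

```python
def recall_at_k(ranked_ids, relevant, k):
    """Fraction of queries with at least one relevant item in the top-k results."""
    hits = sum(bool(set(ids[:k]) & rel) for ids, rel in zip(ranked_ids, relevant))
    return hits / len(ranked_ids)

# Illustrative rankings returned for three short, ambiguous queries.
ranked = [
    ["clip_07", "clip_12", "clip_03"],  # "find the red car"
    ["clip_55", "clip_09", "clip_21"],  # "show me the celebration scene"
    ["clip_14", "clip_02", "clip_40"],  # "logo in the background"
]
relevant = [{"clip_12"}, {"clip_55"}, {"clip_99"}]

print(f"Recall@1 = {recall_at_k(ranked, relevant, 1):.2f}")  # 0.33
print(f"Recall@3 = {recall_at_k(ranked, relevant, 3):.2f}")  # 0.67
```

Averaging such metrics over 60+ datasets, and over query types ranging from object mentions to audio cues, is what turns a single leaderboard number into the broader evaluation the excerpt describes.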

State-of-the-Art Performance with Unparalleled Image-to-Visual Search Capabilities

Marengo 2.7 demonstrates state-of-the-art performance across all main benchmarks, with particularly remarkable achievements in image-to-visual search. While the model shows strong results across all metrics, its performance on image object search and image logo search represents a significant leap forward in the field.

To learn more, read Twelve Labs’ full announcement.

AI News This Week

  • RBC partners with Cohere to develop AI platform for bank employees  (The Globe and Mail)

    Royal Bank of Canada (RBC) and Radical portfolio company Cohere are partnering to build North for Banking, a bank-wide generative AI platform. The first such deployment by a Canadian bank, the platform will help RBC employees complete tasks and find information specific to customer needs across operations. RBC will run the platform entirely on its internal systems using one of Canada’s largest private GPU clusters. 

  • How AI uncovers new ways to tackle difficult diseases  (BBC)

    AI is transforming pharmaceutical research, dramatically reducing drug discovery timelines from 4 years to 18 months by cutting the number of test molecules needed from 500 to under 80. Traditional drug development costs over $2 billion and faces 90% failure rates in clinical trials. AI-driven approaches are promising, with at least 75 AI-discovered molecules now in clinical trials. Radical Ventures portfolio company Nabla Bio recently achieved a breakthrough in de novo antibody design. Their JAM AI system produced the first demonstration of AI-designed antibodies capable of targeting hard-to-drug membrane proteins. 

  • AI and robotic arms flex new tech muscles to boost lagging home construction in Canada  (CBC)

    Radical Ventures portfolio company Promise Robotics is transforming construction with AI-powered robotic arms that autonomously assemble house components. The company’s portable system cuts house construction time in half, potentially helping address Canada’s housing shortage, which will require an estimated 3.87 million new homes by 2031. Promise Robotics attracted interest from over 20 Canadian builders in 2024.

  • Training AI models might not need enormous data centres  (The Economist)

    A new distributed training method is challenging the traditional approach of using massive GPU clusters for AI development. The technique enables training across smaller, distributed data centres, reducing communication overhead by 500-fold while improving model generalization. Recent tests demonstrated this approach by training a 10B-parameter language model using just 30 GPU clusters across eight cities, achieving 83% efficiency compared to centralized training. (A minimal sketch of this low-communication pattern appears after this news list.)

  • Research: Self-adaptive LLMs  (Sakana AI/Institute of Science Tokyo)

    Researchers have developed a framework that enables language models to adapt dynamically like living organisms. The system employs a novel two-pass approach: first, it analyzes task requirements, then applies precise adjustments to the model’s “brain” through Singular Value Decomposition. This method outperforms existing approaches using fewer parameters, especially in advanced mathematics and visual understanding tasks. The research shows that skills learned by one model can be transferred to another, suggesting a path toward continuously evolving AI systems that combine different types of reasoning to solve complex problems. (A few-line sketch of the SVD adjustment appears after this news list.)
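
The Economist item above does not spell out the training method, but low-communication schemes in this spirit (DiLoCo is a published example) share one pattern: each site runs many local optimizer steps and parameters are synchronized only occasionally, so cross-site traffic drops roughly by the length of the local phase. A minimal single-process simulation on a toy regression problem, with all constants illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression task standing in for "training a model".
w_true = rng.standard_normal(10)

def sample_batch(n=32):
    X = rng.standard_normal((n, 10))
    return X, X @ w_true + 0.01 * rng.standard_normal(n)

N_SITES, LOCAL_STEPS, ROUNDS, LR = 4, 50, 10, 0.05

global_w = np.zeros(10)  # every "data centre" starts from the same weights
for _ in range(ROUNDS):
    local = []
    for _ in range(N_SITES):
        w = global_w.copy()
        for _ in range(LOCAL_STEPS):  # many cheap steps, no cross-site traffic
            X, y = sample_batch()
            w -= LR * (2 * X.T @ (X @ w - y) / len(y))  # mean-squared-error gradient
        local.append(w)
    # One synchronization per round: traffic is cut ~LOCAL_STEPS-fold
    # compared with synchronizing after every step.
    global_w = np.mean(local, axis=0)

print("distance to optimum:", np.linalg.norm(global_w - w_true))
```

Published variants typically exchange parameter deltas and apply an outer optimizer rather than plainly averaging weights, but the communication pattern (long local phases, rare synchronization) is the same.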

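For the Sakana AI item, the one concrete mechanism named (adjusting a model through Singular Value Decomposition) can also be sketched. Assuming, as the summary suggests, that adaptation means rescaling a weight matrix's singular values with a small task-specific vector, the whole operation fits in a few lines; z_task below is a hand-picked stand-in for what would be a learned vector.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 6))  # stand-in for one pretrained weight matrix

# Decompose once, offline: W = U @ diag(s) @ Vt.
U, s, Vt = np.linalg.svd(W, full_matrices=False)

def adapt(z):
    """Rescale the singular values by a task-specific vector z and rebuild W.
    Only len(s) numbers are tuned per matrix -- far fewer than full fine-tuning."""
    return U @ np.diag(s * z) @ Vt

# First pass: identify the task. Second pass: apply the matching scaling vector.
z_task = np.array([1.2, 1.0, 1.0, 0.8, 1.0, 1.0])  # hand-picked, illustrative
W_adapted = adapt(z_task)

assert np.allclose(adapt(np.ones_like(s)), W)  # z = 1 recovers the original weights
print("max weight change:", np.abs(W_adapted - W).max())
```

On this reading, transferring a learned skill between models would amount to reusing such scaling vectors, which lines up with the transfer result the summary mentions.
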
Radical Reads is edited by Leah Morris (Senior Director, Velocity Program, Radical Ventures).