CVPR 2024 – Advancing Real-World Applications of Computer Vision

The 2024 IEEE/CVF Computer Vision and Pattern Recognition Conference (CVPR) has become a pivotal gathering for global leaders in computer vision, showcasing groundbreaking research and fostering industry advancements. This year CVPR hit a number of new milestones, with 12,000 researchers from 76 countries attending the conference in Seattle. A record-breaking 11,532 papers were submitted, a 26% increase from the previous year.

This week, we share highlights from the 2024 CVPR workshops and tutorials, which reflect emerging trends and the complex challenges of adopting computer vision technologies in real-world settings.

Healthcare: The DEF-AI-MIA workshop emphasized significant advancements in AI for medical imaging, particularly in pathology and radiology. By employing deep learning techniques, researchers have enhanced the precision of medical image assessments, traditionally hampered by subjectivity and limited reproducibility. These methods now extract more clinically relevant information, pushing forward the integration of AI in routine clinical tasks. The discussions centered around ongoing research and the necessary validations for these AI-based methods to become commonplace in medical diagnostics.
Infrastructure: The AI City Challenge demonstrated AI’s potential to transform urban management and infrastructure. Focusing on traffic safety and transportation, researchers utilized AI to analyze sensor data, such as camera feeds, for actionable insights. This year’s challenge highlighted applications in retail business environments and Intelligent Traffic Systems (ITS), exploring multi-camera people tracking, traffic safety analysis, and other areas critical to enhancing urban safety and efficiency.
Automobiles: In transportation, the focus on self-driving technologies continued to dominate discussions. A tutorial by Radical Ventures portfolio company Waabi, “All You Need To Know About Self-Driving,” covered comprehensive aspects of autonomous driving—from existing solutions to the challenges ahead. The session aimed to pave the way for scalable, safe, and affordable autonomous driving. Additionally, the “Populating Empty Cities” workshop delved into leveraging simulation environments and virtual humans to advance robotics and autonomous driving technologies.
Entertainment and Productivity: Advances in video understanding and multimodal AI were showcased by Twelve Labs, another Radical Ventures company in attendance. The technology for video embedding and retrieval is pushing the boundaries of video-language modeling, allowing for detailed extraction of video content. The shift toward integrating video content into computing interfaces was further explored in the Computer Vision for Mixed Reality Workshop, which focused on AR/VR/MR technologies to create immersive experiences that blend virtual and real worlds seamlessly.

Overall, CVPR 2024 not only shared cutting-edge research but also provided a platform for the community to engage in meaningful discussions, explore collaborative opportunities, and envision the future applications of computer vision across various sectors. As the field continues to evolve, these interactions and competitive workshops are vital in shaping the direction of computer vision technology and its integration into everyday life.

AI News This Week

A way to let robots learn by listening will make them more useful (MIT Technology Review)

Researchers at Stanford’s Robotics and Embodied AI Lab have developed a system enabling robots to use sound to learn tasks, moving beyond vision-based training. Utilizing a GoPro and a microphone-equipped gripper, they gathered audio data from human demonstrations to train robots for tasks like flipping a bagel or erasing a whiteboard. This method significantly improved task success rates, highlighting the potential of audio as a valuable data source for robot training in varied environments. Advancements in multimodal perception could soon help robots move beyond warehouses into offices and homes, performing complex tasks with greater efficiency and reliability.

Patients may soon trust AI more than humans (Forbes)

Improvements in AI capabilities are increasingly building trust amongst patients in care settings. At Mount Sinai Health System, AI was used to monitor patients in “step down” units, detecting potential clinical deterioration and alerting the medical team. Patients monitored by AI were 43% more likely to receive heart and circulatory support medications and had a lower 30-day mortality rate (7%) compared to traditional monitoring (9.3%). A survey revealed that 64% of respondents would trust AI over a human doctor, with higher trust among Gen Z. Companies like Radical’s Signal 1 are using AI to integrate insights directly into existing care workflows, enhancing patient outcomes and hospital efficiency.

Listen: The 10,000x yolo researcher metagame — with Yi Tay of Reka (Latent Space)

LMSYS Chatbot Arena is an open platform where anyone can help evaluate and rank AI chatbots. Its leaderboard is typically the domain of large, well-funded model labs such as OpenAI, with about 600 people working on GPT-4, and Google, with over 1,300 co-authors on the Gemini paper. Reka Core, a model developed by Radical Ventures portfolio company Reka, made headlines by debuting at #7 on the leaderboard with just 20 employees and US$60 million in funding, showcasing their new GPU infrastructure. Their success highlights the potential of “yolo runs”: high-risk, intuition-driven approaches to model training championed by researchers like Reka cofounder Yi Tay and Jason Wei. By favoring bold experimentation over systematic methods, Reka quickly scaled their models, demonstrating the power of innovative thinking in AI development.

As AI rises, data-center costs spiral. Quantum is the solution (The Globe and Mail)

Data centers face a crisis due to soaring energy demands driven by AI. Christian Weedbrook, CEO of Radical portfolio company Xanadu, points out that AI applications currently consume 1.5% of global energy. Traditional renewable and nuclear energy solutions fall short in addressing inefficiency. Weedbrook argues that the solution lies in quantum computing, which offers exponential efficiency improvements for specific tasks. Future quantum data centers could perform tasks equivalent to thousands of standard data centers with the same energy consumption. Weedbrook calls for investments in quantum infrastructure in Canada, leveraging the country's strong foundation in quantum science.

Research: An open-source framework for cost-effective LLM routing (LMSYS)

Researchers from UC Berkeley have developed RouteLLM, an open-source framework for cost-effective routing between large language models (LLMs). RouteLLM learns to route queries between a stronger, more expensive LLM and a weaker, cheaper LLM, optimizing for cost and performance. The system acts as a filter, using human preference data to determine the correct routing of messages. In tests routing between GPT-4 and Mixtral-8x7B, RouteLLM achieved significant cost savings while maintaining 95% of GPT-4’s performance. This initiative aims to reduce the high costs of powerful LLMs while maintaining quality outputs by leveraging different models’ strengths and weaknesses.
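
To make the routing idea concrete, here is a minimal sketch of threshold-based routing between a stronger and a weaker model. It is not RouteLLM's actual API or its learned router: the `ThresholdRouter` class, the `toy_score` keyword heuristic, and the threshold value are all hypothetical stand-ins chosen for illustration. In the real system, the score would presumably come from a model trained on human preference data rather than from keywords.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ThresholdRouter:
    """Routes each query to a 'strong' or 'weak' model label.

    `score` is a hypothetical stand-in for a learned router: it returns an
    estimate in [0, 1] of how much a query needs the strong model.
    """
    score: Callable[[str], float]
    threshold: float

    def route(self, query: str) -> str:
        # Send the query to the expensive model only when the score is high enough.
        return "strong" if self.score(query) >= self.threshold else "weak"

    def strong_fraction(self, queries: List[str]) -> float:
        # Share of queries sent to the strong model, a rough proxy for cost.
        if not queries:
            return 0.0
        strong = sum(self.route(q) == "strong" for q in queries)
        return strong / len(queries)


def toy_score(query: str) -> float:
    """Toy heuristic (an assumption, not a trained model).

    Scores higher for longer queries and for queries containing
    reasoning-style keywords (matched as substrings).
    """
    keywords = ("prove", "derive", "debug", "optimize")
    text = query.lower()
    hits = sum(word in text for word in keywords)
    length_term = min(len(query.split()) / 50.0, 1.0)
    return min(1.0, 0.3 * hits + 0.5 * length_term)


if __name__ == "__main__":
    router = ThresholdRouter(score=toy_score, threshold=0.3)
    for q in [
        "What is the capital of France?",
        "Prove that the sum of two even numbers is even and derive a general rule",
    ]:
        print(router.route(q), "<-", q)
```

Raising the threshold sends fewer queries to the strong model, which lowers cost at the risk of lower answer quality. Lowering it does the opposite. Tuning that single value is one simple way to trade cost against performance.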

Radical Reads is edited by Ebin Tomy (Analyst, Radical Ventures)
