Radical Reads

Unlocking Unstructured Data

By Radical Editorial

Image Source: Fast Company

Most of the world’s data, the data that is of real value to businesses, is not shared on the web or indexed by search engines. Rather, it resides in internal company folders, captured in scanned spreadsheets, PDFs, and images, in what is known in the world of AI as ‘unstructured data.’ Traditional search engines underserve this data, yet understanding and synthesizing vast quantities of internal, unstructured data is the lifeblood of the knowledge economy. Any company that can unlock the trove of information that lives in unstructured data, and generate real insights on demand, stands to be a category-defining business.

This is the rationale behind Radical’s investment in Hebbia, which was announced last week (see feature coverage in TechCrunch and Fast Company). Google has only indexed 4% of the world’s data. Hebbia aims to tap the potential of the other 96% of the world’s data, becoming the central platform for the knowledge worker economy.

Hebbia was founded by George Sivulka, a Stanford Ph.D. dropout with a research track record across many fields. After working at NASA at the age of 16, George earned his BS in Mathematics from Stanford with distinction in just over two years. George went on to complete his Master’s degree in Applied Physics before embarking on a Ph.D. at Stanford in Computational Neuroscience. During George’s doctoral studies, many of his friends gravitated toward finance, working on mergers and acquisitions and immersing themselves in thousands of due diligence documents during 100-hour work weeks. Identifying a problem that natural language processing could solve, George set out to build the world’s most intelligent search engine — one which leveraged his research to empower organizations to securely index and search every accessible document.

In the financial sector, firms spend billions of dollars on information research, hiring teams of analysts to pore over diligence documents and filings to facilitate high-stakes deals. While digital content is exponentially increasing in data rooms and private servers, the tools available to navigate that content have remained constant.

Hebbia’s AI platform is capable of reading and comprehending the untouched universe of unstructured, private data. Its ‘neural search’ enables a user to ask questions in natural language, with Hebbia searching all available data to provide answers immediately and accurately. Hebbia outperforms the current state of the art in machine learning information retrieval by an average of 57%. Hebbia’s software can index any document type, including scanned PDFs, spreadsheets, and transcripts, making it an expert search engine for almost any form of language-based data. From broad market research to targeted information extraction, Hebbia automates diligence workflows, acting as an AI Analyst and instantly surfacing deal points even humans miss.

Hebbia already counts twenty enterprises as customers, including several of the world’s largest private equity firms, hedge funds, and consultancies. Ultimately, George and the team at Hebbia are attacking the vast opportunity of unstructured data, where traditional search has failed and where the greatest economic value has yet to be realized. If you run a large enterprise with a significant effort devoted to documentary review or diligence, please let us know if your organization would like an introduction to Hebbia.

AI News This Week

  • Deep learning detection of active pulmonary tuberculosis at chest radiography matched the clinical performance of radiologists  (RSNA Radiology)

    An AI system can use chest X-rays to detect TB at a level comparable to radiologists. According to researchers, the technique might be useful for screening in regions with few radiologists. If a screening result is positive, a sputum test or nucleic acid amplification test (NAAT) would be administered. These tests are relatively expensive, but if AI could filter the patients who need them, the benefits would be extensive.

  • Temperature impacts on hate speech online: evidence from 4 billion geolocated tweets from the USA  (The Lancet Planetary Health)

    In a surprising discovery, researchers used machine learning to uncover a connection between extreme temperatures and rises in hate speech on Twitter. Hate speech increased by 22% in extreme heat and by 12.5% on extremely cold days. The study results highlight not only the importance of climate change mitigation and adaptation against temperature extremes but also the need to effectively combat hate speech online and to provide resources for people who are affected.

  • French tax officials use AI to spot over 20,000 undeclared pools  (Le Parisien)

    In France, swimming pools are taxed according to their size because they boost a home’s property value. French tax authorities are using machine learning to find unregistered pools. In trials across nine French regions since October 2021, the system identified 20,356 suspected undeclared swimming pools and brought in approximately €10 million. Of taxpayers whose pools were flagged, 94% did have an unregistered pool. Officials plan to expand the system to identify undeclared improvements, like gazebos and verandas, that can raise a home’s property taxes. Reluctant taxpayers will find it more difficult to avoid paying as automated systems are able to examine more nuances in an individual’s data trail.

  • Watch: Stanford HAI’s Workshop on Foundation Models  (Stanford HAI)

    Day 1 and Day 2 

    Fei-Fei Li, AI luminary and Co-Director of the Stanford Institute for Human-Centered Artificial Intelligence (HAI), hosted a workshop on foundation models in collaboration with the Center for Research on Foundation Models (CRFM), a new initiative of the Stanford HAI. The CRFM defines foundation models as “a single model that is trained on raw data, potentially across multiple modalities, which can be usefully adapted to a wide range of tasks.” These models have demonstrated clear potential, which researchers see as the beginnings of a sweeping paradigm shift in AI.

  • AI model from Maastricht University claims to detect COVID-19 in people’s voices  (Analytics India Magazine)

    Researchers from Maastricht University say that AI can now detect COVID-19 infection in people’s voices. The study was presented at the international congress of the European Respiratory Society in Barcelona, Spain. The concept intends to help with infection identification in low-income nations where PCR testing is sometimes costly and challenging to provide.

Radical Reads is edited by Leah Morris (Senior Director, Velocity Program, Radical Ventures).