Radical Reads

Exclusive Q&A with Geoffrey Hinton – A big idea for solving vision

By Aaron Brindle, Partner, Public Affairs


AI pioneer, Vector Institute Chief Scientific Advisor and Turing Award winner Geoffrey Hinton published a paper last week on how recent advances in deep learning might be combined to build an AI system that better reflects how human vision works. Hinton’s system is called “GLOM,” and in this exclusive Q&A with Radical partner Aaron Brindle, Geoffrey explains how it works, what it implies for everything from self-driving cars to natural language processing, and why he landed on the term (or acronym?) GLOM.

The following has been edited for length and clarity.

Aaron Brindle: What’s the biggest difference in how a human brain processes an image versus current neural networks?

Geoff Hinton: For starters, there are enormous differences in the hardware. The brain processes images using a huge number of connections at low power. Computers have fewer connections but use far more power. Computer vision models, historically, have looked at single images where a static picture is presented at a uniform resolution. Traditional AI vision systems try to process the entirety of that uniform image.

That’s completely different from what people do. For humans, vision is really a sampling process, where the eye makes real-time decisions about which information in the field of vision is going to be further deciphered. For example, we’re very good at quickly sampling and processing anything that moves. The same is true for something that has a different colour from its background. When the eye fixates, whatever is in the middle of the retina is at high resolution and whatever is around the edge of that will be at a low resolution. You process what’s in focus for several hundred milliseconds before your eye fixates on something else. GLOM is about the deep learning process that happens after the system fixates on an image. It addresses a research problem I’ve had for the past 50 years and feels much closer to an understanding of how the brain might be doing vision.
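As a rough illustration of the sampling idea described above, the sketch below splits one fixation into a small full-resolution patch and a coarse, block-averaged view of the whole image. It is not part of GLOM or taken from the paper; the function name `foveate` and its parameters (`radius`, `block`) are hypothetical and chosen only for this example.

```python
def foveate(image, cx, cy, radius, block):
    """Return (fovea, periphery) for a fixation at column cx, row cy.

    fovea: full-resolution patch of side 2 * radius + 1 around the fixation.
    periphery: the whole image averaged over block x block tiles,
    standing in for the low-resolution view around the edge.
    All names here are illustrative assumptions, not from the GLOM paper.
    """
    height, width = len(image), len(image[0])

    # High-resolution centre: copy the pixels around the fixation unchanged.
    fovea = [row[max(0, cx - radius): cx + radius + 1]
             for row in image[max(0, cy - radius): cy + radius + 1]]

    # Low-resolution surround: one averaged value per tile.
    periphery = []
    for y in range(0, height, block):
        out_row = []
        for x in range(0, width, block):
            tile = [image[j][i]
                    for j in range(y, min(y + block, height))
                    for i in range(x, min(x + block, width))]
            out_row.append(sum(tile) / len(tile))
        periphery.append(out_row)
    return fovea, periphery


# Toy 8 x 8 "image" whose pixel value encodes its position.
image = [[x + 8 * y for x in range(8)] for y in range(8)]
fovea, periphery = foveate(image, cx=4, cy=4, radius=1, block=4)
print(len(fovea), len(fovea[0]))          # size of the sharp patch
print(len(periphery), len(periphery[0]))  # size of the coarse view
```

Moving `cx` and `cy` and calling `foveate` again plays the role of the eye fixating somewhere else: the sharp patch follows the fixation while the coarse view stays cheap to compute.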

AB: So what is GLOM solving for?

GH: Deep learning has been good for things like perception, motor control, and manipulating objects. There’s a belief among some people in the research community, however, that deep learning is not so good at tackling symbolic representations. I think they’re wrong. And this paper is about how to do one aspect of the symbolic stuff that people thought couldn’t be done easily with neural networks. In the paper, I discuss how islands of identical vectors can be used to represent the part-whole hierarchy.
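To make the "islands of identical vectors" idea concrete, here is a minimal toy sketch, not the method from the paper. It assumes a one-dimensional row of six image locations, each holding one embedding vector per level; locations that belong to the same part (or the same whole) hold the same vector. The names `NOSE`, `MOUTH`, `FACE`, `find_islands`, and `part_whole_tree` are hypothetical, and the exact equality test is a simplification: a real system would presumably compare vectors with some tolerance.

```python
from collections import defaultdict

# Hypothetical toy embeddings (assumed for illustration only).
NOSE = (1.0, 0.0)
MOUTH = (0.0, 1.0)
FACE = (0.5, 0.5)

# One vector per location at each level of the hierarchy.
embeddings = {
    "part":  [NOSE, NOSE, NOSE, MOUTH, MOUTH, MOUTH],
    "whole": [FACE, FACE, FACE, FACE, FACE, FACE],
}


def find_islands(vectors):
    """Group contiguous locations whose vectors are identical.

    Returns a list of (first_location, last_location, vector) tuples.
    """
    islands = []
    start = 0
    for i in range(1, len(vectors) + 1):
        if i == len(vectors) or vectors[i] != vectors[start]:
            islands.append((start, i - 1, vectors[start]))
            start = i
    return islands


def part_whole_tree(embeddings):
    """Nest each part-level island under the whole-level island containing it."""
    tree = defaultdict(list)
    for w_start, w_end, _ in find_islands(embeddings["whole"]):
        for p_start, p_end, _ in find_islands(embeddings["part"]):
            if w_start <= p_start and p_end <= w_end:
                tree[(w_start, w_end)].append((p_start, p_end))
    return dict(tree)


tree = part_whole_tree(embeddings)
print(tree)  # {(0, 5): [(0, 2), (3, 5)]}
```

In this toy, the single island at the "whole" level spans all six locations and contains the two smaller islands at the "part" level, so the hierarchy can be read directly from which locations agree at which level.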

AB: Is the objective here a more accurate system?

GH: Hopefully it makes for more accurate systems, but also more interpretable systems that work more the way people do. If you can get GLOM to work, you may be able to get rid of things like adversarial examples. Adversarial examples show that traditional convolutional networks are currently doing it completely wrong. With GLOM, you should be able to recognize images based on the relationships between parts of an image, not on fine textures and things that aren’t discernible to the human eye.

AB: If GLOM works, how will it change computer vision systems?

GH: I think vision systems will behave more like people, which is particularly important for self-driving cars. But a system like this could be applied everywhere: from satellite imagery that can determine how agricultural land is being used, to an up-close analysis of a leaf that can determine the health of an individual plant. It will also be very useful for language, as this same system, when applied to natural language models, may make it much easier to interpret how the language system is understanding phrases and sentences.

AB: I notice that your references to GLOM are in all caps. What does GLOM mean and is there an acronym you haven’t shared?

GH: I landed on the term, originally, from the slang for agglomeration – the idea of things that “glom together.” But it might also stand for “Geoff’s Last Original Model.”

AB: Let’s hope not!

GH: We’ll see.

AI News This Week

  • Stanford HAI AI Index 2021  (Stanford)

    The annual AI Index report is one of the most comprehensive reports about artificial intelligence. The 2021 edition significantly expands the amount of data available in the report, which was drawn from a broader set of academic, private, and non-profit organizations for calibration. The report illustrates the effect of COVID-19 on AI development from multiple perspectives, including how AI helps with COVID-related drug discovery and the effect of the pandemic on hiring and private investment.

  • COVID-19 increased the use of AI. Here’s why it’s here to stay  (World Economic Forum)

    In response to the pandemic, organizations have fast-tracked their investments in technology, facilitating remote work, enhancing user and customer experiences, and decreasing costs. In terms of AI adoption, Appen’s State of AI 2020 Report suggests that 41% of companies have accelerated their AI strategies during COVID-19. Three-quarters of companies surveyed in the report cite AI as critical to their success in 2020, with retail, education, and healthcare sectors reporting the greatest benefits from these investments.

  • Facebook’s New AI Teaches Itself to See With Less Human Help  (Wired)

    Most image recognition algorithms require significant amounts of labeled pictures for training purposes. A new “self-supervised” approach to learning, however, may do away with labeling. While a similar approach has proven effective in translating text from one language to another, applying this methodology to images has proven more challenging. The research led by Facebook builds on steady progress in tweaking deep learning algorithms to make them more efficient and effective. There are several potential applications for this research, including the analysis of medical images without the need for labeled scans and x-rays.

  • An AI is training counselors to deal with teens in crisis  (MIT Technology Review)

    The Trevor Project, America’s hotline for LGBTQ youth, is turning to a natural language processing-powered chatbot to train its employees to help troubled teenagers. The role-playing model is trained on 45 million pages from the web, which teaches it the basic structure and grammar of the English language. The Trevor Project then fine-tuned the model using the transcripts of previous role-playing conversations. The organization believes that 1.8 million LGBTQ youth in America seriously consider suicide each year and that its existing 600 counselors are overwhelmed by demand. The Trevor Project also uses a machine-learning algorithm to help determine which callers are at highest risk of danger.

  • Intel’s 3D and AI tech now helps train athletes  (VentureBeat)

    It’s a game of inches. For professional athletes, winning and losing is often determined by the smallest possible measurements. Intel’s 3D and AI technology captures skeletal data when an athlete is sprinting, using a video camera running at 60 frames per second. That data is then analyzed using the technology’s AI capabilities. The goal is to make it simpler for coaches and athletes to understand how different types of skeletal structures may give one athlete an edge over another. Armed with fresh insight, it may become possible to optimize conditioning regimens based on how the skeleton of a specific athlete is constructed.

Radical Reads is edited by Leah Morris (Senior Director, Velocity Program, Radical Ventures).