Radical Reads

Is AI Creative?

By Radical’s Editorial Team

During Toronto Tech Week, Nobel laureate Geoffrey Hinton and Nick Frosst, Co-Founder of Radical portfolio company Cohere, engaged in a captivating discussion at the University of Toronto’s Convocation Hall. The conversation, moderated by CBC tech journalist Nora Young, explored fundamental questions about AI understanding, creativity, and the future of human-computer interaction. The pair continued their dialogue later that evening at Radical’s AI Founders Toronto Tech Week Mixer, where they further debated AI’s creative potential and its wide-reaching impact on daily life.

The following excerpts from their University of Toronto conversation have been edited for clarity.

Nora: From a research point of view, what are the most interesting questions and stumbling blocks AI development is facing right now?

Nick: There are lots of blockers preventing this technology from being as impactful as it can be. And I think a lot of the blockers right now have nothing to do with AI itself. There are blockers like privacy, deployments, and what data it’s plugged into. Even if the technology did not advance beyond where it is today, it could still have a far greater impact, because there’s basic computer science work that has to get done.

Geoff: AI is already remarkably good at reasoning compared with what people expected some years ago. There’s been a long train of people who believed in classical linguistics, saying this will never go any further. And every year they say it’ll never go any further, and every year it gets a bit better.

Nora: There seems to be a divergence between you two on the topic of creativity in AI systems…

Geoff: I have an intuition that they’re very creative, that they’re seeing all sorts of analogies that people have never seen. You have about 100 trillion connections in your brain. GPT-4 has about a trillion. GPT-4 knows a lot more than you do, so it’s packed a lot more knowledge into a lot fewer connections. To do that, it has to see relationships between all sorts of different pieces of knowledge to compress it into so few connections.

Nick: I think we're at two different ends of a spectrum. People on my side mostly think the knowledge in an LLM comes from the base model, and then you bring that knowledge out with reinforcement learning from human feedback. People on your side think that knowledge is gained through reinforcement learning from human feedback or from some other environment.

Geoff: In all the tests people have done comparing the creativity of an LLM with the creativity of a person, LLMs actually score quite well. You think they're mimicking what's in the training data; I think they've used the training data to see all sorts of relationships and can now be very creative.

Nora: What are you feeling really hopeful about regarding the future of large language models and where we’re headed?

Geoff: I’m feeling really hopeful about healthcare. I think we’re going to get much better healthcare, and for old people, that’s really important. What’s nice is you’re not going to put anybody out of work. If you make nurses and doctors 10 times as efficient, we just get 10 times as much healthcare. That would be great.

Nick: Yeah, I think healthcare is a great example, but I’m optimistic about this technology across the board. I want to do less boring work, I want to write less documentation, I want to fill out fewer forms, I want to write fewer emails. I want to sit around and write poetry. I want to pontificate with my friends. I think LLMs stand to lift the burden of that work from people and allow them to do the things that they are particularly good at, and those things happen to be much more enjoyable. So I really think this can transform productivity. I think this can be good for all of our individual lives by allowing us to do the things that we like more.

Geoff: I think the LLMs will be able to write your poetry for you too, and they’ll be able to make every word in the poem start with B, and you can’t do that.

Nick: Actually, as somebody who does write and who has a large language modelling company, I don’t use our model for writing lyrics. And that’s not because it wouldn’t write better lyrics. That’s because I’m not trying to write lyrics faster, because I’m not actually interested in the efficiency of self-expression. I’m just interested in self-expression.

AI News This Week

  • The A.I. Frenzy Is Escalating. Again.  (The New York Times)

    Hyperscalers are dramatically accelerating AI spending two and a half years after ChatGPT’s launch, with no signs of slowing down. Meta, Microsoft, Amazon, and Google plan to spend a combined $320 billion on infrastructure this year, doubling their spending from two years ago. Radical Managing Partner Jordan Jacobs explains: “The thinking from the big C.E.O.s is that they can’t afford to be wrong by doing too little, but they can afford to be wrong by doing too much.” The industry is shifting focus from general AI systems dominated by established players toward specialized applications from startups like Radical portfolio company Ribbon, which has developed an AI job interviewer.

  • Anthropic Wins Ruling on AI Training in Copyright Lawsuit but Must Face Trial on Pirated Books  (The Washington Post)

    A federal judge ruled that Anthropic’s training of Claude on copyrighted books constitutes “fair use” under copyright law, finding the process “quintessentially transformative” because the AI learns from works to create something different rather than replicate them. However, U.S. District Judge William Alsup ruled that Anthropic must still face trial in December over allegations it illegally acquired training materials from online “shadow libraries” of pirated books. This ruling could establish important precedent for similar cases against other AI companies facing copyright lawsuits over their training practices.

  • The Struggle to Get Inside How AI Models Really Work  (Financial Times)

    Frontier AI labs are struggling to understand how their reasoning models actually operate. While “chain-of-thought” techniques allow AI systems to show their step-by-step problem-solving process, researchers are discovering concerning “misbehaviour” where models provide final answers that contradict their displayed reasoning (a minimal illustrative sketch of such a consistency check follows these news items). To address this, researchers are developing methods to detect misbehaviour by analyzing full chain-of-thought processes, using those insights to train models toward better responses, and working to preserve authentic reasoning states rather than optimized “nice-looking thoughts” that could hide problematic behaviours.

  • Can we fix AI’s evaluation crisis?  (MIT Technology Review)

    AI evaluation faces a crisis as traditional benchmarks become unreliable measures of model capabilities. Companies optimize models to score well on tests rather than to genuinely improve them, while data contamination means models may have seen benchmark questions during training. Many popular benchmarks are now maxed out at 90%+ accuracy. Researchers are developing solutions, including LiveCodeBench Pro, which uses algorithmic olympiad problems on which top models achieve only 53% accuracy at medium difficulty, and dynamic benchmarks like LiveBench that evolve quarterly. New approaches focus on real-world utility, risk assessment, and practical applications rather than just technical performance scores.

  • Research: Future of Work with AI Agents  (Stanford University)

    Researchers developed WORKBank, a database analyzing worker desires and AI expert assessments across 844 tasks and 104 occupations, introducing the Human Agency Scale to quantify preferred levels of human involvement. Key findings reveal that workers support automation for 46.1% of tasks, primarily low-value and repetitive ones. The study identifies AI deployment zones based on worker desire versus technical capability. Workers prefer collaborative partnerships (45.2% favour equal human-agent roles), while valued skills shift from information processing toward interpersonal competencies, signalling how AI integration may reshape core workplace abilities.
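
As a minimal illustration of the consistency check described in the chain-of-thought item above, the sketch below compares a model’s stated final answer against the conclusion its displayed reasoning actually reaches. The pattern matching, function names, and example traces are our own illustrative assumptions in plain Python, not any lab’s actual tooling.

    import re

    def conclusion_from_cot(chain_of_thought):
        # Heuristically extract the conclusion the reasoning itself reaches,
        # e.g. the X in "so the answer is X". (Illustrative pattern only.)
        matches = re.findall(r"answer is\s+([^\s.,]+)", chain_of_thought, re.IGNORECASE)
        return matches[-1] if matches else None

    def flags_misbehaviour(chain_of_thought, final_answer):
        # Flag a trace whose displayed reasoning contradicts the final answer.
        derived = conclusion_from_cot(chain_of_thought)
        return derived is not None and derived.strip() != final_answer.strip()

    # The reasoning derives 12, but the model reports 15: flagged.
    cot = "9 + 3 = 12, so the answer is 12."
    print(flags_misbehaviour(cot, "15"))  # True  (contradiction)
    print(flags_misbehaviour(cot, "12"))  # False (consistent)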

Radical Reads is edited by Ebin Tomy.