Radical Blog

Thanks to Large Language Models, computers understand language better than ever

By Aidan Gomez, CEO and Co-Founder of Cohere, and Jay Alammar, Partner Engineer at Cohere



Editor’s note:

Large language models (LLMs) are predicted to play an ever-larger role for businesses across every industry in 2022. With that in mind, we invited Aidan Gomez, the CEO and co-founder of Radical portfolio company Cohere and a co-inventor of the Transformer, and Jay Alammar, Partner Engineer at Cohere, to provide an introduction to LLMs. Cohere lets any business leverage world-leading natural language processing technology through a simple API.

Language is important. It’s how we learn about the world (e.g. news, searching the web or Wikipedia), and also how we shape it (e.g. agreements, laws, or messages). Language is also how we connect and communicate — as people, and as groups and companies.

Despite the rapid evolution of software, computers remain limited in their ability to deal with language. Software is great at searching for exact matches in text, but often fails at more advanced uses of language — ones that humans employ on a daily basis. There’s a clear need for more intelligent tools that better understand language.

A recent breakthrough in artificial intelligence is the introduction of language processing technologies built on Transformers, which enable us to build more intelligent systems with a richer understanding of language than ever before. Large pretrained language models vastly extend what systems are able to do with text.

Here are a few examples of language understanding systems that can be built on top of large language models.

Large language models represent a breakthrough in text generation. For the first time in history, we have software programs that can write text that feels like it was written by a human. These capabilities open the door to use cases like summarization and paraphrasing.

Language models can be instructed to generate useful summaries or paraphrases of input text by guiding them with a task description in the prompt.
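As a minimal sketch of prompt-based summarization, assuming the Cohere Python SDK's generate endpoint: the placeholder API key, sample text, and parameter values below are illustrative, and exact names may differ between SDK versions.

```python
import cohere

co = cohere.Client("YOUR_API_KEY")  # placeholder key, not a real credential

article = (
    "Large pretrained language models vastly extend what software can do with "
    "text, from generating summaries to classifying support tickets."
)

# The task description at the start of the prompt steers the model toward summarization.
prompt = f"Summarize the following passage in one sentence:\n\n{article}\n\nSummary:"

response = co.generate(
    prompt=prompt,
    max_tokens=50,    # keep the summary short
    temperature=0.3,  # lower temperature for more focused, less creative output
)

print(response.generations[0].text.strip())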

Classification is one of the most common use cases in language processing. Developers can build classifiers on top of Cohere's language models to automate language-based tasks and workflows, saving time and effort.
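The snippet below sketches what such a classifier can look like, assuming the Cohere Python SDK's classify endpoint; the labels, example texts, and the ClassifyExample class name are illustrative assumptions and may differ across SDK versions.

```python
import cohere

co = cohere.Client("YOUR_API_KEY")  # placeholder key

# A handful of labeled examples is enough to define the classification task.
examples = [
    cohere.ClassifyExample(text="The package arrived broken", label="complaint"),
    cohere.ClassifyExample(text="My order never showed up", label="complaint"),
    cohere.ClassifyExample(text="Thanks, the issue is resolved", label="praise"),
    cohere.ClassifyExample(text="Great support, very fast reply", label="praise"),
]

inputs = ["The product stopped working after two days"]

response = co.classify(inputs=inputs, examples=examples)

# Print the predicted label for each input text.
for text, result in zip(inputs, response.classifications):
    print(text, "->", result.prediction)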

Think of how many repeated questions a customer service agent has to answer every day. Language models are capable of judging text similarity and determining whether an incoming question is similar to questions already answered in the FAQ section. Once the system has the similarity scores, it can act in several ways: it might simply show the answer to the most similar question (if the score is above a certain similarity threshold), or it might surface that answer as a suggestion to a customer service agent.
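Here is a minimal sketch of that FAQ-matching flow, assuming the Cohere Python SDK's embed endpoint; the FAQ data, cosine-similarity computation, and threshold value are illustrative choices, not a prescribed implementation.

```python
import numpy as np
import cohere

co = cohere.Client("YOUR_API_KEY")  # placeholder key

faq_questions = [
    "How do I reset my password?",
    "What is your refund policy?",
    "How can I change my shipping address?",
]
incoming = "I forgot my password, how do I get back into my account?"

# Embed the FAQ questions and the incoming question in a single call.
embeddings = np.array(co.embed(texts=faq_questions + [incoming]).embeddings)
faq_vecs, query_vec = embeddings[:-1], embeddings[-1]

# Cosine similarity between the incoming question and every FAQ question.
scores = faq_vecs @ query_vec / (
    np.linalg.norm(faq_vecs, axis=1) * np.linalg.norm(query_vec)
)

best = int(np.argmax(scores))
THRESHOLD = 0.7  # assumed cutoff; tune on real data
if scores[best] >= THRESHOLD:
    print("Show the answer to:", faq_questions[best])
else:
    print("Route the question to a customer service agent")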

Unlocking NLP

Google described the addition of language models to Google Search as "representing the biggest leap forward in the past five years, and one of the biggest leaps forward in the history of Search." Microsoft also uses such models for every query in the Bing search engine.

Despite the utility of these models, training and deploying them effectively is resource-intensive, demanding vast amounts of data, compute, and engineering expertise.

Cohere trains massive language models and puts them behind a simple API, so that any system can add cutting-edge language processing. Moreover, through finetuning, users can create models customized to their use case and trained on their own data. Cohere handles the complexities of collecting massive amounts of text data, keeping pace with ever-evolving neural network architectures, distributed training, and serving models around the clock, and hides all of that behind a simple API available to everyone, including companies without the internal technical teams, resources, or computing power of Google or Microsoft.