We’ve reached peak ChatGPT. Released in November as a web app by the San Francisco–based firm OpenAI, the chatbot exploded into the mainstream almost overnight. According to some estimates, it is the fastest-growing internet service ever, reaching 100 million users in January, just two months after launch. Through OpenAI’s $10 billion deal with Microsoft, the tech is now being built into Office software and the Bing search engine. Stung into action by its newly awakened onetime rival in the battle for search, Google is fast-tracking the rollout of its own chatbot. Even my family WhatsApp is filled with ChatGPT chat.

But OpenAI’s breakout hit did not come out of nowhere. The chatbot is the most polished iteration to date in a line of large language models going back years. This is how we got here.

1980s–90s: Recurrent Neural Networks


ChatGPT is a version of GPT-3, a large language model also developed by OpenAI. Language models are a type of neural network that has been trained on lots and lots of text. (Neural networks are software inspired by the way neurons in animal brains signal one another.) Because text is made up of sequences of letters and words of varying lengths, language models require a type of neural network that can make sense of that kind of data. Recurrent neural networks, invented in the 1980s, can handle sequences of words, but they are slow to train and can forget previous words in a sequence.
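To make the "forgetting" concrete, here is a minimal sketch of a recurrent network with a single hidden number that is updated once per word. The weights `W_HIDDEN` and `W_INPUT` are arbitrary values picked for illustration, not trained ones, and the helper names are hypothetical. The sketch measures how much the first input still affects the final state as the sequence gets longer.

```python
import math

# Illustrative, untrained weights (assumptions for this sketch).
W_HIDDEN = 0.5  # how much of the previous state is carried forward
W_INPUT = 1.0   # how strongly the current input is mixed in


def run_rnn(inputs):
    """Process a sequence one item at a time, carrying a single hidden state."""
    hidden = 0.0
    for x in inputs:
        # The same update rule is reused at every position in the sequence.
        hidden = math.tanh(W_HIDDEN * hidden + W_INPUT * x)
    return hidden


def first_word_influence(length):
    """Difference in the final state caused by the first input alone."""
    with_word = run_rnn([1.0] + [0.0] * (length - 1))
    without_word = run_rnn([0.0] * length)
    return abs(with_word - without_word)
```

Calling `first_word_influence(2)` and then `first_word_influence(20)` shows the effect of the first input shrinking toward zero as more steps follow it, which is one simplified picture of why such networks struggle with long passages. Because each step depends on the previous one, the loop also cannot be run in parallel across the sequence, which hints at why training is slow.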

2017: Transformers

The breakthrough behind today’s generation of large language models came when a team of Google researchers invented transformers, a kind of neural network that can track where each word or phrase appears in a sequence. The meaning of words often depends on the meaning of other words that come before or after. By tracking this contextual information, transformers can handle longer strings of text and capture the meanings of words more accurately. For example, “hot dog” means very different things in the sentences “Hot dogs should be given plenty of water” and “Hot dogs should be eaten with mustard.”
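The core mechanism for mixing in context is usually called attention. The sketch below is a heavily simplified version of it: each word's vector is replaced by a weighted average of all the vectors in the sentence, with weights based on how similar they are. The two-dimensional embeddings in `EMBED` are invented for illustration (one axis loosely standing for "animal", the other for "food"), and the sketch leaves out the learned projections and position information that real transformers use.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Scaled dot-product attention for a single query vector."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    mixed = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, mixed


# Hypothetical embeddings: axis 0 ~ "animal", axis 1 ~ "food".
EMBED = {
    "hot": [0.5, 0.5],
    "dogs": [0.5, 0.5],
    "water": [1.5, 0.0],
    "mustard": [0.0, 1.5],
}


def contextual(word, sentence):
    """Return a context-aware vector for `word` within `sentence`."""
    vectors = [EMBED[w] for w in sentence]
    _, mixed = attend(EMBED[word], vectors, vectors)
    return mixed


pet = contextual("dogs", ["hot", "dogs", "water"])
snack = contextual("dogs", ["hot", "dogs", "mustard"])
```

In this toy setup, "dogs" starts out with the same ambiguous vector in both sentences. After attention, `pet` leans toward the animal axis and `snack` toward the food axis, because the neighboring words pull the representation in different directions.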

2018–2019: GPT and GPT-2

OpenAI’s first two large language models came just a few months apart. The company wants to develop multi-skilled, general-purpose AI and believes that large language models are a key step toward that goal. GPT (short for Generative Pre-trained Transformer) planted a flag, beating state-of-the-art benchmarks for natural-language processing at the time. 

GPT combined transformers with unsupervised learning, a way to train machine-learning models on data (in this case, lots and lots of text) that hasn’t been annotated beforehand. This lets the software figure out patterns in the data by itself, without having to be told what it’s looking at. Many previous successes in machine learning had relied on supervised learning and annotated data, but labeling data by hand is slow work and thus limits the size of the data sets available for training.
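One way to see why no hand labeling is needed: in next-word prediction, the text supplies its own answers, since every word is the "label" for the word before it. The sketch below illustrates that idea with a simple word-pair counter. It is a stand-in for illustration only, not how GPT is built, and the tiny `corpus` string and function names are made up.

```python
from collections import Counter, defaultdict


def training_pairs(text):
    """Build (previous word, next word) examples straight from raw text."""
    words = text.lower().split()
    return list(zip(words[:-1], words[1:]))


def fit_bigram(text):
    """Count which words follow which; no human annotation involved."""
    counts = defaultdict(Counter)
    for prev, nxt in training_pairs(text):
        counts[prev][nxt] += 1
    return counts


def predict_next(counts, word):
    """Return the most frequently seen follower of `word`, or None if unseen."""
    followers = counts.get(word)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Made-up example text standing in for "lots and lots of text".
corpus = "the cat sat on the mat . the cat ate the fish ."
model = fit_bigram(corpus)
```

Here `predict_next(model, "the")` picks "cat", simply because "cat" follows "the" more often than any other word in the sample. Scaling the same self-labeling trick up to far more text and a far more capable model is what removes the bottleneck of hand-annotated data.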

But it was GPT-2 that created the bigger buzz. OpenAI claimed to be so concerned people would use GPT-2 “to generate deceptive, biased, or abusive language” that it would not be releasing the full model. How times change.




