In spite of its overnight success, OpenAI’s breakthrough hit is the result of decades of research.
ChatGPT is everywhere. OpenAI’s chatbot exploded into the mainstream almost overnight after it was released as a web app at the end of November 2022. By some estimates it is the fastest-growing internet service ever, reaching 100 million users just two months after launch. Through OpenAI’s $10 billion deal with Microsoft, the technology is now being built into Office software and the Bing search engine. Stung into action by its onetime rival in search, Google is rolling out its own chatbot, built on its LaMDA language model. Even my family’s WhatsApp is full of ChatGPT chat.
But OpenAI’s breakthrough did not come out of nowhere. The chatbot is the most polished iteration yet in a line of large language models going back years. This is how we got here.
1980s–’90s: Recurrent Neural Networks
ChatGPT is a version of GPT-3, a large language model also developed by OpenAI. Language models are a type of neural network trained on large amounts of text. (A neural network is software inspired by the way neurons in animal brains signal one another.) Because text is made up of sequences of letters and words of varying lengths, language models need a type of neural network that can make sense of that kind of data. Recurrent neural networks, developed in the 1980s, can handle sequences of words, but they are slow to train and can forget earlier words in a sequence.
In 1997, Sepp Hochreiter and Jürgen Schmidhuber addressed this problem by inventing LSTM (Long Short-Term Memory) networks, recurrent neural networks with special components that allow past data in an input sequence to be retained for longer. LSTMs could handle strings of text several hundred words long, but their language skills were limited.
2017: Transformers
The breakthrough behind today’s large language models came when Google researchers invented transformers, a kind of neural network that can track where each word appears in a sequence. The meaning of a word often depends on the words that come before and after it. By tracking this contextual information, transformers can handle longer strings of text and capture the meanings of words more accurately. For example, “hot dogs” means very different things in “Hot dogs should be given plenty of water” and “Hot dogs should be eaten with mustard.”
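For readers who want to see the idea in miniature, here is a rough sketch of the attention step at the heart of a transformer, applied to the “hot dogs” example above. It is a toy under stated assumptions, not how any production model is built: the two-number word vectors in `EMBED` are made up for illustration, and the learned projections and position information that real transformers use are left out.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(vectors):
    """Scaled dot-product self-attention with queries = keys = values.

    Simplification: no learned weight matrices and no positional encoding.
    Returns (attention weights, context-mixed output vectors).
    """
    dim = len(vectors[0])
    all_weights, outputs = [], []
    for query in vectors:
        # Score every word in the sentence against the current word.
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in vectors
        ]
        weights = softmax(scores)
        # Blend all word vectors according to those weights.
        mixed = [
            sum(w * vec[i] for w, vec in zip(weights, vectors))
            for i in range(dim)
        ]
        all_weights.append(weights)
        outputs.append(mixed)
    return all_weights, outputs

# Hypothetical embeddings: first number ~ "animal", second ~ "food".
EMBED = {
    "dogs": [1.0, 1.0],     # ambiguous on its own
    "water": [2.0, 0.0],
    "mustard": [0.0, 2.0],
}

def contextual_dogs(sentence):
    """Return the context-aware vector for 'dogs' within the sentence."""
    vectors = [EMBED[word] for word in sentence]
    _, outputs = self_attention(vectors)
    return outputs[sentence.index("dogs")]

animal_ctx = contextual_dogs(["dogs", "water"])
food_ctx = contextual_dogs(["dogs", "mustard"])
print("dogs near water:  ", animal_ctx)
print("dogs near mustard:", food_ctx)
```

In this toy setup the same starting vector for “dogs” ends up leaning toward the “animal” side next to “water” and toward the “food” side next to “mustard,” which is the kind of context-dependent meaning the paragraph above describes.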
2018–2019: GPT and GPT-2
OpenAI’s first two large language models came just a few months apart. The company wants to develop general-purpose, multi-skilled AI and sees large language models as a key step toward that goal. GPT (short for Generative Pre-trained Transformer) beat the state-of-the-art benchmarks for natural-language processing at the time.
GPT combined transformers with unsupervised learning, a way to train machine-learning models on data (in this case, lots of text) that has not been annotated beforehand. This lets the software discover patterns in the data on its own, without being told what it is looking at. Many earlier successes in machine learning had relied on supervised learning and annotated data, but labeling data by hand is slow work, which limits the size of the data sets available for training.
But it was GPT-2 that created the bigger buzz. OpenAI claimed to be so concerned that people would use GPT-2 “to generate deceptive, biased, or abusive language” that it said it would not release the full model. How times change.
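To make the idea of learning from unannotated text concrete, here is a minimal sketch. It assumes the training signal is simply "which word comes next," so the raw text supplies its own answers and no human labeling is needed. The word-pair counter below is vastly simpler than GPT, and the tiny `corpus` and the function names are invented for illustration.

```python
from collections import Counter, defaultdict

def make_training_pairs(text):
    """Slice raw text into (word, next word) pairs; no labels required."""
    words = text.lower().split()
    return list(zip(words, words[1:]))

def train_bigram_model(text):
    """Count how often each word follows each other word."""
    counts = defaultdict(Counter)
    for current, following in make_training_pairs(text):
        counts[current][following] += 1
    return counts

def predict_next(model, word):
    """Return the most frequent follower of `word`, or None if unseen."""
    followers = model.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]

# Made-up toy corpus standing in for "lots of text".
corpus = "the cat sat on the mat . the cat ate . the dog sat on the rug ."
model = train_bigram_model(corpus)

print(predict_next(model, "sat"))
print(predict_next(model, "the"))
```

The point of the sketch is the data preparation, not the model: every pair of neighboring words in the text becomes a training example for free, which is why this style of training can scale to data sets far larger than anything labeled by hand.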
2020: GPT-3
GPT-2 was impressive, but OpenAI’s follow-up, GPT-3, was a big leap forward in generating human-like text. GPT-3 can answer questions, summarize documents, generate stories in a variety of styles, translate between English, French, Spanish, and Japanese, and more. Its mimicry is uncanny.
GPT-3’s gains came from supersizing existing techniques rather than inventing new ones. GPT-3 has 175 billion parameters (the values in a network that are adjusted during training), compared with GPT-2’s 1.5 billion. It was also trained on far more data.
But training on text taken from the internet brings new problems. GPT-3 absorbed much of the disinformation and prejudice it found online and reproduced it on demand. As OpenAI acknowledged, models trained on the internet have internet-scale biases.
December 2020: Toxic text and other problems
While OpenAI was grappling with GPT-3’s biases, the rest of the tech world was struggling to curb toxic tendencies in AI.