OpenAI was the first to apply generative pre-training to the transformer architecture, introducing the GPT-1 model in 2018.[6] The company has since released many larger GPT models. The popular chatbot ChatGPT, released in late 2022, was followed by many competitor chatbots using their own "GPT" models to generate text, such as Gemini, DeepSeek or Claude.[7]
GPTs are primarily used to generate text, but can be trained to generate other kinds of data. For example, GPT-4o can process and generate text, images and audio.[8] To improve performance on complex tasks, some GPTs, such as OpenAI o3, spend more time analyzing the problem before generating an output, and are called reasoning models.
Background
The core technology of a GPT is the transformer architecture. Developed and introduced by Google researchers in the 2017 paper Attention Is All You Need, the transformer architecture solved many of the performance issues associated with older recurrent neural network (RNN) designs for natural language processing (NLP).[9] The architecture's use of an attention mechanism allowed models to process entire sequences of text at once, enabling the training of much larger and more sophisticated models.
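The attention computation at the heart of the transformer can be sketched in a few lines. The following is a minimal, illustrative version of scaled dot-product attention in plain Python, not the implementation used in any GPT model: it omits the learned projection matrices, multiple heads, and the causal mask that GPT-style decoders apply. The function names and the toy token vectors are assumptions made for this example.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries, keys, values):
    """Return one output vector per query, plus the attention weights.

    queries, keys: lists of vectors sharing the same dimension d_k.
    values: list of vectors, one per key.
    """
    d_k = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
            for k in keys
        ]
        weights = softmax(scores)
        # Each output is a weighted average of the value vectors.
        out = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights


# Toy self-attention: three token vectors attend to one another.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = scaled_dot_product_attention(tokens, tokens, tokens)
```

Each query is compared against every key independently of the others, so the whole sequence can be handled together rather than one step at a time as in an RNN.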
Separately, the concept of generative pre-training (GP) was a long-established technique in machine learning. GP is a form of self-supervised learning where a model is first trained on a large, unlabeled dataset (the "pre-training" step) to learn to generate data points. This pre-trained model is then adapted to a specific task using a labeled dataset (the "fine-tuning" step).[10]
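The self-supervised nature of the pre-training step can be illustrated with a toy next-token predictor. In this sketch a count-based bigram model stands in for a transformer trained by gradient descent, which is a deliberate simplification; the corpus and all function names are hypothetical. The point it shows is that the training targets come from the unlabeled text itself, and that the quantity being minimized is the negative log-likelihood of each next token.

```python
import math
from collections import Counter, defaultdict


def make_training_pairs(tokens):
    """Turn unlabeled text into (previous, next) pairs; the text supplies its own labels."""
    return [(tokens[i], tokens[i + 1]) for i in range(len(tokens) - 1)]


def fit_bigram_model(pairs):
    """Estimate P(next | previous) from raw counts (a stand-in for gradient training)."""
    counts = defaultdict(Counter)
    for prev, nxt in pairs:
        counts[prev][nxt] += 1
    return {
        prev: {tok: c / sum(ctr.values()) for tok, c in ctr.items()}
        for prev, ctr in counts.items()
    }


def average_negative_log_likelihood(model, pairs):
    """Mean of -log P(next | previous) over the pairs; lower is better."""
    return -sum(math.log(model[prev][nxt]) for prev, nxt in pairs) / len(pairs)


corpus = "the cat sat on the mat".split()
pairs = make_training_pairs(corpus)
model = fit_bigram_model(pairs)
loss = average_negative_log_likelihood(model, pairs)
```

The fine-tuning step is not shown; it would continue training the same model on a smaller labeled dataset for a specific task.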
History
In June 2018, OpenAI published the paper Improving Language Understanding by Generative Pre-Training, which introduced the first generative pre-trained transformer model, GPT-1.[11] This model combined the transformer architecture with generative pre-training, allowing it to be trained on large bodies of text (the BookCorpus) and then fine-tuned for a variety of specific language tasks. This semi-supervised approach was a breakthrough, as it reduced the need for large, manually labeled datasets, which were expensive and time-consuming to create.[11]
OpenAI followed this with GPT-2 in 2019, a much larger model trained on a 40 GB dataset called WebText. Citing risks of malicious use, OpenAI initially opted for a "staged release", publishing smaller versions of the model before releasing the full 1.5-billion parameter model in November 2019.[12] In 2020, GPT-3 was released with 175 billion parameters, trained on an even larger dataset. GPT-3 marked a significant leap in capability, demonstrating few-shot and zero-shot learning abilities where the model could perform tasks it was not explicitly trained for.[13]
OpenAI started using reinforcement learning from human feedback (RLHF) to better align the models' behavior with human preferences. This led to the development of "InstructGPT", a fine-tuned version of GPT-3, and ultimately the public release of the ChatGPT chatbot in November 2022.[14] The immense popularity of ChatGPT spurred widespread development of competing GPT-based systems from other organizations. EleutherAI released a series of open-source models, including GPT-J in 2021.[15] Other major technology companies developed their own large language models, including Google's PaLM and Meta AI's LLaMA.
Many subsequent GPT models have been trained to be multimodal (able to process or generate multiple types of data). For example, GPT-4o can both process and generate text, images and audio.[16] Additionally, GPT models like o3 or DeepSeek R1 have been trained with reinforcement learning to generate multi-step chain-of-thought reasoning before producing a final answer, which helps solve complex problems, for example in mathematics.[17]
Foundation models
A foundation model is an AI model trained on broad data at scale such that it can be adapted to a wide range of downstream tasks.[18][19]
Thus far, the most notable GPT foundation models have been from OpenAI's GPT-n series. A prominent example is GPT-4, for which OpenAI declined to publish the size or training details (citing "the competitive landscape and the safety implications of large-scale models").[20]
Other such models include Google's PaLM, a broad foundation model that has been compared to GPT-3 and has been made available to developers via an API,[27][28] and Together's GPT-JT, which has been reported as the closest-performing open-source alternative to GPT-3 (and is derived from earlier open-source GPTs).[29] Meta AI (formerly Facebook) also has a generative transformer-based foundational large language model, known as LLaMA.[30]
Foundational GPTs can also employ modalities other than text for input, output, or both. GPT-4 is a multimodal LLM that is capable of processing text and image input (though its output is limited to text).[31] Regarding multimodal output, some generative transformer-based models are used for text-to-image technologies such as diffusion[32] and parallel decoding.[33] Such models can serve as visual foundation models (VFMs) for developing downstream systems that can work with images.[34]
Task-specific models
Training workflow of original ChatGPT/InstructGPT release[35][36]
A foundational GPT model can be further adapted to produce more targeted systems directed to specific tasks and/or subject-matter domains. Methods for such adaptation can include additional fine-tuning (beyond that done for the foundation model) as well as certain forms of prompt engineering.[37]
An important example of this is fine-tuning models to follow instructions, a task that is still fairly broad but more targeted than what a foundation model is trained for. In January 2022, OpenAI introduced "InstructGPT", a series of models fine-tuned to follow instructions using a combination of supervised training and reinforcement learning from human feedback (RLHF) on base GPT-3 language models.[38][39] Advantages over the bare foundational models included higher accuracy, less negative/toxic sentiment, and generally better alignment with user needs. Hence, OpenAI began using this as the basis for its API service offerings.[40] Other organizations have also released instruction-tuned models, including a fully open version.[41][42]
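One ingredient of RLHF as described by Ouyang et al. is a reward model trained on human comparisons between pairs of model outputs. A minimal sketch of the pairwise preference loss used for that purpose, -log(sigmoid(r_chosen - r_rejected)), is given below. The function name and the scalar reward inputs are assumptions for illustration; in practice the rewards come from a neural network and the loss is averaged over many comparisons.

```python
import math


def pairwise_preference_loss(reward_chosen, reward_rejected):
    """Loss that is small when the human-preferred output gets the higher reward.

    Computes -log(sigmoid(margin)) as log(1 + exp(-margin)), using two
    branches so that exp() never receives a large positive argument.
    """
    margin = reward_chosen - reward_rejected
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Hypothetical reward scores for a preferred and a rejected response.
agrees_with_humans = pairwise_preference_loss(2.0, 0.0)
undecided = pairwise_preference_loss(0.0, 0.0)
disagrees_with_humans = pairwise_preference_loss(0.0, 2.0)
```

The resulting reward model then supplies the training signal for the reinforcement learning stage, which is not shown here.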
Another (related) kind of task-specific model is the chatbot, which engages in human-like conversation. In November 2022, OpenAI launched ChatGPT, an online chat interface powered by an instruction-tuned language model trained in a similar fashion to InstructGPT.[43] OpenAI trained this model using RLHF, with human AI trainers providing conversations in which they played both the user and the AI, and mixed this new dialogue dataset with the InstructGPT dataset to produce a conversational format suitable for a chatbot. Other major chatbots currently include Microsoft's Bing Chat, which uses OpenAI's GPT-4 (as part of a broader close collaboration between OpenAI and Microsoft),[44] and Google's competing chatbot Gemini (initially based on their LaMDA family of conversation-trained language models, with plans to switch to PaLM).[45]
Yet another kind of task that a GPT can be used for is the meta-task of generating its own instructions, such as developing a series of prompts for 'itself' to accomplish a more general goal given by a human user.[46] This is known as an AI agent, and more specifically a recursive one, because it uses results from its previous self-instructions to help it form its subsequent prompts. The first major example of this was Auto-GPT (which uses OpenAI's GPT models), and others have since been developed as well.[47]
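The recursive loop described above can be sketched as follows. This is an illustrative outline only and does not reflect Auto-GPT's actual code: `stub_model` and `stub_execute` are hypothetical stand-ins for a real GPT call and a real tool invocation, and the prompt layout is an assumption made for the example.

```python
def stub_model(prompt):
    """Hypothetical stand-in for a GPT call.

    Picks the next instruction from a fixed plan based on how many
    earlier results appear in the prompt.
    """
    plan = ["search for sources", "summarize findings", "write report"]
    completed = prompt.count("RESULT:")
    return plan[completed] if completed < len(plan) else "DONE"


def stub_execute(instruction):
    """Hypothetical tool call; a real agent might run a search or some code."""
    return f"completed '{instruction}'"


def run_agent(goal, model=stub_model, execute=stub_execute, max_steps=10):
    """Repeatedly ask the model for its own next instruction until it says DONE."""
    history = []
    for _ in range(max_steps):
        # Results of earlier self-instructions are fed back into the next prompt.
        prompt = (
            f"GOAL: {goal}\n"
            + "".join(f"RESULT: {result}\n" for result in history)
            + "NEXT INSTRUCTION:"
        )
        instruction = model(prompt)
        if instruction == "DONE":
            break
        history.append(execute(instruction))
    return history


history = run_agent("produce a market report")
```

The `max_steps` cap is a design choice for the sketch: since the model decides when to stop, an explicit limit keeps the loop from running indefinitely.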
Domain-specificity
GPT systems can be directed toward particular fields or domains. Some reported examples of such models and apps are as follows:
EinsteinGPT – for sales and marketing domains, to aid with customer relationship management (uses GPT-3.5)[48][49]
BloombergGPT – for the financial domain, to aid with financial news and information (uses "freely available" AI methods, combined with their proprietary data)[50]
Khanmigo – for the education domain, described as a GPT version for tutoring, to aid students using Khan Academy by guiding them through their studies without directly providing answers (powered by GPT-4)[51][52]
SlackGPT – for the Slack instant-messaging service, to aid with navigating and summarizing discussions on it (uses OpenAI's API)[53]
BioGPT – for the biomedical domain, to aid with biomedical literature text generation and mining (uses GPT-2)[54]
Sometimes domain-specificity is accomplished via software plug-ins or add-ons. For example, several different companies have developed particular plugins that interact directly with OpenAI's ChatGPT interface,[55][56] and Google Workspace has available add-ons such as "GPT for Sheets and Docs"—which is reported to aid use of spreadsheet functionality in Google Sheets.[57][58]
Brand issues
OpenAI, which created the first generative pre-trained transformer (GPT) in 2018, asserted in 2023 that "GPT" should be regarded as a brand of OpenAI.[59] In April 2023, OpenAI revised the brand guidelines in its terms of service to indicate that other businesses using its API to run their AI services would no longer be able to include "GPT" in such names or branding.[60] In May 2023, OpenAI engaged a brand management service to notify its API customers of this policy, although these notifications stopped short of making overt legal claims (such as allegations of trademark infringement or demands to cease and desist).[59] As of November 2023, OpenAI still prohibits its API licensees from naming their own products with "GPT",[61] but it has begun enabling its ChatGPT Plus subscribers to make "custom versions of ChatGPT" called GPTs on the OpenAI site.[62] OpenAI's terms of service say that its subscribers may use "GPT" in the names of these, although it is "discouraged".[61]
Relatedly, OpenAI has applied to the United States Patent and Trademark Office (USPTO) to seek domestic trademark registration for the term "GPT" in the field of AI.[59] OpenAI sought to expedite handling of its application, but the USPTO declined that request in April 2023.[63] In May 2023, the USPTO responded to the application with a determination that "GPT" was both descriptive and generic.[64] As of November 2023, OpenAI continues to pursue its argument through the available processes. Regardless, failure to obtain a registered U.S. trademark does not preclude some level of common-law trademark rights in the U.S.[65] and trademark rights in other countries.[66]
For any given type or scope of trademark protection in the U.S., OpenAI would need to establish that the term is actually "distinctive" to their specific offerings in addition to being a broader technical term for the kind of technology. Some media reports suggested in 2023 that OpenAI may be able to obtain trademark registration based indirectly on the fame of its GPT-based chatbot product, ChatGPT,[63][67] for which OpenAI has separately sought protection (and which it has sought to enforce more strongly).[68] Other reports have indicated that registration for the bare term "GPT" seems unlikely to be granted,[59][69] as it is used frequently as a common term to refer simply to AI systems that involve generative pre-trained transformers.[3][70][71][72] In any event, to whatever extent exclusive rights in the term may arise in the U.S., others would need to avoid using it for similar products or services in ways likely to cause confusion.[69][73] If such rights ever became broad enough to implicate other well-established uses in the field, the trademark doctrine of descriptive fair use could still allow non-brand-related usage to continue.[74]
Selected bibliography
This section lists the main official publications from OpenAI and Microsoft on their GPT models.
Vaswani, Ashish; Shazeer, Noam; Parmar, Niki; Uszkoreit, Jakob; Jones, Llion; Gomez, Aidan N; Kaiser, Łukasz; Polosukhin, Illia (2017). "Attention is All you Need" (PDF). Advances in Neural Information Processing Systems. 30. Curran Associates, Inc. Archived (PDF) from the original on February 21, 2024. Retrieved January 28, 2024.
Erhan, Dumitru; Courville, Aaron; Bengio, Yoshua; Vincent, Pascal (March 31, 2010). "Why Does Unsupervised Pre-training Help Deep Learning?". Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics. JMLR Workshop and Conference Proceedings: 201–208. Archived from the original on January 24, 2024. Retrieved January 24, 2024.
OpenAI (2023). "GPT-4 Technical Report" (PDF). Archived (PDF) from the original on March 14, 2023. Retrieved March 16, 2023.
Ouyang, Long; Wu, Jeff; Jiang, Xu; et al. (November 4, 2022). "Training language models to follow instructions with human feedback". NeurIPS. arXiv:2203.02155.