What are auto-regressive models?
Auto-regressive models are a class of generative models that predict a sequence of tokens by conditioning each token’s probability distribution on the tokens that precede it. They are commonly used for tasks such as language modeling, machine translation, and image captioning.
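In other words, the model factorizes the joint probability of a sequence into a product of per-token conditional probabilities. The following minimal sketch illustrates that factorization; the next_token_probs function and its toy vocabulary are illustrative stand-ins for a trained model, not a real library API.

```python
# A sketch of the auto-regressive factorization:
# p(x_1, ..., x_T) = p(x_1) * p(x_2 | x_1) * ... * p(x_T | x_1, ..., x_{T-1})
import math

def next_token_probs(prefix):
    """Hypothetical stand-in for a trained model: return a probability
    distribution over the vocabulary given the tokens seen so far."""
    vocab = ["the", "cat", "sat", "<eos>"]
    # A real model would run a neural network over the prefix; a uniform
    # distribution is used here purely to illustrate the interface.
    return {token: 1.0 / len(vocab) for token in vocab}

def sequence_log_prob(tokens):
    """Sum each token's conditional log-probability given its prefix."""
    log_prob = 0.0
    for t, token in enumerate(tokens):
        probs = next_token_probs(tokens[:t])  # condition on preceding tokens only
        log_prob += math.log(probs[token])
    return log_prob

print(sequence_log_prob(["the", "cat", "sat", "<eos>"]))
```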
What can auto-regressive models do?
Auto-regressive models generate new sequences one token at a time, sampling each token from the distribution predicted from the tokens generated so far. For example, an auto-regressive language model can generate coherent and diverse sentences by repeatedly sampling the next word from the distribution conditioned on the preceding words in the sentence.
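The generation loop itself is simple: ask the model for a distribution over the next token, sample a token, append it to the sequence, and repeat. The sketch below again uses a hypothetical next_token_probs function in place of a trained neural network.

```python
# A sketch of auto-regressive generation: repeatedly ask the model for a
# distribution over the next token, sample from it, and extend the sequence.
import random

def next_token_probs(prefix):
    """Hypothetical stand-in for a trained model; a real model would run a
    neural network over the prefix."""
    if len(prefix) >= 5:
        return {"<eos>": 1.0}  # force the toy model to stop eventually
    return {"the": 0.4, "cat": 0.3, "sat": 0.2, "<eos>": 0.1}

def generate(max_length=20):
    """Sample one token at a time, conditioning on everything generated so far."""
    tokens = []
    for _ in range(max_length):
        probs = next_token_probs(tokens)  # distribution conditioned on the prefix
        token = random.choices(list(probs), weights=list(probs.values()))[0]
        if token == "<eos>":              # the model chose to end the sequence
            break
        tokens.append(token)
    return " ".join(tokens)

print(generate())
```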
Some benefits of using auto-regressive models
Auto-regressive models offer several benefits for generative tasks:
Flexibility: Auto-regressive models can generate sequences of arbitrary length and are not restricted to fixed-length inputs or outputs.
Diversity: Auto-regressive models can generate diverse outputs by sampling from the predicted probability distributions, enabling multiple plausible outputs for a given input (see the sampling-temperature sketch after this list).
Adaptability: Auto-regressive models can be fine-tuned for specific tasks or domains, allowing for the generation of high-quality outputs that are tailored to a particular use case.
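One common way to control the diversity mentioned above is to adjust the sampling temperature. The sketch below uses made-up logits from a hypothetical model to show how temperature trades off diversity against determinism.

```python
# A sketch of temperature sampling, one common way to control how diverse
# the sampled outputs are. The logits are made-up scores from a hypothetical model.
import math
import random

def sample_with_temperature(logits, temperature=1.0):
    """Turn raw logits into a softmax distribution at the given temperature and sample."""
    scaled = [score / temperature for score in logits.values()]
    max_scaled = max(scaled)
    exps = [math.exp(s - max_scaled) for s in scaled]  # numerically stable softmax
    total = sum(exps)
    weights = [e / total for e in exps]
    return random.choices(list(logits), weights=weights)[0]

logits = {"cat": 2.0, "dog": 1.5, "mat": 0.5, "sat": 0.1}  # illustrative next-token scores

# A low temperature concentrates probability on the highest-scoring token
# (nearly deterministic); a high temperature flattens the distribution
# (more diverse samples).
for temperature in (0.2, 1.0, 2.0):
    samples = [sample_with_temperature(logits, temperature) for _ in range(10)]
    print(temperature, samples)
```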
More resources to learn more about auto-regressive models
To learn more about auto-regressive models and their applications, you can explore the following resources:
The Illustrated Transformer, a visual guide to the Transformer architecture that underlies most modern auto-regressive language models
OpenAI’s GPT-3 model, one of the largest and most powerful auto-regressive language models to date
The Attention Mechanism, a key component of many auto-regressive models that enables them to selectively focus on different parts of the input sequence
The Image Transformer, an auto-regressive model that generates images by predicting each pixel conditioned on the pixels generated before it