Foundations of AI and ML

Large Pre-trained Language Models

In 2017, a team of researchers at Google introduced the Transformer, a deep neural network architecture originally designed for machine translation. In the years since then, we have seen an avalanche of work on models based on this architecture, and such models now define the state of the art in many application areas. In this lecture, you will get an overview of one important family of Transformer-based models: the GPT family.

Lecture slides are available here.

N-Gram Models

Before the advent of neural NLP, the state of the art in language modelling was defined by n-gram models. For example, a 2-gram model approximates the probability of the next word $w_{t+1}$ by conditioning only on the immediately preceding word: $P(w_{t+1} \mid w_1, \dots, w_t) \approx P(w_{t+1} \mid w_t)$. Under this assumption, the probability of a whole sequence breaks down (up to the probability of the first word) into a product of atomic probabilities of the form $P(w_{i+1} \mid w_i)$:

$P(w_1, \dots, w_{t+1}) \approx \prod_{i=1}^t P(w_{i+1} \mid w_i)$

Each atomic probability quantifies the likelihood of the next word $w_{i+1}$ in the sequence, given only the immediately preceding word $w_i$.
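
The following Python snippet is a minimal sketch of this idea. It estimates the atomic 2-gram probabilities as relative frequencies over a toy corpus (the corpus and all names are illustrative, not part of the course material) and then scores a sequence with the product above.

```python
from collections import Counter, defaultdict

# Toy corpus of tokenised sentences (illustrative only).
corpus = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "sat", "on", "the", "rug"],
]

# Count how often each word follows each preceding word.
bigram_counts = defaultdict(Counter)
for sentence in corpus:
    for prev, nxt in zip(sentence, sentence[1:]):
        bigram_counts[prev][nxt] += 1

def p(next_word, prev_word):
    """Relative-frequency estimate of P(next_word | prev_word)."""
    counts = bigram_counts[prev_word]
    total = sum(counts.values())
    return counts[next_word] / total if total else 0.0

# Probability of a sequence under the 2-gram approximation
# (ignoring the probability of the first word).
sentence = ["the", "cat", "sat", "on", "the", "rug"]
prob = 1.0
for prev, nxt in zip(sentence, sentence[1:]):
    prob *= p(nxt, prev)
print(prob)  # 0.25 * 1.0 * 1.0 * 1.0 * 0.25 = 0.0625
```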

One problem with $n$-gram models is getting useful estimates of their atomic probabilities. Assuming a vocabulary of 100,000 words, how many atomic probabilities have to be estimated in a 3-gram model, where each probability is conditioned on the two immediately preceding words?
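
If you want to check your answer, note that an $n$-gram model has one atomic probability for every possible $n$-gram, i.e. $|V|^n$ of them for a vocabulary of size $|V|$. A short sanity check in Python:

```python
# One atomic probability per possible n-gram: |V| ** n in total.
V = 100_000  # vocabulary size assumed in the question above

def num_atomic_probabilities(n, vocab_size=V):
    return vocab_size ** n

print(num_atomic_probabilities(3))  # 1_000_000_000_000_000, i.e. 10^15
```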

Fine-tuning GPT-3

GPT-3 maps texts to vectors of length 12288. Fine-tuning a pre-trained GPT-3 model on a classification task requires us to add a final linear layer that transforms these representations to $k$-dimensional vectors, where $k$ is the number of classes in the task at hand.

Consider a sentiment classification task with three classes: positive, negative and neutral. How many parameters do we need to train when fine-tuning a pre-trained GPT-3 model on this task?
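
To check your answer, here is a minimal PyTorch sketch of such a classification head, assuming that only the newly added linear layer is trained and the pre-trained GPT-3 representations are kept frozen (the backbone itself is not shown). The parameter count covers the weight matrix plus the bias vector of the new layer.

```python
import torch.nn as nn

d_model = 12288   # dimensionality of the GPT-3 text representations
num_classes = 3   # positive, negative, neutral

# Final linear layer added on top of the pre-trained representations.
head = nn.Linear(d_model, num_classes)

# Trainable parameters in the head: a 3 x 12288 weight matrix plus 3 biases.
n_params = sum(p.numel() for p in head.parameters())
print(n_params)  # 12288 * 3 + 3 = 36_867
```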

GPT-2

This article explains why OpenAI chose not to release their language model GPT-2 immediately. One of the reasons was …
