Session 5: Generative LLMs

CUSO WS on Large Language Models

Nicolai Berk

2025-09-03

Recap: Encoder Models (BERT)

Input: “The [MASK] is barking loudly”
BERT: Processes entire sentence simultaneously
Output: probability distribution across tokens (“dog” (87%), “puppy” (8%), “animal” (3%))
Bidirectional: Reads text in both directions

Masked Language Modeling: Trained to fill in blanks

Recap: Downstream Encoder Tasks

Classify documents
Extract information
Measure similarity

How a Decoder Works

Source: Tunstall, Von Werra, and Wolf (2022)

How a Decoder Works

Source: Tunstall, Von Werra, and Wolf (2022)

Input: “Cause and…”
Predicts next token: “effect”
Repeats until stopping condition
Autoregressive generation

“Causal” language modeling

Major differences

Autoregressive: Generates one word at a time
“Causal” attention: Only looks at previous words
Massive scale: Often billions of parameters

What does this look like?

Visualization

Take 5-10 minutes to explore the visualization and discuss with your neighbor how the decoder architecture works.

Training GPT

Pretraining: predict the next word (causal LM objective)
Scale = data + parameters + compute
Fine-tuning:
- Instruction tuning (datasets of Q&A)
- RLHF¹ (aligning with human preferences)

Decoder Tasks

Content generation

Surprisingly generalizable task!

Zero/few-shot Classification
Code generation
Translation
App development

…many more things it was not trained to do!

Inference with LLMs

Prompting

Remember: prompt engineering on the validation set!

Writing a good prompt

Persona
Task
Context
Format

You are a program manager in [industry]. Draft an executive summary email to [persona] based on [details about relevant program docs]. Limit to bullet points.

Controlling model output

`pydantic`

Let’s you impose structure on model outputs.

class CityLocation(BaseModel):
    city: str
    country: str


agent = Agent('google-gla:gemini-1.5-flash', output_type=CityLocation)
result = agent.run_sync('Where were the olympics held in 2012?')
print(result.output)
#> city='London' country='United Kingdom'

Labelling with LLMs

Zero-shot: just prompt provided
Few-shot: a few examples provided
Dynamic few-shot: examples selected based on similarity to the input

Few-shot labelling example

Your task is to analyze the sentiment in the TEXT below from an investor perspective and label it with only one the three labels:
positive, negative, or neutral.

Examples:
Text: Operating profit increased, from EUR 7m to 9m compared to the previous reporting period.
Label: positive
Text: The company generated net sales of 11.3 million euro this year.
Label: neutral
Text: Profit before taxes decreased to EUR 14m, compared to EUR 19m in the previous period. 
Label: negative

Dynamic few-shot labelling

Idea: most similar examples should be most informative

Use cosine_similarity of embedding to assess similarity
Add k most similar examples to the prompt

Retrieval Augmented Generation (RAG)

Retrieve most likely examples given a query (e.g. context for question)

Provide examples to model as context for answer generation
Can require use of an additions

Synthetic Annotation

Source: Moritz Laurer on HF Blog

Synthetic Annotation

Use LLM to annotate training data
Generate synthetic labels
Train smaller encoder model on synthetic data
Evaluate on gold standard
Apply cost-efficient at scale

Zero-shot encoder models

Laurer et al. (2024)

Task: natural-language inference (NLI) - universal
Allow prompting
Controlled output
Class probabilities
Efficient

Try this before using generative models

How does it work?

Laurer et al. (2024), Table 1

How does it work?

Class-hypotheses: “It is about economy”, “it is about democracy”, …
E.g. “We need to raise tariffs” as context
Test each of the class-hypotheses against this context
Probabilities for entailment and contradiction are converted to label probabilities

Tutorial I

LLM inference and prompting

Notebook

Hosting Models & Calling APIs

Local Hosting

Ollama

OpenAI

Tutorial II

API calls, Structured Output

Notebook

Resources

Laurer, Moritz, Wouter Van Atteveldt, Andreu Casas, and Kasper Welbers. 2024. “Less Annotating, More Classifying: Addressing the Data Scarcity Issue of Supervised Machine Learning with Deep Transfer Learning and Bert-Nli.” Political Analysis 32 (1): 84–100.

Tunstall, Lewis, Leandro Von Werra, and Thomas Wolf. 2022. Natural Language Processing with Transformers. " O’Reilly Media, Inc.".

Session 5: Generative LLMs

Recap: Encoder Models (BERT)

Masked Language Modeling: Trained to fill in blanks

Recap: Downstream Encoder Tasks

How a Decoder Works

How a Decoder Works

“Causal” language modeling

Major differences

What does this look like?

Visualization

Training GPT

Decoder Tasks

Surprisingly generalizable task!

Inference with LLMs

Prompting

Remember: prompt engineering on the validation set!

Writing a good prompt

Controlling model output

`pydantic`

Labelling with LLMs

Few-shot labelling example

Dynamic few-shot labelling

Retrieval Augmented Generation (RAG)

Synthetic Annotation

Synthetic Annotation

Zero-shot encoder models

Laurer et al. (2024)

Try this before using generative models

How does it work?

How does it work?

Tutorial I

Hosting Models & Calling APIs

HF Inference Endpoints

Local Hosting

Azure

OpenAI

Tutorial II

Resources

Session 5: Generative LLMs

Recap: Encoder Models (BERT)

Masked Language Modeling: Trained to fill in blanks

Recap: Downstream Encoder Tasks

How a Decoder Works

How a Decoder Works

“Causal” language modeling

Major differences

What does this look like?

Visualization

Training GPT

Decoder Tasks

Surprisingly generalizable task!

Social Science applications

Inference with LLMs

Prompting

Remember: prompt engineering on the validation set!

Writing a good prompt

Controlling model output

pydantic

Labelling with LLMs

Few-shot labelling example

Dynamic few-shot labelling

Retrieval Augmented Generation (RAG)

Synthetic Annotation

Synthetic Annotation

Zero-shot encoder models

Laurer et al. (2024)

Try this before using generative models

How does it work?

How does it work?

Tutorial I

Hosting Models & Calling APIs

HF Inference Endpoints

Local Hosting

Azure

OpenAI

Tutorial II

Resources

`pydantic`