CUSO WS on Large Language Models
2025-09-03
dance | move | steal | like | nobody’s | watching |
---|---|---|---|---|---|
1 | 0 | 0 | 1 | 1 | 1 |
0 | 1 | 0 | 1 | 1 | 1 |
0 | 0 | 1 | 1 | 1 | 1 |
0 | 0 | 1 | 0 | 1 | 1 |
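A document-term matrix like the one above can be built in a few lines of Python. The toy corpus below is inferred from the table rows (one document per row); names are illustrative.

```python
# Vocabulary order matches the table columns above.
vocab = ["dance", "move", "steal", "like", "nobody's", "watching"]

# One toy document per table row.
docs = [
    "dance like nobody's watching",
    "move like nobody's watching",
    "steal like nobody's watching",
    "steal nobody's watching",
]

# 1 if the word occurs in the document, 0 otherwise.
matrix = [[1 if word in doc.split() else 0 for word in vocab] for doc in docs]

for row in matrix:
    print(row)
```

Each row is a bag-of-words representation of one document over the shared vocabulary.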
Distributional Hypothesis
A word’s meaning can be inferred from the context it appears in.
Firth (1957): “You shall know a word by the company it keeps”
word2vec
(skip-gram), Mikolov et al. (2013)
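Skip-gram trains on (target, context) pairs drawn from a sliding window. A minimal sketch of the pair-extraction step, assuming a simple symmetric window (function name and window size are illustrative, not from the slides):

```python
def skipgram_pairs(tokens, window=2):
    """Extract (target, context) training pairs with a symmetric window."""
    pairs = []
    for i, target in enumerate(tokens):
        # Context = all tokens within `window` positions of the target.
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((target, tokens[j]))
    return pairs

pairs = skipgram_pairs("dance like nobody's watching".split(), window=1)
# e.g. ("like", "dance") and ("like", "nobody's") are among the pairs
```

The actual word2vec model then learns vectors such that a target's vector predicts its context words.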
Source: Math Insight
Magnitude: length of a vector
\[||\mathbf{a}|| = \sqrt{a_1^2 + a_2^2 + \ldots + a_n^2}\]
\[\mathbf{a} + \mathbf{b} = [a_1 + b_1, a_2 + b_2, \ldots, a_n + b_n]\]
\[ \mathbf{a} + \mathbf{b} = [1,3] + [3,2] = [4,5] \]
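Both operations can be checked with NumPy (used here as a convenience; the slides do not prescribe a library):

```python
import numpy as np

a = np.array([1, 3])
b = np.array([3, 2])

magnitude = np.linalg.norm(a)  # ||a|| = sqrt(1^2 + 3^2) = sqrt(10)
s = a + b                      # element-wise: [1 + 3, 3 + 2] = [4, 5]
```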
\(WV_{Paris} - WV_{France} + WV_{Germany} = WV_{X}\)
Neat interactive visualization: Math Insight
\[\mathbf{a} \cdot \mathbf{b} = \sum_{i=1}^{n} a_i b_i = a_1 b_1 + a_2 b_2 + \ldots + a_n b_n\]
\[\mathbf{a} \cdot \mathbf{b} = 1 \cdot 3 + 3 \cdot 2 = 3 + 6 = 9\]
To get the magnitude of the projection of \(\mathbf{a}\) onto \(\mathbf{b}\), divide by \(\|\mathbf{b}\|\):
\[\frac{9}{||\mathbf{b}||} = \frac{9}{\sqrt{3^2 + 2^2}} = \frac{9}{\sqrt{13}} \approx 2.5\]
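The same numbers, computed in NumPy:

```python
import numpy as np

a = np.array([1, 3])
b = np.array([3, 2])

dot = a @ b                           # 1*3 + 3*2 = 9
projection = dot / np.linalg.norm(b)  # 9 / sqrt(13) ≈ 2.5
```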
\[\cos(\theta) = \frac{\mathbf{a} \cdot \mathbf{b}}{\|\mathbf{a}\| \, \|\mathbf{b}\|} \]
Seem familiar? Recall the magnitude of the projection:
\[\frac{\mathbf{a} \cdot \mathbf{b}}{\|\mathbf{b}\|} \]
Difference: normalization by magnitude of both vectors.
Neat feature: \(\theta\) is the angle between our two vectors!
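Cosine similarity is a one-liner once the dot product and norms are in place (a minimal sketch; the function name is illustrative):

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b."""
    return (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

sim = cosine_similarity(np.array([1, 3]), np.array([3, 2]))
# 9 / (sqrt(10) * sqrt(13)) ≈ 0.79
```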
Nice feature of all this algebra: you can define semantic axes simply by subtracting the vectors of polar opposites!
\[WV_{frenchness} = WV_{french} - WV_{german}\]
The projection of a term on this vector tells us which pole it is more associated with!
Term | Frenchness |
---|---|
Paris | 0.298 |
Lausanne | 0.073 |
Bern | -0.177 |
Berlin | -0.333 |
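A sketch of how such scores can be computed. In practice `wv` would be pretrained word vectors (e.g. from word2vec); the 2-d vectors below are invented purely to make the snippet runnable, so the resulting numbers are not those in the table.

```python
import numpy as np

# Toy stand-ins for pretrained word vectors (illustrative values only).
wv = {
    "french": np.array([1.0, 0.2]),
    "german": np.array([-1.0, 0.1]),
    "Paris":  np.array([0.6, 0.5]),
    "Berlin": np.array([-0.7, 0.4]),
}

# Semantic axis: subtract the vectors of the polar opposites.
axis = wv["french"] - wv["german"]

def frenchness(term):
    """Magnitude of the projection of a term's vector onto the axis."""
    return (wv[term] @ axis) / np.linalg.norm(axis)
```

With real embeddings, terms associated with the "french" pole project positively and terms associated with the "german" pole project negatively, as in the table above.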
“The paragraph token can be thought of as another word.” (p.3)
Figure 2 from Le and Mikolov (2014)
\(\rightarrow\) Validation is essential!