Session 3: Intro to Supervised Machine Learning
2025-09-03
Netflix recommendations
Drug Development (Source: Catacutan et al. 2024)
ChatGPT
We know stuff about some documents and want to know the same stuff about other documents.
Term | Meaning |
---|---|
Classifier | A statistical model fitted to some data to make predictions about different data. |
Training | The process of fitting the classifier to the data. |
Train and test set | Datasets used to train and evaluate the classifier. |
Vectorizer | A tool used to translate text into numbers. |
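To make "train and test set" concrete, here is a minimal sketch of a random split in plain Python. The function name `train_test_split`, the 25% test share, the seed, and the toy reviews are all assumptions made up for illustration; they are not data from the session.

```python
import random

def train_test_split(rows, test_share=0.25, seed=42):
    """Shuffle labeled rows and split them into a train and a test set."""
    rows = list(rows)                      # copy, so the input is not modified
    random.Random(seed).shuffle(rows)      # fixed seed makes the split reproducible
    n_test = int(round(len(rows) * test_share))
    return rows[n_test:], rows[:n_test]    # (train, test)

# Hypothetical labeled documents: (text, label)
labeled = [
    ("great movie!", "positive"),
    ("loved every minute", "positive"),
    ("a masterpiece", "positive"),
    ("would watch again", "positive"),
    ("what a waste of time", "negative"),
    ("terrible acting", "negative"),
    ("boring and far too long", "negative"),
    ("I want my money back", "negative"),
]

train, test = train_test_split(labeled)
print(len(train), "training rows,", len(test), "test rows")
```

The classifier is fitted on `train` only. `test` is held back so that the evaluation measures how well the model predicts documents it has not seen.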
Statistical models can only read numbers
\(\rightarrow\) we need to translate!
ID | Text |
---|---|
1 | This is a text |
2 | This is no text |
ID | This | is | a | text | no |
---|---|---|---|---|---|
1 | 1 | 1 | 1 | 1 | 0 |
2 | 1 | 1 | 0 | 1 | 1 |
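The translation in the two tables above can be sketched in a few lines of Python. This is a simplified bag-of-words vectorizer written for illustration (the function name `bag_of_words` is an assumption, and real vectorizers usually also lowercase text and strip punctuation).

```python
def bag_of_words(texts):
    """Turn a list of texts into a vocabulary and a document-term matrix."""
    # Build the vocabulary in order of first appearance.
    vocabulary = []
    for text in texts:
        for token in text.split():
            if token not in vocabulary:
                vocabulary.append(token)
    # One row per text, one column per vocabulary term, cells hold counts.
    matrix = [[text.split().count(term) for term in vocabulary] for text in texts]
    return vocabulary, matrix

texts = ["This is a text", "This is no text"]
vocabulary, matrix = bag_of_words(texts)

print(vocabulary)   # ['This', 'is', 'a', 'text', 'no']
for row in matrix:
    print(row)      # [1, 1, 1, 1, 0] then [1, 1, 0, 1, 1]
```

Each row is now a vector of numbers that a statistical model can read.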
review | label |
---|---|
great movie! | ? |
what a bunch of cr*p | ? |
I lost all faith in humanity after watching this | ? |
| | FALSE | TRUE |
|---|---|---|
| FALSE | 688 | 9 |
| TRUE | 37 | 266 |
Term | Meaning |
---|---|
Accuracy | How much does it get right overall? |
Recall | How many of the relevant cases does it find? |
Precision | How many of the found cases are relevant? |
F1 Score | Harmonic mean of precision and recall. |
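As a worked example, the four metrics can be computed from the confusion matrix shown earlier. The slides do not say which axis holds the predictions, so this sketch assumes that rows are predicted labels and columns are true labels. If it is the other way round, precision and recall swap, while accuracy and F1 stay the same.

```python
# Cell counts from the confusion matrix.
# ASSUMPTION: rows = predicted label, columns = true label.
tn = 688   # predicted FALSE, actually FALSE
fn = 9     # predicted FALSE, actually TRUE
fp = 37    # predicted TRUE,  actually FALSE
tp = 266   # predicted TRUE,  actually TRUE

accuracy = (tp + tn) / (tp + tn + fp + fn)   # share of all cases it gets right
precision = tp / (tp + fp)                   # share of found cases that are relevant
recall = tp / (tp + fn)                      # share of relevant cases that are found
f1 = 2 * precision * recall / (precision + recall)   # harmonic mean of the two

print(f"accuracy  = {accuracy:.3f}")
print(f"precision = {precision:.3f}")
print(f"recall    = {recall:.3f}")
print(f"f1        = {f1:.3f}")
```

Under that assumption, accuracy comes out at 0.954, precision at about 0.878, recall at about 0.967, and F1 at about 0.920.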
\[MSE = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2\]
Source: IBM
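The MSE formula translates directly into code. The true values and predictions below are made-up numbers, chosen only to show the arithmetic.

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between true and predicted values."""
    n = len(y_true)
    return sum((y - y_hat) ** 2 for y, y_hat in zip(y_true, y_pred)) / n

# Hypothetical true values y_i and predictions y_hat_i
y_true = [1.0, 0.0, 1.0, 1.0]
y_pred = [0.9, 0.2, 0.6, 0.4]

mse = mean_squared_error(y_true, y_pred)
print(round(mse, 4))   # squared errors 0.01 + 0.04 + 0.16 + 0.36 = 0.57, divided by 4
```

Because the errors are squared, the prediction that is off by 0.6 contributes 36 times as much as the one that is off by 0.1.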
\[\hat{y} = g(w_0 + \sum_{i=1}^{m} w_i x_i)\]
Source: IBM
Important: the activation function \(g\) is non-linear! (Why do you think that is?)
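The formula for \(\hat{y}\) can be sketched as a single artificial neuron. The slides do not name a specific \(g\); this sketch assumes the sigmoid function as one common non-linear choice, and the weights and inputs are made-up numbers.

```python
import math

def sigmoid(z):
    """A common non-linear activation function (an assumed choice for g)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron(x, w, w0, g=sigmoid):
    """Compute y_hat = g(w0 + sum_i w_i * x_i) for one neuron."""
    z = w0 + sum(w_i * x_i for w_i, x_i in zip(w, x))   # weighted sum plus bias
    return g(z)

# Hypothetical inputs and weights
x = [2.0, 2.0]
w = [1.0, -1.0]
w0 = 0.0

y_hat = neuron(x, w, w0)
print(y_hat)   # the weighted sum is 0, and sigmoid(0) is 0.5
```

Passing a different function as `g` changes the shape of the output without changing the weighted sum inside.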
Who gets the best F1 score?