01 · Input space & score distribution

[Interactive figure: predictions @ threshold · top: input space · bottom: score distribution]

02 · Outcomes

Confusion matrix

|              | Pred + | Pred − |
|--------------|--------|--------|
| **Actual +** | TP     | FN     |
| **Actual −** | FP     | TN     |
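
To make the four cells concrete, here is a minimal NumPy sketch that thresholds scores and tallies the outcomes. The arrays `y_true` and `y_score` are hypothetical stand-ins for the demo's data, not taken from it:

```python
import numpy as np

# Hypothetical labels (1 = positive) and classifier scores.
y_true = np.array([1, 1, 1, 0, 0, 0, 1, 0])
y_score = np.array([0.9, 0.7, 0.4, 0.6, 0.2, 0.1, 0.8, 0.3])

tau = 0.5                 # decision threshold
y_pred = y_score >= tau   # predict positive when the score clears the threshold

tp = int(np.sum((y_true == 1) & y_pred))
fn = int(np.sum((y_true == 1) & ~y_pred))
fp = int(np.sum((y_true == 0) & y_pred))
tn = int(np.sum((y_true == 0) & ~y_pred))
print(f"TP={tp} FN={fn} FP={fp} TN={tn}")   # TP=3 FN=1 FP=1 TN=3
```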
| Metric | Definition | Formula |
|--------|------------|---------|
| TPR (Recall) | how well the classifier "recalls" all actual positives | $\text{TP}/(\text{TP}+\text{FN})$ |
| FPR | how often the classifier raises a false alarm on negatives | $\text{FP}/(\text{FP}+\text{TN})$ |
| Precision | how trustworthy a positive prediction is | $\text{TP}/(\text{TP}+\text{FP})$ |
| F1 | harmonic mean of precision & recall | $2\cdot\text{Prec}\cdot\text{Rec}/(\text{Prec}+\text{Rec})$ |
| Specificity | how well the classifier correctly rejects actual negatives | $\text{TN}/(\text{TN}+\text{FP}) = 1-\text{FPR}$ |
| Accuracy | overall fraction of correct predictions | $(\text{TP}+\text{TN})/N$ |
| Error Rate | overall fraction of incorrect predictions | $(\text{FP}+\text{FN})/N = 1-\text{Acc}$ |

Here $N = \text{TP}+\text{FN}+\text{FP}+\text{TN}$ is the total number of predictions.
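
Each row of the table is one line of arithmetic. A minimal sketch, continuing from the counts above and guarding against the empty denominators that extreme thresholds can produce:

```python
def metrics(tp, fn, fp, tn):
    """Compute the table's metrics from confusion-matrix counts."""
    n = tp + fn + fp + tn
    recall    = tp / (tp + fn) if tp + fn else 0.0   # TPR
    fpr       = fp / (fp + tn) if fp + tn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1        = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    accuracy  = (tp + tn) / n
    return {"recall": recall, "fpr": fpr, "precision": precision,
            "f1": f1, "specificity": 1 - fpr,
            "accuracy": accuracy, "error_rate": 1 - accuracy}

print(metrics(tp=3, fn=1, fp=1, tn=3))
```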

03 · Threshold sweep

[Interactive figures: Precision · Recall · F1 vs threshold; ROC curve; Precision–Recall curve]
Why precision and recall trade off

Both share TP in the numerator, but their denominators respond differently as you move the threshold: recall's denominator ($\text{TP}+\text{FN}$, the count of actual positives) is fixed, while precision's denominator ($\text{TP}+\text{FP}$) grows with every additional positive prediction.

Lower the threshold → more positives predicted → recall rises, precision falls (more false alarms).

Raise the threshold → fewer positives predicted → precision rises, recall falls (more misses).

Predict positive on everything and recall is perfect — but precision collapses. Predict positive only when certain and precision is high — but you miss many true positives. The F1 score and the PR curve make this trade-off explicit so you can pick the operating point that fits your cost structure.
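
To watch the trade-off numerically, sweep $\tau$ across the score range and recompute precision and recall at each cut-off. A minimal sketch on the same hypothetical arrays as in section 02; the "no positive predictions → precision 1" convention used here is one common choice, not the only one:

```python
import numpy as np

y_true = np.array([1, 1, 1, 0, 0, 0, 1, 0])
y_score = np.array([0.9, 0.7, 0.4, 0.6, 0.2, 0.1, 0.8, 0.3])

for tau in np.linspace(0.0, 1.0, 11):
    y_pred = y_score >= tau
    tp = np.sum((y_true == 1) & y_pred)
    fp = np.sum((y_true == 0) & y_pred)
    fn = np.sum((y_true == 1) & ~y_pred)
    prec = tp / (tp + fp) if tp + fp else 1.0   # convention when nothing is predicted positive
    rec = tp / (tp + fn)
    print(f"tau={tau:.1f}  precision={prec:.2f}  recall={rec:.2f}")
```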

04 · Youden's J — optimal threshold

[Interactive figure: Youden's J vs threshold]
$J(\tau) = \mathrm{TPR}(\tau) + \mathrm{TNR}(\tau) - 1 = \mathrm{TPR}(\tau) - \mathrm{FPR}(\tau)$
Why Youden's J gives the optimal threshold

$J(\tau) = \mathrm{TPR}(\tau) - \mathrm{FPR}(\tau)$ measures how much better the classifier is than random at a given cut-off. A random classifier has $J = 0$; a perfect one has $J = 1$.

The Youden threshold $\tau^\star = \arg\max_\tau J(\tau)$ maximises the vertical distance between the ROC curve and the diagonal chance line — it is the point on the ROC curve furthest from the no-skill baseline.
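
A minimal sketch of that argmax, again on the hypothetical arrays from above. scikit-learn's `roc_curve` returns the (FPR, TPR, threshold) triples directly, so $J$ is one subtraction:

```python
import numpy as np
from sklearn.metrics import roc_curve

y_true = np.array([1, 1, 1, 0, 0, 0, 1, 0])
y_score = np.array([0.9, 0.7, 0.4, 0.6, 0.2, 0.1, 0.8, 0.3])

fpr, tpr, thresholds = roc_curve(y_true, y_score)
j = tpr - fpr                  # Youden's J at each candidate threshold
best = int(np.argmax(j))
print(f"tau* = {thresholds[best]:.2f}, J = {j[best]:.2f}")
```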

This criterion implicitly weights sensitivity and specificity equally. If a false negative is much costlier than a false positive (e.g. cancer screening), you may prefer a lower threshold than $\tau^\star$ even though $J$ is slightly smaller there.
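
One way to encode such asymmetric costs is to minimise expected cost over the same ROC sweep instead of maximising $J$. A sketch under an assumed 10:1 cost ratio (illustrative, not from the demo); with misses this expensive, the optimum shifts toward a lower threshold than $\tau^\star$:

```python
import numpy as np
from sklearn.metrics import roc_curve

C_FN, C_FP = 10.0, 1.0   # assumption: a missed positive costs 10x a false alarm

y_true = np.array([1, 1, 1, 0, 0, 0, 1, 0])
y_score = np.array([0.9, 0.7, 0.4, 0.6, 0.2, 0.1, 0.8, 0.3])

fpr, tpr, thresholds = roc_curve(y_true, y_score)
n_pos, n_neg = np.sum(y_true == 1), np.sum(y_true == 0)

# Expected cost at each threshold: misses are (1 - TPR) * n_pos,
# false alarms are FPR * n_neg.
cost = C_FN * (1 - tpr) * n_pos + C_FP * fpr * n_neg
best = int(np.argmin(cost))
print(f"cost-optimal tau = {thresholds[best]:.2f}")
```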