Interactive · Binary Classification

ROC Curves Explained — by Seeing Them.

A live logistic classifier, sampled fresh in your browser. Every chart — decision boundary, confusion matrix, ROC, and precision–recall — updates the moment you move a slider. No equations until you've seen the geometry first.

Got a ROC curve from a real model? Drag the sliders until the shape matches — then look back at the scatter plot to see the decision boundary it implies. The abstraction becomes concrete.

Toy model · Positive class: $\mathbf{x} \sim \mathcal{N}(\mu_+,\, \mathbf{I})$ · Negative class: $\mathbf{x} \sim \mathcal{N}(\mu_-,\, \mathbf{I})$ · Score: $s = x_1$ · Probability: $\hat{p} = \sigma(s) = \tfrac{1}{1+e^{-s}}$ · Predict $+$ if $\hat{p} \geq \tau$
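To make the toy model concrete, here is a minimal Python sketch of how such a sample could be drawn and scored. It is not the page's own code: the function names, the fixed seed, the 2-D input, and the choice to shift the class means only along the first coordinate are assumptions made for illustration. The default values mirror the slider defaults.

```python
import numpy as np

def sample_toy_data(n=400, mu_pos=0.0, mu_neg=-1.5, frac_pos=0.5, seed=0):
    """Draw n 2-D points from two unit-covariance Gaussians (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    n_pos = int(round(n * frac_pos))
    # Labels: 1 for the positive class, 0 for the negative class.
    y = np.concatenate([np.ones(n_pos, dtype=int), np.zeros(n - n_pos, dtype=int)])
    # Assumption: the class means differ only along the first coordinate.
    x = rng.standard_normal((n, 2))
    x[:, 0] += np.where(y == 1, mu_pos, mu_neg)
    return x, y

def predict_proba(x):
    """Score s = x_1, squashed through the logistic function."""
    s = x[:, 0]
    return 1.0 / (1.0 + np.exp(-s))

x, y = sample_toy_data()
p_hat = predict_proba(x)
tau = 0.5
y_pred = (p_hat >= tau).astype(int)  # predict + when p_hat >= tau
```

Because the logistic function is increasing, thresholding $\hat{p}$ at $\tau$ is the same as thresholding the raw score $s$ at $\ln\frac{\tau}{1-\tau}$, which is why the decision boundary is a vertical line in the input space.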

Sample size: 400 · Total number of points drawn from both classes.

Center +: 0.0 · Mean of the positive-class Gaussian along x. Move it right to make the classes easier to separate.

Center −: -1.5 · Mean of the negative-class Gaussian along x. Overlap with Center + controls how hard the problem is.

Fraction positive: 0.50 · Share of points that are truly positive. Push below 0.2 to simulate a rare-event (imbalanced) dataset.

Probability threshold: 0.50 · The main lever. Slide it and watch every chart update in lockstep.

LEGEND · FILL=PREDICTED RING=ACTUAL

TP pred + / actual +

FP pred + / actual −

FN pred − / actual +

TN pred − / actual −

01 · Input space & score distribution

Predictions @ threshold · top: input space · bottom: score distribution

02 · Outcomes

Confusion matrix

	Pred +	Pred −
Actual +	TP 0	FN 0
Actual −	FP 0	TN 0

Metric	Meaning	Definition	Value
TPR (Recall)	how well the classifier "recalls" all actual positives	$\text{TP}/(\text{TP}+\text{FN})$	—
FPR	how often the classifier raises a false alarm on negatives	$\text{FP}/(\text{FP}+\text{TN})$	—
Precision	how trustworthy a positive prediction is	$\text{TP}/(\text{TP}+\text{FP})$	—
F1	harmonic mean of precision & recall	$2\cdot\text{Prec}\cdot\text{Rec}/(\text{Prec}+\text{Rec})$	—
Specificity	how well the classifier dismisses true negatives	$\text{TN}/(\text{TN}+\text{FP}) = 1-\text{FPR}$	—
Accuracy	overall fraction of correct predictions	$(\text{TP}+\text{TN})/N$	—
Error Rate	overall fraction of incorrect predictions	$(\text{FP}+\text{FN})/N = 1-\text{Acc}$	—
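The definitions in the table translate almost line for line into code. The sketch below is one way to compute them with NumPy. The helper names (`confusion_counts`, `safe_div`, `metrics`) and the tiny hand-made label vectors are illustrative assumptions, and returning NaN for an undefined ratio is a design choice rather than the only reasonable convention.

```python
import numpy as np

def confusion_counts(y_true, y_pred):
    """Return (TP, FP, FN, TN) for binary labels given as 0/1 or booleans."""
    y_true = np.asarray(y_true).astype(bool)
    y_pred = np.asarray(y_pred).astype(bool)
    tp = int(np.sum(y_pred & y_true))
    fp = int(np.sum(y_pred & ~y_true))
    fn = int(np.sum(~y_pred & y_true))
    tn = int(np.sum(~y_pred & ~y_true))
    return tp, fp, fn, tn

def safe_div(num, den):
    """Divide, returning NaN when the denominator is zero (assumed convention)."""
    return num / den if den else float("nan")

def metrics(tp, fp, fn, tn):
    n = tp + fp + fn + tn
    tpr = safe_div(tp, tp + fn)    # recall / sensitivity
    fpr = safe_div(fp, fp + tn)    # false-alarm rate
    prec = safe_div(tp, tp + fp)   # precision
    return {
        "tpr": tpr,
        "fpr": fpr,
        "precision": prec,
        "f1": safe_div(2 * prec * tpr, prec + tpr),
        "specificity": safe_div(tn, tn + fp),
        "accuracy": safe_div(tp + tn, n),
        "error_rate": safe_div(fp + fn, n),
    }

# Hand-made example: 4 actual positives, 4 actual negatives.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 1, 0, 0, 0, 1]
counts = confusion_counts(y_true, y_pred)  # TP=3, FP=1, FN=1, TN=3
m = metrics(*counts)
```

In this example every ratio works out to 0.75 except FPR and error rate, which are 0.25, so the identities Specificity = 1 − FPR and Error Rate = 1 − Accuracy are easy to check by eye.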

03 · Threshold sweep

Precision · Recall · F1 vs threshold

ROC curve

Precision–Recall curve

Why precision and recall trade off

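Both share TP in the numerator, but their denominators behave differently as you move the threshold. Recall divides by the actual positives (TP + FN), a count the threshold cannot change. Precision divides by the predicted positives (TP + FP), a count that shrinks as the threshold rises.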

Lower the threshold → more positives predicted → recall rises (or stays flat), precision usually falls (more false alarms).

Raise the threshold → fewer positives predicted → precision usually rises, recall falls (more misses).

Predict positive on everything and recall is perfect, but precision drops to the fraction of points that are truly positive. Predict positive only when certain and precision is high, but you miss many true positives. The F1 score and the PR curve make this trade-off explicit so you can pick the operating point that fits your cost structure.
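The trade-off is easy to reproduce numerically. The sketch below sweeps three thresholds over a small hand-made set of probabilities and labels. The data and the `sweep` helper are illustrative assumptions, not output from the live classifier.

```python
import numpy as np

def sweep(p_hat, y_true, thresholds):
    """Return (tau, precision, recall, f1) tuples, one per threshold."""
    p_hat = np.asarray(p_hat, dtype=float)
    y_true = np.asarray(y_true).astype(bool)
    rows = []
    for tau in thresholds:
        pred = p_hat >= tau
        tp = int(np.sum(pred & y_true))
        fp = int(np.sum(pred & ~y_true))
        fn = int(np.sum(~pred & y_true))
        # NaN marks an undefined ratio (e.g. no predicted positives).
        precision = tp / (tp + fp) if tp + fp else float("nan")
        recall = tp / (tp + fn) if tp + fn else float("nan")
        # 2TP / (2TP + FP + FN) is algebraically the harmonic mean of the two.
        f1 = 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else float("nan")
        rows.append((tau, precision, recall, f1))
    return rows

# Hand-made scores, sorted from most to least confident.
p_hat = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
y_true = [1, 1, 0, 1, 0, 1, 0, 0]
rows = sweep(p_hat, y_true, thresholds=[0.25, 0.5, 0.75])
for tau, precision, recall, f1 in rows:
    print(f"tau={tau:.2f}  precision={precision:.3f}  recall={recall:.3f}  f1={f1:.3f}")
```

On this data recall goes 1.00 → 0.75 → 0.50 as the threshold rises, while precision goes 0.667 → 0.750 → 1.000. Recall can never increase when the threshold is raised. Precision carries no such guarantee, and on noisy samples it can dip locally even as the threshold goes up.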

04 · Youden's J: optimal threshold

Youden's J vs threshold

$J(\tau) = \mathrm{TPR}(\tau) + \mathrm{TNR}(\tau) - 1 = \mathrm{TPR}(\tau) - \mathrm{FPR}(\tau)$

Why Youden's J gives the optimal threshold

$J(\tau) = \mathrm{TPR}(\tau) - \mathrm{FPR}(\tau)$ measures how much better the classifier is than random at a given cut-off. A random classifier has $J = 0$; a perfect one has $J = 1$.

The Youden threshold $\tau^\star = \arg\max_\tau J(\tau)$ maximises the vertical distance between the ROC curve and the diagonal chance line — it is the point on the ROC curve furthest from the no-skill baseline.
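For the toy model, the Youden threshold can be worked out in closed form, which gives a useful sanity check on the chart. The notation here is introduced only for this derivation: $m_+$ and $m_-$ denote the first components of $\mu_+$ and $\mu_-$ (assumed to satisfy $m_+ > m_-$), $t = \ln\frac{\tau}{1-\tau}$ is the score cut-off equivalent to $\tau$, and $\Phi$, $\varphi$ are the standard normal CDF and density. Since $s = x_1 \sim \mathcal{N}(m_\pm, 1)$ within each class:

$$
\mathrm{TPR}(\tau) = 1 - \Phi(t - m_+), \qquad \mathrm{FPR}(\tau) = 1 - \Phi(t - m_-), \qquad J(\tau) = \Phi(t - m_-) - \Phi(t - m_+)
$$

$$
\frac{dJ}{dt} = \varphi(t - m_-) - \varphi(t - m_+) = 0 \;\Longrightarrow\; t^\star = \frac{m_+ + m_-}{2}, \qquad \tau^\star = \sigma\!\left(\frac{m_+ + m_-}{2}\right), \qquad J(\tau^\star) = 2\,\Phi\!\left(\frac{m_+ - m_-}{2}\right) - 1
$$

With the default centers $m_+ = 0$ and $m_- = -1.5$, this gives $\tau^\star = \sigma(-0.75) \approx 0.32$ and $J(\tau^\star) = 2\,\Phi(0.75) - 1 \approx 0.55$. These are population values. The curve computed from a finite sample will peak near, but generally not exactly at, this threshold.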

This criterion implicitly weights sensitivity and specificity equally. If a false negative is much costlier than a false positive (e.g. cancer screening), you may prefer a lower threshold than $\tau^\star$ even though $J$ is slightly smaller there.
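On sampled data, the Youden threshold is typically found by a grid search over candidate cut-offs. The following is a minimal sketch of that search. The function names, the hand-made data, and the threshold grid are assumptions for illustration, and ties are broken in favour of the first (lowest) threshold because that is how `np.argmax` behaves.

```python
import numpy as np

def roc_points(p_hat, y_true, thresholds):
    """Return (FPR, TPR) arrays, one entry per threshold."""
    p_hat = np.asarray(p_hat, dtype=float)
    y_true = np.asarray(y_true).astype(bool)
    thresholds = np.asarray(thresholds, dtype=float)
    # pred[i, j] is True when point j is predicted positive at thresholds[i].
    pred = p_hat[None, :] >= thresholds[:, None]
    tpr = (pred & y_true).sum(axis=1) / y_true.sum()
    fpr = (pred & ~y_true).sum(axis=1) / (~y_true).sum()
    return fpr, tpr

def youden_threshold(p_hat, y_true, thresholds):
    """Return (tau_star, J(tau_star)) over the supplied grid."""
    thresholds = np.asarray(thresholds, dtype=float)
    fpr, tpr = roc_points(p_hat, y_true, thresholds)
    j = tpr - fpr                 # Youden's J at each threshold
    best = int(np.argmax(j))      # first maximum if there are ties
    return float(thresholds[best]), float(j[best])

# Hand-made example: 4 positives, 4 negatives.
p_hat = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
y_true = [1, 1, 0, 1, 0, 1, 0, 0]
grid = [0.15, 0.35, 0.55, 0.65, 0.85]
best_tau, best_j = youden_threshold(p_hat, y_true, grid)
print(best_tau, best_j)  # 0.55 0.5
```

Here $J$ equals 0.25 at every grid point except $\tau = 0.55$, where TPR is 0.75 and FPR is 0.25. If false negatives are costlier than false positives, one option is to maximise a weighted variant such as $w \cdot \mathrm{TPR} - \mathrm{FPR}$ with $w > 1$ instead, which pushes the chosen threshold lower.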