Drawing Boundaries
When Lines Become Decisions
core question
How does a weighted sum become a decision boundary?
you should leave able to
- Interpret a linear classifier geometrically.
- Connect weights and bias to the location and tilt of a boundary.
- Explain why linearly separable data is the easy case.
before moving on
Sketch what happens to a boundary when one feature weight doubles and the bias moves negative.
A doctor does not ask a tumor to reveal its name. A credit-card company does not ask a transaction if it is fraud. A spam filter does not interview an email. They all do the same colder thing: take measurements, draw a boundary, and decide which side the new case belongs on.
Regression predicts a number. Classification makes a decision. That sounds like a small change until you notice how many real decisions are classification problems: approve or reject, benign or malignant, pedestrian or background, safe or unsafe, match or no match. Machine learning became economically important not because it could fit beautiful curves, but because it could draw useful boundaries in messy spaces that humans could not inspect by eye.
The idea
Start with the boundary, not the label
Imagine sorting email by two crude features: how many exclamation marks it uses and how many words are in all caps. Plot a few messages and two clouds appear. Most spam lives in one region. Most real mail lives in another. The classifier's job is not to understand the email. Its job is to place a boundary so that future messages on one side are called spam and messages on the other side are not.
In two dimensions the boundary is a line. In three dimensions it is a plane. With one thousand features it is a hyperplane, which is a terrible word for the same idea: a flat separator in a space too large to draw.
The model computes a score:
$$z = \mathbf{w}^\top \mathbf{x} + b$$
The vector $\mathbf{x}$ is the example. The vector $\mathbf{w}$ says which measurements matter and in which direction. The bias $b$ shifts the dividing line. If the score is positive, predict class 1. If it is negative, predict class 0.
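A minimal sketch of that score and the hard decision it produces, using the two spam features from above. The weights, the bias, and the two example messages are made-up values for illustration, not learned ones.

```python
def score(w, x, b):
    """Weighted sum of the features plus the bias: z = w . x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def predict(w, x, b):
    """Hard decision: class 1 if the score is positive, otherwise class 0."""
    return 1 if score(w, x, b) > 0 else 0

# Hypothetical weights for (exclamation marks, all-caps words) and a bias.
w = [0.8, 0.5]
b = -4.0

shouty = [6, 3]   # 6 exclamation marks, 3 all-caps words
calm = [1, 0]     # 1 exclamation mark, no all-caps words

print("shouty:", predict(w, shouty, b))
print("calm:  ", predict(w, calm, b))
```

The negative bias means a message has to earn its way across the line: with no exclamation marks and no capitals, the score starts at -4.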
That alone gives a hard classifier. Logistic regression adds one extra move: it turns the raw score into a probability. The sigmoid function squashes every real score $z$ into the interval from 0 to 1:
$$\sigma(z) = \frac{1}{1 + e^{-z}}$$
A score of 0 becomes 0.5, which is the boundary. A large positive score becomes almost 1. A large negative score becomes almost 0. The model can now say not just "spam", but "92 percent spam according to this boundary."
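The squashing is easy to check by hand. This sketch uses only the standard library; the branch on the sign of the score is an assumed safeguard against overflow for very negative scores, not something the lesson prescribes.

```python
import math

def sigmoid(z):
    """Map any real score to a probability strictly between 0 and 1."""
    # Branch on the sign so math.exp never receives a large positive argument.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

for z in (-6.0, -2.0, 0.0, 2.0, 6.0):
    print(f"z = {z:+.1f}  ->  p = {sigmoid(z):.4f}")
```

Scores that are mirror images of each other give probabilities that add up to one, which is why the boundary sits exactly at a score of 0.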
Why cross-entropy exists
Classification needs a different loss than squared error. Suppose the correct label is 1. A prediction of 0.51 is barely right; a prediction of 0.99 is much better. If the correct label is 1 and the model says 0.001, that is not just a small numerical miss. It is a confident wrong decision.
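Cross-entropy punishes exactly that. For a true label $y$ and a predicted probability $\hat{y}$ of class 1:
$$L(y, \hat{y}) = -\big[\, y \log \hat{y} + (1 - y) \log (1 - \hat{y}) \,\big]$$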
When the model assigns high probability to the right class, the loss is small. When it assigns near-zero probability to the right class, the loss explodes. This is why classification systems learn not only to be right, but to stop being recklessly certain when the data does not support it.
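The three predictions from the paragraph above can be scored directly. This is a sketch for a single example with true label 1; the `eps` clipping is an assumed guard so the logarithm never sees exactly zero.

```python
import math

def cross_entropy(y, p, eps=1e-12):
    """Binary cross-entropy for one example with label y in {0, 1}."""
    # Clip p away from 0 and 1 (eps is an illustrative choice).
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))

for p in (0.99, 0.51, 0.001):
    squared = (1 - p) ** 2
    print(f"predicted {p}: cross-entropy = {cross_entropy(1, p):.3f}, "
          f"squared error = {squared:.3f}")
```

The losses come out near 0.01, 0.67, and 6.9. Squared error for the same confident miss cannot exceed 1, so it barely distinguishes "wrong" from "wrong and certain".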
Demo - Decision Boundary
A ring marks the current example. A thicker ring means the point is still misclassified, so the boundary has work to do.
The lesson is not that a line is powerful. The lesson is that a decision can be turned into optimization. Once the label is encoded as 0 or 1, the boundary's mistakes become a loss, and the loss becomes something a machine can minimize.
Key takeaways
- Classification turns measurements into decisions by drawing boundaries.
- Logistic regression scores an example with $z = \mathbf{w}^\top \mathbf{x} + b$ and converts the score to a probability with the sigmoid.
- The 50 percent contour is the decision boundary.
- Cross-entropy strongly penalizes confident wrong predictions.
- A linear classifier can only draw flat boundaries; some datasets require depth.
The geometry hidden inside logistic regression
For any two points $\mathbf{x}_1$ and $\mathbf{x}_2$, the classifier compares their projections onto $\mathbf{w}$. Points with the same value of $\mathbf{w}^\top \mathbf{x} + b$ sit on the same contour. The decision boundary is the contour where that value is zero:
$$\mathbf{w}^\top \mathbf{x} + b = 0$$
The vector $\mathbf{w}$ is perpendicular to the boundary. Its length controls how quickly the probability changes as you move away from the line. A short $\mathbf{w}$ gives a soft, uncertain transition. A long $\mathbf{w}$ gives a sharp cliff. In real systems that sharpness matters: overconfident cliffs tend to fail badly when the test data drifts.
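One way to see the difference between where the boundary is and how sharp it is: scale the weights and the bias by the same factor. The line stays put, but the probability at a fixed point changes. The boundary, the test point, and the helper name `signed_distance` below are all illustrative assumptions.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def score(w, x, b):
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def signed_distance(w, x, b):
    """Distance from x to the line w . x + b = 0, positive on the class-1 side."""
    return score(w, x, b) / math.sqrt(sum(wi * wi for wi in w))

w, b = [1.0, 1.0], -2.0      # illustrative boundary: x1 + x2 = 2
x = [1.5, 1.0]               # a point just on the class-1 side

rows = []
for scale in (0.5, 1.0, 5.0):
    ws = [scale * wi for wi in w]
    bs = scale * b
    rows.append((scale, signed_distance(ws, x, bs), sigmoid(score(ws, x, bs))))

for scale, d, p in rows:
    print(f"scale {scale}: distance = {d:.3f}, p(class 1) = {p:.3f}")

# Sliding x along the boundary direction (perpendicular to w) leaves the score unchanged.
x_slid = [x[0] + 1.0, x[1] - 1.0]
print(score(w, x, b), score(w, x_slid, b))
```

The distance is identical in every row, while the probability climbs from roughly 0.56 to roughly 0.92. Same line, very different confidence.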
For the advanced reader → XOR: the smallest classification problem a line cannot solve
The XOR dataset contains four points:
(0, 0) → 0
(0, 1) → 1
(1, 0) → 1
(1, 1) → 0
No single line can put the two 1s on one side and the two 0s on the other. This tiny problem matters because it exposes the limit of a single linear separator. To solve XOR, the model must first create a new representation in which the classes become separable. That is what hidden layers do.
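A small experiment makes the limit concrete. The sketch below tries every line on a coarse grid of weights and biases (a demonstration, not a proof), then adds one hand-built feature, the product of the two inputs, as a stand-in for the representation a hidden layer would learn. The grid range and the lifted weights are assumptions chosen for illustration.

```python
from itertools import product

XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

def predict(w, x, b):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

def count_correct(w, b, data):
    return sum(predict(w, x, b) == y for x, y in data)

# Candidate lines: w1, w2, b each drawn from {-3.0, -2.75, ..., 3.0}.
grid = [i / 4 for i in range(-12, 13)]
best_correct = max(count_correct((w1, w2), b, XOR)
                   for w1, w2, b in product(grid, repeat=3))
print("best line on raw features:", best_correct, "of 4 correct")

# Add a third feature x1 * x2; a flat boundary in this 3D space now works.
lifted = [((x1, x2, x1 * x2), y) for (x1, x2), y in XOR]
w_lifted, b_lifted = (1.0, 1.0, -2.0), -0.5
lifted_correct = count_correct(w_lifted, b_lifted, lifted)
print("flat boundary on lifted features:", lifted_correct, "of 4 correct")
```

The best line gets three of the four points. With the extra feature, a single flat separator gets all four: the boundary did not get more clever, the space did.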
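For more than two classes, the sigmoid becomes softmax. Each class $k$ gets its own score $z_k = \mathbf{w}_k^\top \mathbf{x} + b_k$, and
$$P(y = k \mid \mathbf{x}) = \frac{e^{z_k}}{\sum_{j=1}^{K} e^{z_j}}$$
Softmax is several linear scorers competing, one per class, with the exponentiated scores normalized so the probabilities sum to one.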
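A short sketch of softmax on a list of class scores. The three scores are arbitrary illustrative values; subtracting the maximum before exponentiating is a common numerical precaution that does not change the result.

```python
import math

def softmax(scores):
    """Turn a list of class scores into probabilities that sum to 1."""
    # Subtracting the max leaves the output unchanged but avoids overflow.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, -1.0])   # illustrative scores for three classes
print([round(p, 3) for p in probs], "sum =", round(sum(probs), 6))

# With two classes and scores (z, 0), softmax reduces to the sigmoid of z.
z = 1.7
two_class = softmax([z, 0.0])[0]
sigmoid_z = 1.0 / (1.0 + math.exp(-z))
print(abs(two_class - sigmoid_z) < 1e-12)
```

The second check is the link back to the binary case: the sigmoid is softmax with one of the two scores pinned at zero.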
Math details
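Logistic regression model:
$$\hat{y} = P(y = 1 \mid \mathbf{x}) = \sigma(\mathbf{w}^\top \mathbf{x} + b) = \frac{1}{1 + e^{-(\mathbf{w}^\top \mathbf{x} + b)}}$$
Cross-Entropy Loss (log loss), averaged over $N$ examples:
$$L = -\frac{1}{N} \sum_{i=1}^{N} \big[\, y_i \log \hat{y}_i + (1 - y_i) \log (1 - \hat{y}_i) \,\big]$$
Gradient of cross-entropy w.r.t. weights (remarkably simple!):
$$\nabla_{\mathbf{w}} L = \frac{1}{N} \sum_{i=1}^{N} (\hat{y}_i - y_i)\, \mathbf{x}_i$$
Multi-class softmax:
$$P(y = k \mid \mathbf{x}) = \frac{e^{z_k}}{\sum_{j=1}^{K} e^{z_j}}, \qquad z_k = \mathbf{w}_k^\top \mathbf{x} + b_k$$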
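The gradient is simple enough to train with by hand: for each example it is the prediction minus the label, times the input, averaged over the data. The sketch below runs plain gradient descent on a tiny made-up dataset using only the standard library. The data, the learning rate, and the step count are assumptions for illustration, not tuned values.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def loss_and_grad(w, b, X, y):
    """Mean cross-entropy and its gradient with respect to w and b."""
    n = len(X)
    loss, grad_w, grad_b = 0.0, [0.0] * len(w), 0.0
    for xi, yi in zip(X, y):
        p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
        loss -= (yi * math.log(p) + (1 - yi) * math.log(1.0 - p)) / n
        err = p - yi                      # the (prediction - label) term
        grad_w = [g + err * xj / n for g, xj in zip(grad_w, xi)]
        grad_b += err / n
    return loss, grad_w, grad_b

# Tiny made-up dataset: two features per example, binary labels.
X = [[0.0, 0.5], [1.0, 1.0], [2.0, 0.5], [1.5, 2.0], [3.0, 2.5], [2.5, 1.0]]
y = [0, 0, 1, 0, 1, 1]

w, b = [0.0, 0.0], 0.0
initial_loss, _, _ = loss_and_grad(w, b, X, y)

learning_rate = 0.1                       # assumed step size
for _ in range(500):
    _, grad_w, grad_b = loss_and_grad(w, b, X, y)
    w = [wj - learning_rate * gj for wj, gj in zip(w, grad_w)]
    b -= learning_rate * grad_b

final_loss, _, _ = loss_and_grad(w, b, X, y)
print(f"loss: {initial_loss:.4f} -> {final_loss:.4f}")
```

With all weights at zero every prediction is 0.5, so the starting loss is log 2, about 0.693. Each step nudges the boundary in the direction that reduces the average surprise.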
Implementation
Logistic Regression Classifier
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Generate binary classification data
X, y = make_classification(n_samples=200, n_features=2,
                           n_informative=2, n_redundant=0,
                           n_clusters_per_class=1, random_state=42)

# Train logistic regression
model = LogisticRegression()
model.fit(X, y)

# Visualize decision boundary
def plot_decision_boundary(model, X, y):
    h = 0.02  # Step size of the evaluation grid
    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                         np.arange(y_min, y_max, h))
    # Probability of class 1 at every grid point
    Z = model.predict_proba(np.c_[xx.ravel(), yy.ravel()])[:, 1]
    Z = Z.reshape(xx.shape)
    plt.contourf(xx, yy, Z, alpha=0.3, cmap='RdYlBu')
    # The 0.5 probability contour is the decision boundary itself
    plt.contour(xx, yy, Z, levels=[0.5], colors='k')
    plt.scatter(X[:, 0], X[:, 1], c=y, cmap='RdYlBu', edgecolors='k')
    plt.title('Logistic Regression Decision Boundary')
    plt.show()

plot_decision_boundary(model, X, y)
Prompt for Claude Code
"Create a logistic regression classifier with sklearn. Generate a 2D dataset with make_classification and visualize the decision boundary with a probability heatmap."
Work this
Boundary audit
Sketch a dataset where a linear boundary works, a dataset where it fails, and a dataset where no boundary should be trusted because the labels overlap. For each, say what the model would learn and what the user might falsely conclude.
A classifier is a line with consequences. Move the line and someone gets a loan, an email reaches an inbox, a patient gets a second look. The mathematics is clean, but the responsibility starts early: every boundary is also a policy about who or what gets grouped together.
The next chapter asks what happens when one boundary is not enough.