The Prediction Machine
What AI Actually Does (and Doesn't Do)
core question
What does a machine-learning system actually learn when it seems to understand?
you should leave able to
- Describe a model as a function learned from examples.
- Separate data, model, loss, optimizer, and generalization.
- Explain why prediction includes classification, generation, forecasting, and action.
before moving on
Given a real product feature, name its input, output, training signal, and failure mode.
Your phone does not know what a dog is. It has never smelled wet fur, watched one beg for food, or heard paws tapping on a kitchen floor. It receives a rectangle of numbers from a camera sensor, pushes those numbers through a function, and returns a word. The unsettling fact at the heart of machine learning is that, often enough, the word is right.
This course is about how that happens. Not the marketing version, where "AI" floats above ordinary engineering. Not the mystical version, where prediction is confused with thought. Machine learning is the discipline of building functions from data: functions that classify images, forecast demand, translate languages, write code, steer robots, recommend videos, and hallucinate when their learned regularities run out.
The first move is to make the word prediction broad enough. Predicting tomorrow's temperature is prediction. So is answering "cat" for an image, selecting the next word in a sentence, choosing the next action in a game, and denoising an image from random static. The output may look like a category, a sentence, a control signal, or a picture, but underneath there is a learned map from inputs to outputs.
The idea
The baggage desk analogy
Imagine you work at an airport baggage desk. You have seen thousands of bags and know which ones usually go to fragile handling, oversize handling, or the regular belt. A new bag arrives. You know its weight, shape, declared contents, and route.
Based on past bags, you predict: oversize handling.
That is the first honest picture of machine learning. You are not consulting a rule written by a baggage philosopher. You are using experience. Past examples have carved a shape in your judgment. New examples are placed inside that shape.
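The baggage desk habit can be sketched in a few lines. This is a minimal illustration, not a production method: the bag measurements, the two labels, and the `predict_handling` helper are all hypothetical, and the technique used here (majority vote among the nearest past examples) is just one simple way to "use experience."

```python
from collections import Counter

import numpy as np

# Hypothetical past bags: [weight in kg, longest side in cm]. Illustrative values only.
past_bags = np.array([
    [8.0, 55.0], [12.0, 60.0], [10.0, 50.0],      # went to the regular belt
    [28.0, 140.0], [31.0, 160.0], [25.0, 150.0],  # went to oversize handling
])
past_labels = ["regular", "regular", "regular", "oversize", "oversize", "oversize"]


def predict_handling(new_bag, k=3):
    """Return the most common label among the k most similar past bags."""
    distances = np.linalg.norm(past_bags - np.asarray(new_bag, dtype=float), axis=1)
    nearest = np.argsort(distances)[:k]
    votes = Counter(past_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]


prediction = predict_handling([27.0, 145.0])
print(prediction)
```

No rule about oversize bags appears anywhere in the code. The answer comes entirely from which past examples the new bag resembles.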
Machine learning formalizes this habit. It gives us:
- Data: examples of the world.
- A model: a family of functions we are willing to consider.
- A loss: a way to measure how wrong a function is.
- An optimizer: a procedure for choosing better parameters.
That list is not a simplification for beginners. It is the skeleton of the field. Linear regression, convolutional networks, transformers, diffusion models, and reinforcement-learning agents all fit inside it.
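To make the four pieces concrete, here is a small sketch that fits a straight line with plain gradient descent. The data is synthetic (generated from an assumed line, y = 3x + 1, plus noise), and the learning rate and step count are arbitrary choices for illustration.

```python
import numpy as np

# Data: synthetic examples drawn from y = 3x + 1 plus a little noise (made up for illustration).
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=100)
y = 3.0 * x + 1.0 + rng.normal(0.0, 0.1, size=100)


# Model: the family of straight lines, indexed by parameters (w, b).
def model(x, w, b):
    return w * x + b


# Loss: mean squared error between predictions and targets.
def loss(w, b):
    return np.mean((model(x, w, b) - y) ** 2)


# Optimizer: plain gradient descent on (w, b).
w, b = 0.0, 0.0
learning_rate = 0.1
initial_loss = loss(w, b)
for _ in range(500):
    error = model(x, w, b) - y
    w -= learning_rate * np.mean(2.0 * error * x)  # d(loss)/dw
    b -= learning_rate * np.mean(2.0 * error)      # d(loss)/db
final_loss = loss(w, b)

print(f"w={w:.2f}, b={b:.2f}, loss {initial_loss:.3f} -> {final_loss:.3f}")
```

Swapping in a different model family or a different loss changes the details of each piece, but not the shape of the loop.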
[Diagram: The Three Ingredients]
The diagram hides one more word that matters: generalization. A model that memorizes the training examples is not useful. The goal is to do well on examples it has never seen, drawn from the same underlying process. Most of machine learning is a fight between fitting the data you have and preserving enough simplicity to handle the data you will see later.
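The gap between fitting and generalizing can be seen with a toy experiment. The sketch below uses made-up data (a straight line plus noise) and compares a simple polynomial with a flexible one on both the training points and a held-out set. The degrees, sample sizes, and noise level are assumptions chosen to make the effect visible.

```python
import numpy as np

# Synthetic data (made up): a straight line plus noise, split into train and held-out sets.
rng = np.random.default_rng(1)


def sample(n):
    x = rng.uniform(-1.0, 1.0, size=n)
    y = 2.0 * x + rng.normal(0.0, 0.3, size=n)
    return x, y


x_train, y_train = sample(12)
x_test, y_test = sample(200)


def mse(coeffs, x, y):
    return np.mean((np.polyval(coeffs, x) - y) ** 2)


# Fit a simple model (degree 1) and a flexible one (degree 9) to the same training points.
results = {}
for degree in (1, 9):
    coeffs = np.polyfit(x_train, y_train, deg=degree)
    results[degree] = (mse(coeffs, x_train, y_train), mse(coeffs, x_test, y_test))
    print(f"degree {degree}: train MSE {results[degree][0]:.3f}, "
          f"test MSE {results[degree][1]:.3f}")
```

The flexible fit cannot do worse on the training points, since it can always reproduce the straight line. Its held-out error is typically much worse, because it has bent itself around noise that the new points do not share.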
Prediction, compression, and representation
There is a deeper way to say the same thing. To predict well, a model must compress the useful structure in the data. A model that predicts house prices learns that location and square footage matter more than the paint color of one wall. A language model that predicts the next token must compress grammar, style, facts, code patterns, and social convention into its parameters because all of those help reduce prediction error.
This is why prediction is not trivial. A strong predictor often has to learn representations that look like understanding, because the world is structured and the easiest way to predict one part is to infer hidden structure in the rest.
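A very small version of "keeping the structure that helps" shows up even in linear regression. In the sketch below the housing data is entirely synthetic, and the assumed ground truth is that price depends on square footage and location but not on paint color. The feature names and coefficients are illustrative, not real figures.

```python
import numpy as np

# Synthetic housing data (entirely made up): standardized features.
rng = np.random.default_rng(2)
n = 500
square_footage = rng.normal(size=n)
location_score = rng.normal(size=n)
paint_color = rng.normal(size=n)  # has no effect on price in this toy setup

# Assumed ground truth for the sketch: price depends on the first two features only.
price = 3.0 * square_footage + 2.0 * location_score + rng.normal(0.0, 0.1, size=n)

# Least-squares fit: find the weights that minimize squared prediction error.
features = np.column_stack([square_footage, location_score, paint_color])
weights, *_ = np.linalg.lstsq(features, price, rcond=None)

for name, weight in zip(["square_footage", "location_score", "paint_color"], weights):
    print(f"{name}: {weight:+.3f}")
```

Nobody told the model that paint color is irrelevant. Its weight ends up near zero because using it does not reduce prediction error.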
Demo - Linear Fit Playground
Key takeaways
- Machine learning builds functions from examples.
- Data, model, loss, and optimizer are the four pieces of the basic loop.
- Prediction includes classification, generation, forecasting, and action choice.
- Generalization matters more than memorization.
- A model can be useful without having human-like understanding.
The four major learning setups
The field splits into several training setups.
Supervised learning uses labeled examples: image to class, house features to price, sentence to translation. Unsupervised learning asks for structure without labels: clusters, latent variables, embeddings, compression. Self-supervised learning creates labels from the data itself, such as predicting masked words or the next token. Reinforcement learning trains from rewards collected through action.
Modern frontier systems mix these setups. A language model may be pretrained with self-supervised next-token prediction, adjusted with supervised examples, and then aligned using preference or reward signals. The labels changed, but the loop did not: define a target, measure error, update parameters.
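The phrase "creates labels from the data itself" is easy to make literal. The sketch below turns one sentence into next-token training pairs. Whitespace splitting stands in for a real tokenizer, which is a simplifying assumption.

```python
# A toy corpus; whitespace splitting stands in for a real tokenizer.
text = "the model predicts the next token"
tokens = text.split()

# Each training pair is (context so far, next token). No human labeling is involved.
pairs = [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]

for context, target in pairs:
    print(f"{' '.join(context)!r} -> {target!r}")
```

Every position in the text yields a supervised-looking example for free, which is one reason this setup scales to very large corpora.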
Math details
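At its core, machine learning is function approximation. We want to learn a function:
$$f_\theta : \mathcal{X} \to \mathcal{Y}$$
Given inputs $x \in \mathcal{X}$, predict outputs $y \in \mathcal{Y}$. Training finds parameters $\theta$ that minimize:
$$\theta^* = \arg\min_\theta \frac{1}{N} \sum_{i=1}^{N} \ell\big(f_\theta(x_i), y_i\big)$$
Where $\ell$ is a loss function measuring prediction error on one example, with $\hat{y} = f_\theta(x)$. Common choices:
- Mean Squared Error (regression): $\ell(\hat{y}, y) = (\hat{y} - y)^2$
- Cross-Entropy (classification): $\ell(\hat{y}, y) = -\sum_{c} y_c \log \hat{y}_c$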
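Both losses named above are short enough to write by hand. The sketch below is a minimal NumPy version. The function names, the one-hot label format, and the `eps` clipping constant are choices made for this illustration, not part of any particular library.

```python
import numpy as np


def mean_squared_error(y_true, y_pred):
    """Average squared difference between targets and predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean((y_true - y_pred) ** 2)


def cross_entropy(one_hot_labels, predicted_probs, eps=1e-12):
    """Average cross-entropy over examples; rows are examples, columns are classes."""
    labels = np.asarray(one_hot_labels, dtype=float)
    # Clip to avoid log(0) when a predicted probability is exactly zero.
    probs = np.clip(np.asarray(predicted_probs, dtype=float), eps, 1.0)
    return -np.mean(np.sum(labels * np.log(probs), axis=1))


mse_value = mean_squared_error([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
ce_value = cross_entropy([[0, 1], [1, 0]], [[0.25, 0.75], [0.5, 0.5]])
print(mse_value, ce_value)
```

Mean squared error punishes large misses heavily. Cross-entropy punishes confident wrong answers: the lower the probability assigned to the true class, the larger the loss.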
Implementation
Building Your First AI Model
import numpy as np
from sklearn.linear_model import LogisticRegression
# Data: features and labels
X = np.array([[1, 2], [2, 3], [3, 1], [6, 5], [7, 8], [8, 7]])
y = np.array([0, 0, 0, 1, 1, 1]) # 0 = class A, 1 = class B
# Create and train model
model = LogisticRegression()
model.fit(X, y)
# Predict new example
new_point = [[5, 4]]
prediction = model.predict(new_point)
probability = model.predict_proba(new_point)
print(f"Predicted class: {prediction[0]}")
print(f"Probabilities: {probability[0]}")
Prompt for Claude Code
"Create a simple logistic regression classifier in Python using sklearn. Train it on the XOR problem and visualize the decision boundary with matplotlib."
Work this
Prediction audit
For each system, name the input, output, training signal, and one likely failure mode: a spam filter, a route planner, a text-to-image model, and a coding assistant.
The machine is not awake. It is doing something narrower and still remarkable: turning examples into a function that survives contact with new cases. Once that feels ordinary, the rest of machine learning becomes less mystical and more inspectable.
The next chapter asks how a model chooses the function instead of merely naming the problem.