course bible

Seven Phases of Machine Learning

A first-year student should stay oriented; a graduate student should see the real machinery and the places where the field is still unsettled.

The course is designed as an interactive book: intuition first, then the mathematical object, then a lab or demo, then the failure mode. A reader should be able to move from first principles to modern AI systems without losing the thread.


Chapter Rubric

A cold-open hook that makes the problem feel necessary.
A first-year explanation before notation.
A formal model: objective, assumptions, and failure mode.
A demo or lab that changes the reader's belief through interaction.
A graduate lens that names what experts worry about.
A problem set or experiment prompt that forces transfer.
A coda that connects the chapter to the next idea.

Historical Spine

The course treats history as a debugging aid. Each breakthrough changed what researchers could optimize, measure, or scale.

1950

Alan Turing frames machine intelligence as a behavioral question: can a machine carry on an imitation game well enough to make the boundary hard to police?

1958

Frank Rosenblatt introduces the perceptron, later built in hardware as the Mark I Perceptron, turning weighted votes learned from examples into a public symbol of learning machines.

1959

Arthur Samuel uses the phrase machine learning while building a checkers program that improves from play.

1986

Rumelhart, Hinton, and Williams make backpropagation central to neural-network training, showing how one loss can teach hidden layers.

1998

LeNet-5 shows convolutional networks recognizing handwritten digits reliably enough for commercial document processing.

2012

AlexNet wins the ImageNet challenge (ILSVRC) by a large margin and makes GPU-trained deep learning impossible to ignore.

2017

The paper "Attention Is All You Need" introduces the transformer architecture that later becomes the workhorse for language, code, vision, and multimodal systems.

2020

Scaling-law work turns model size, data size, and compute into an empirical design surface rather than guesswork.

2022

Diffusion image models and conversational language models make generative AI visible to the public, raising the stakes for evaluation and alignment.

Capstone Spine

Prediction Autopsy: Choose a real ML-powered product feature and reverse-engineer the learning loop behind it.
Honest Evaluation Report: Take one dataset and show how the result changes under bad and good evaluation practice.
Classifier From Scratch: Implement logistic regression with gradient descent and explain every term.
Tiny Autodiff Engine: Build a small scalar autodiff system and train a two-layer network.
Inductive Bias Lab: Compare a dense model, a convolutional model, and a sequence model on tasks that match each model's inductive bias.
Retrieval-Augmented Tutor: Build a small system that retrieves notes, answers questions, and cites the retrieved evidence.
Agent Reliability Trial: Create a small agent task and evaluate it before and after a reward or instruction change.

Phase 1

The Learning Loop

What machine learning is, what it optimizes, and why prediction is a broader idea than forecasting.

reader promise

You can describe any supervised learning system using data, model, loss, optimizer, and generalization.

graduate lens

The phase establishes empirical risk minimization as the recurring mathematical frame.
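The five parts named in the reader promise can be sketched in a few lines. This is a minimal illustration, not course code: the target line y = 2x + 1, the noise level, the learning rate, and every function name below are assumptions chosen for the example. The loss here is the empirical risk (average loss over the training sample), and the held-out set stands in for generalization.

```python
import random

def make_data(n, seed=0):
    """Data: noisy samples of a hypothetical target, y = 2x + 1."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    return [(x, 2.0 * x + 1.0 + rng.gauss(0.0, 0.1)) for x in xs]

def predict(params, x):
    """Model: a line with slope w and intercept b."""
    w, b = params
    return w * x + b

def mean_squared_error(params, data):
    """Loss: average squared error, i.e. the empirical risk on `data`."""
    return sum((predict(params, x) - y) ** 2 for x, y in data) / len(data)

def gradient_step(params, data, lr):
    """Optimizer: one full-batch gradient descent step on the empirical risk."""
    w, b = params
    n = len(data)
    grad_w = sum(2.0 * (predict(params, x) - y) * x for x, y in data) / n
    grad_b = sum(2.0 * (predict(params, x) - y) for x, y in data) / n
    return (w - lr * grad_w, b - lr * grad_b)

train = make_data(200, seed=0)
held_out = make_data(200, seed=1)  # never used for updates

params = (0.0, 0.0)
initial_train_loss = mean_squared_error(params, train)
for _ in range(500):
    params = gradient_step(params, train, lr=0.1)

train_loss = mean_squared_error(params, train)
held_out_loss = mean_squared_error(params, held_out)  # generalization check
print(f"w={params[0]:.2f} b={params[1]:.2f} "
      f"train={train_loss:.4f} held-out={held_out_loss:.4f}")
```

The optimizer only ever sees `train`; comparing `train_loss` with `held_out_loss` is the smallest possible version of the measurement questions taken up in Phase 2.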

Phase 2

Generalization and Measurement

How to know whether a model learned a pattern, memorized a sample, leaked data, or merely optimized the wrong score.

reader promise

You can design train/validation/test splits, choose metrics, and explain why held-out performance matters.

graduate lens

The phase introduces distribution shift, the bias-variance trade-off, calibration, and metric design as scientific controls.
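A train/validation/test split can be sketched as one shuffle followed by two cuts. The function name `split_indices`, the 70/15/15 proportions, and the fixed seed are assumptions made for illustration. A plain random shuffle also assumes the examples are independent; grouped or time-ordered data usually needs a split that respects that structure, otherwise information leaks across the boundary.

```python
import random

def split_indices(n, val_fraction=0.15, test_fraction=0.15, seed=0):
    """Shuffle 0..n-1 once, then cut into disjoint train/validation/test lists."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_test = round(n * test_fraction)
    n_val = round(n * val_fraction)
    test = indices[:n_test]
    val = indices[n_test:n_test + n_val]
    train = indices[n_test + n_val:]
    return train, val, test

train_idx, val_idx, test_idx = split_indices(100)

# A cheap leakage check: no example may appear in more than one split.
overlap = (set(train_idx) & set(val_idx)) | (set(train_idx) & set(test_idx)) \
    | (set(val_idx) & set(test_idx))
print(len(train_idx), len(val_idx), len(test_idx), "overlap:", len(overlap))
```

The validation indices are for choosing models and hyperparameters; the test indices are meant to be touched once, at the end.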

Phase 3

Linear Models and Optimization

Boundaries, weighted votes, gradients, and the first models simple enough to understand completely.

reader promise

You can derive how a line becomes a classifier and how a local update changes the decision boundary.

graduate lens

The phase ties geometry to convex objectives, stochastic gradients, and parameter identifiability.
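As a sketch of how a line becomes a classifier, the example below fits logistic regression to six hand-picked 2-D points with plain gradient descent. The data, the learning rate, and the function names are assumptions for illustration. The decision boundary is the set of points where w[0]*x[0] + w[1]*x[1] + b = 0, so every update to w and b moves that line.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(w, b, x):
    """Probability of class 1 under a linear score passed through a sigmoid."""
    return sigmoid(w[0] * x[0] + w[1] * x[1] + b)

def log_loss(w, b, data, eps=1e-12):
    """Mean negative log-likelihood; probabilities are clamped to avoid log(0)."""
    total = 0.0
    for x, y in data:
        p = min(max(predict_proba(w, b, x), eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(data)

def gradient_step(w, b, data, lr):
    """One gradient descent step; the gradient of the mean log loss is (p - y) * x."""
    n = len(data)
    grad_w = [0.0, 0.0]
    grad_b = 0.0
    for x, y in data:
        residual = predict_proba(w, b, x) - y
        grad_w[0] += residual * x[0] / n
        grad_w[1] += residual * x[1] / n
        grad_b += residual / n
    return [w[0] - lr * grad_w[0], w[1] - lr * grad_w[1]], b - lr * grad_b

# Hypothetical toy data: class 0 in the lower left, class 1 in the upper right.
data = [((-2.0, -1.0), 0), ((-1.0, -1.5), 0), ((-1.5, -0.5), 0),
        ((1.0, 1.5), 1), ((2.0, 1.0), 1), ((1.5, 0.5), 1)]

w, b = [0.0, 0.0], 0.0
initial_loss = log_loss(w, b, data)

# A single local update already tilts the boundary toward separating the classes.
w_one, b_one = gradient_step(w, b, data, lr=0.5)

w, b = w_one, b_one
for _ in range(100):
    w, b = gradient_step(w, b, data, lr=0.5)
final_loss = log_loss(w, b, data)
print("after one step:", w_one, b_one, "final loss:", round(final_loss, 4))
```

Because the log loss of a linear model is convex, gradient descent with a small enough step size keeps lowering the loss here; that guarantee is what later phases lose.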

Phase 4

Deep Networks

Layers, nonlinear representations, backpropagation, initialization, optimizers, and regularization.

reader promise

You can explain how a scalar loss teaches many layers and why training stability is an engineering problem.

graduate lens

The phase treats autodiff, residual pathways, normalization, and implicit regularization as core machinery.
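How one scalar loss teaches several layers can be illustrated with a tiny reverse-mode autodiff sketch. The `Value` class and the two-weight network below are hypothetical, written for this outline rather than taken from the course labs. Each operation records its inputs and a local derivative rule; `backward` then applies the chain rule from the loss back to every weight.

```python
import math

class Value:
    """A scalar that remembers how it was computed so gradients can flow back."""

    def __init__(self, data, parents=()):
        self.data = float(data)
        self.grad = 0.0
        self._parents = parents
        self._backward = lambda: None  # leaf nodes have nothing to propagate

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other))
        def _backward():
            self.grad += out.grad
            other.grad += out.grad
        out._backward = _backward
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other))
        def _backward():
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = _backward
        return out

    def tanh(self):
        t = math.tanh(self.data)
        out = Value(t, (self,))
        def _backward():
            self.grad += (1.0 - t * t) * out.grad
        out._backward = _backward
        return out

    def backward(self):
        """Visit nodes in reverse topological order, applying the chain rule."""
        order, seen = [], set()
        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent in node._parents:
                    visit(parent)
                order.append(node)
        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            node._backward()

# A two-layer scalar "network": hidden = tanh(w1 * x), output = w2 * hidden.
x, target = 0.5, 1.0
w1, w2 = Value(-1.0), Value(2.0)
hidden = (w1 * x).tanh()
error = w2 * hidden + (-target)
loss = error * error
loss.backward()
print("dloss/dw1 =", w1.grad, "dloss/dw2 =", w2.grad)
```

The gradient reaching `w1` passes through `w2` and the tanh derivative on the way, which is one concrete reason depth turns training stability into an engineering problem.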

Phase 5

Representations Across Space and Time

Architectures that exploit structure: images have locality, sequences have order, and attention creates content-addressed retrieval.

reader promise

You can recognize the inductive bias behind CNNs, RNNs, and attention instead of memorizing architectures.

graduate lens

The phase connects weight sharing, receptive fields, hidden state, and quadratic attention cost.
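Content-addressed retrieval and its quadratic cost can both be seen in a small scaled dot-product attention sketch. The toy queries, keys, and values are assumptions picked so that each query matches exactly one key; there are no learned projections, masks, or multiple heads here.

```python
import math

def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention over lists of vectors."""
    d = len(keys[0])
    # One score per (query, key) pair: this matrix is what grows quadratically.
    scores = [[sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
              for q in queries]
    weights = [softmax(row) for row in scores]
    outputs = [[sum(w * v[j] for w, v in zip(row, values))
                for j in range(len(values[0]))]
               for row in weights]
    return outputs, weights

# Hypothetical toy sequence of three tokens.
keys = [[4.0, 0.0, 0.0], [0.0, 4.0, 0.0], [0.0, 0.0, 4.0]]
values = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
queries = keys  # self-attention style: each token queries the whole sequence

outputs, weights = attention(queries, keys, values)
n_scores = sum(len(row) for row in weights)
print("score entries:", n_scores, "first output:", outputs[0])
```

Each query retrieves the value whose key it most resembles, regardless of position, and a sequence of n tokens produces n * n scores; doubling the sequence length quadruples that matrix.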

Phase 6

Foundation Models

Transformers, scaling laws, generation, multimodal embeddings, retrieval, and tool-using systems.

reader promise

You can trace how next-token training becomes a general interface for language, images, retrieval, and tools.

graduate lens

The phase separates pretraining, post-training, retrieval augmentation, and system evaluation.
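For reference, the next-token objective mentioned in the reader promise is commonly written as a sum of log-probabilities over positions. The notation is assumed here rather than taken from the course text: x_t is the token at position t, T is the sequence length, and θ stands for the model parameters.

```latex
\mathcal{L}(\theta) = -\sum_{t=1}^{T} \log p_\theta\left(x_t \mid x_1, \dots, x_{t-1}\right)
```

This is the pretraining loss only; post-training, retrieval augmentation, and system evaluation each add their own objectives or measurements on top of it.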

Phase 7

Agents, Alignment, and Evaluation

Learning from reward, aligning behavior with human intent, evaluating deployed systems, and thinking about frontier risk.

reader promise

You can reason about policies, preference models, eval suites, reward hacking, and agent reliability.

graduate lens

The phase treats evaluation, oversight, interpretability, and governance as technical design constraints.