Retrieval, Tools, and Agents
When a Model Is Part of a System
core question
When should an AI system remember, search, or act through tools?
you should leave able to
- Separate model parameters from external memory.
- Explain retrieval-augmented generation as search plus synthesis.
- Describe tool use as a system design problem with failure modes.
before moving on
Design a small RAG pipeline and list one retrieval failure, one synthesis failure, and one tool failure.
A language model by itself is a frozen memory and a fluent guesser. Put it inside a system with search, documents, tools, memory, and feedback, and the behavior changes. The model is no longer the whole product. It is the reasoning interface inside a larger machine.
Modern AI applications are often systems, not single models. They retrieve context, call tools, write intermediate state, ask follow-up questions, and verify outputs. This is the engineering layer that turns a foundation model into a useful assistant, tutor, analyst, or coding agent.
The idea
Retrieval-augmented generation
Retrieval-augmented generation, or RAG, gives a model external context at inference time. Instead of asking the model to remember everything, the system searches a document store, inserts relevant passages into the prompt, and asks the model to answer from that context.
RAG helps with freshness, attribution, and private knowledge. It does not guarantee correct answers by itself. The retriever can fetch the wrong documents, the prompt can bury the important passage, and the generator can still claim more than the retrieved evidence supports.
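The "insert relevant passages into the prompt" step can be sketched in a few lines. This is a minimal illustration, not a prescribed format: the function name `build_prompt`, the instruction wording, and the bracketed passage numbering are all assumptions made for the example.

```python
def build_prompt(question: str, passages: list[str]) -> str:
    """Assemble a grounded prompt: numbered passages first, then the question."""
    if not passages:
        # Make the absence of evidence explicit instead of sending an empty context.
        context = "(no passages retrieved)"
    else:
        context = "\n".join(f"[{i}] {p}" for i, p in enumerate(passages, start=1))
    return (
        "Answer using only the passages below. "
        "Cite passage numbers, and say so if the passages do not contain the answer.\n\n"
        f"Passages:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


# Hypothetical passages standing in for the output of a retriever.
prompt = build_prompt(
    "When is the assignment due?",
    ["The assignment is due Friday.", "Office hours are on Tuesday."],
)
print(prompt)
```

Numbering the passages gives the generator something to cite, which supports attribution. It does not stop the generator from overstating what the passages say; that still has to be checked separately.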
Tools are actions with contracts
A tool call is a structured action: search the web, run code, query a database, send an email, create a calendar event. The model proposes the action, but the system must validate inputs, enforce permissions, log effects, and handle errors.
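One way to picture the contract is a check that runs between the model's proposal and the actual side effect. The sketch below is illustrative only: `ToolSpec`, `REGISTRY`, `validate_call`, and the two tool names are hypothetical, and a real system would also log effects and handle runtime errors, which are omitted here.

```python
from dataclasses import dataclass


@dataclass
class ToolSpec:
    name: str
    required_args: dict[str, type]     # argument name -> expected type
    needs_confirmation: bool = False   # side-effecting tools wait for a human


# Hypothetical registry, for illustration only.
REGISTRY = {
    "search_docs": ToolSpec("search_docs", {"query": str}),
    "send_email": ToolSpec("send_email", {"to": str, "body": str}, needs_confirmation=True),
}


def validate_call(name: str, args: dict, allowed: set[str]) -> tuple[str, str]:
    """Return (decision, reason), where decision is 'allow', 'confirm', or 'reject'."""
    spec = REGISTRY.get(name)
    if spec is None:
        return "reject", f"unknown tool: {name}"
    if name not in allowed:
        return "reject", f"permission denied: {name}"
    for arg, expected in spec.required_args.items():
        if arg not in args:
            return "reject", f"missing argument: {arg}"
        if not isinstance(args[arg], expected):
            return "reject", f"bad type for argument: {arg}"
    extra = set(args) - set(spec.required_args)
    if extra:
        return "reject", f"unexpected arguments: {sorted(extra)}"
    if spec.needs_confirmation:
        return "confirm", "human confirmation required"
    return "allow", "ok"


print(validate_call("search_docs", {"query": "what is RAG"}, {"search_docs"}))
print(validate_call("send_email", {"to": "a@example.com", "body": "hi"}, {"send_email"}))
print(validate_call("search_docs", {"query": 3}, {"search_docs"}))
```

The point of the design is that the model's output is treated as a proposal. Nothing runs until the system, not the model, has decided the call is well-formed and permitted.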
Agents are loops over this pattern:
- Observe state.
- Decide next action.
- Call a tool or ask a human.
- Read the result.
- Continue or stop.
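The loop above can be sketched as follows. This is a toy under stated assumptions: `run_agent` and `scripted_policy` are hypothetical names, the `decide` callable stands in for a model, and the step budget `max_steps` is one simple choice of stopping rule among many.

```python
from typing import Callable


def run_agent(
    decide: Callable[[list[str]], tuple[str, str]],
    tools: dict[str, Callable[[str], str]],
    goal: str,
    max_steps: int = 5,
) -> tuple[str, list[str]]:
    """Run observe -> decide -> act -> read until 'stop' or the step budget runs out."""
    history = [f"goal: {goal}"]               # observed state so far
    for _ in range(max_steps):
        action, arg = decide(history)         # decide the next action
        if action == "stop":
            return arg, history               # arg carries the final answer
        if action == "ask_human":
            return f"NEEDS HUMAN: {arg}", history
        tool = tools.get(action)
        if tool is None:
            history.append(f"error: unknown tool {action}")
            continue
        try:
            result = tool(arg)                # call the tool
        except Exception as exc:              # a tool failure becomes an observation
            result = f"error: {exc}"
        history.append(f"{action}({arg}) -> {result}")  # read the result
    return "stopped: step budget exhausted", history


# Scripted stand-in for the model: look something up once, then stop.
def scripted_policy(history: list[str]) -> tuple[str, str]:
    if len(history) == 1:
        return "lookup", "due date"
    return "stop", history[-1]


answer, trace = run_agent(scripted_policy, {"lookup": lambda q: "Friday"}, "find the due date")
print(answer)
print(trace)
```

Without `max_steps`, a policy that never returns `"stop"` would run forever, which is why the budget lives in the loop and not in the model.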
Lab - Tiny Retrieval
Retrieval is not memory. It is search over external text, followed by a synthesis step that should stay grounded in what was retrieved.
The model does not need to store every fact in its parameters. A retrieval system searches external text first, then asks the generator to answer from that evidence. The fragile part is the handoff: when retrieval returns the wrong context, the generator can still synthesize a confident answer from it.
Code version
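A minimal sketch of the search half, assuming a toy in-memory corpus and plain word-count cosine similarity in place of learned embeddings. The names `tokenize`, `cosine`, `retrieve`, and `DOCS` are made up for this example.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, docs: list[str], k: int = 2) -> list[tuple[float, str]]:
    """Return up to k (score, document) pairs with nonzero similarity, best first."""
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(d))), d) for d in docs]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(s, d) for s, d in scored[:k] if s > 0]


# Toy corpus, invented for the example.
DOCS = [
    "Gradient descent updates parameters using the gradient of the loss.",
    "Retrieval searches external documents at inference time.",
    "Office hours are held on Tuesday afternoon.",
]

hits = retrieve("how does retrieval search documents", DOCS, k=1)
for score, doc in hits:
    print(f"{score:.2f}  {doc}")
```

Note that "search" in the query does not match "searches" in the document, because this retriever compares exact tokens. The right passage still wins here on the other shared words, but the mismatch is a small example of how a retrieval failure can start.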
For the advanced reader: Embeddings are not knowledge bases
Vector search uses embeddings to retrieve semantically similar chunks. Similarity is useful, but it is not proof. A passage can be similar but irrelevant, relevant but incomplete, or correct but outdated. Serious RAG systems need a chunking strategy, metadata filters, reranking, citations, freshness checks, and evals.
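To make "similar is not proof" concrete, here is a hedged sketch of a metadata filter and a freshness check applied after vector search. The `Chunk` fields, the `filter_candidates` function, the similarity scores, and the thresholds are all hypothetical; the scores are assumed to come from some upstream embedding search that is not shown.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Chunk:
    text: str
    source: str        # metadata: where the chunk came from
    updated: date      # metadata: when the source was last revised
    similarity: float  # score from an upstream vector search (assumed given)


def filter_candidates(chunks, allowed_sources, today, max_age_days=365, min_similarity=0.5):
    """Keep chunks that pass source, freshness, and similarity checks, best first."""
    kept = []
    for c in chunks:
        if c.source not in allowed_sources:           # metadata filter
            continue
        if (today - c.updated).days > max_age_days:   # freshness check
            continue
        if c.similarity < min_similarity:             # similarity floor
            continue
        kept.append(c)
    return sorted(kept, key=lambda c: c.similarity, reverse=True)


# Invented candidates: the most similar chunk is also the most outdated one.
candidates = [
    Chunk("The 2019 syllabus lists a Monday deadline.", "syllabus", date(2019, 9, 1), 0.91),
    Chunk("The current syllabus lists a Friday deadline.", "syllabus", date(2024, 9, 1), 0.88),
    Chunk("A forum post guesses the deadline is Sunday.", "forum", date(2024, 9, 2), 0.90),
]

kept = filter_candidates(candidates, {"syllabus"}, today=date(2024, 10, 1))
for c in kept:
    print(c.similarity, c.text)
```

Ranking by similarity alone would have put the 2019 chunk first. The filters are crude, and the thresholds would need to be tuned against evals, but they show why metadata has to travel with each chunk.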
Work this
System design sketch
Design a course-assistant agent for this ML book. Specify:
- What documents it may retrieve.
- Which tools it may call.
- Which actions require human confirmation.
- How you would evaluate whether it is helping students learn.
Key takeaways
- Foundation models become more useful when embedded in systems.
- RAG retrieves context at inference time but does not guarantee truth.
- Tool calls need validation, permissions, and error handling.
- Agents are loops, so they need budgets and stopping rules.
- System design is part of modern machine learning.
The model is not the whole artifact. The artifact is the loop around it: what it sees, what it can do, what is checked, what is logged, and where humans remain in control.
The final phase turns that loop toward action, reward, alignment, and evaluation.