phase 6 · lesson 18 of 22 · Foundations

Retrieval, Tools, and Agents

When a Model Is Part of a System

core question

When should an AI system remember, search, or act through tools?

you should leave able to

Separate model parameters from external memory.
Explain retrieval-augmented generation as search plus synthesis.
Describe tool use as a system design problem with failure modes.

before moving on

Design a small RAG pipeline and list one retrieval failure, one synthesis failure, and one tool failure.

A language model by itself is a frozen memory and a fluent guesser. Put it inside a system with search, documents, tools, memory, and feedback, and the behavior changes. The model is no longer the whole product. It is the reasoning interface inside a larger machine.

Modern AI applications are often systems, not single models. They retrieve context, call tools, write intermediate state, ask follow-up questions, and verify outputs. This is the engineering layer that turns a foundation model into a useful assistant, tutor, analyst, or coding agent.

The idea

Retrieval-augmented generation

Retrieval-augmented generation, or RAG, gives a model external context at inference time. Instead of asking the model to remember everything, the system searches a document store, inserts relevant passages into the prompt, and asks the model to answer from that context.

A RAG system retrieves context before generation, so the model answers with fresh or private evidence.

RAG helps with freshness, attribution, and private knowledge. It does not guarantee correct answers on its own. The retriever can fetch the wrong documents, the prompt can bury the important passage, and the generator can still overstate what the evidence supports.
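A minimal sketch of the search-plus-synthesis pattern may make the handoff concrete. Everything here is an assumption for illustration: `retrieve`, `build_prompt`, and the three passages are hypothetical, word overlap stands in for a real retriever, and the final call to a generator is left out.

```python
# Hypothetical names throughout: retrieve, build_prompt, and the passages
# are illustrative, not part of any library.

def retrieve(query, passages, k=2):
    """Return the k passages sharing the most whole words with the query."""
    query_words = set(query.lower().split())

    def overlap(passage):
        return len(query_words & set(passage.lower().split()))

    return sorted(passages, key=overlap, reverse=True)[:k]


def build_prompt(query, context):
    """Insert retrieved passages into the prompt and ask for a grounded answer."""
    numbered = "\n".join(f"[{i}] {p}" for i, p in enumerate(context, start=1))
    return (
        "Answer using only the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{numbered}\n\n"
        f"Question: {query}\n"
    )


# Lowercase, punctuation-free passages keep the word splitting trivial.
passages = [
    "rag gives a model external context at inference time",
    "a tool call is a structured action",
    "a test set estimates performance on fresh examples",
]
query = "what does rag give a model"
context = retrieve(query, passages)
prompt = build_prompt(query, context)
print(prompt)  # a real system would send this prompt to the generator
```

Note that the second retrieved passage matches only on the word "a". Even in a toy, top-k retrieval fills the prompt with whatever ranks next, relevant or not.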

Tools are actions with contracts

A tool call is a structured action: search the web, run code, query a database, send an email, create a calendar event. The model proposes the action, but the system must validate inputs, enforce permissions, log effects, and handle errors.
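One way to picture the contract is a small gatekeeper that sits between the model's proposal and the real effect. This is a sketch under stated assumptions: the tool names, the schema layout, the `needs_confirmation` flag, and the `execute` function are all invented for illustration and do not describe any particular framework.

```python
# Hypothetical tool registry: names, schema, and permission flags are invented.
TOOLS = {
    "search_docs": {"params": {"query": str}, "needs_confirmation": False},
    "send_email": {"params": {"to": str, "body": str}, "needs_confirmation": True},
}

audit_log = []  # every executed call is recorded here


def search_docs(query):
    return f"results for {query!r}"  # stand-in for a real search backend


HANDLERS = {"search_docs": search_docs}


def execute(call, confirmed=False):
    """Validate a model-proposed call, enforce permissions, log, and run it."""
    name, args = call.get("name"), call.get("args", {})
    spec = TOOLS.get(name)
    if spec is None:
        return {"ok": False, "error": f"unknown tool: {name}"}
    if set(args) != set(spec["params"]):
        return {"ok": False, "error": "arguments do not match the schema"}
    for key, expected in spec["params"].items():
        if not isinstance(args[key], expected):
            return {"ok": False, "error": f"{key} must be {expected.__name__}"}
    if spec["needs_confirmation"] and not confirmed:
        return {"ok": False, "error": "human confirmation required"}
    handler = HANDLERS.get(name)
    if handler is None:
        return {"ok": False, "error": f"no handler registered for {name}"}
    try:
        result = handler(**args)
    except Exception as exc:  # report tool failures instead of crashing the caller
        audit_log.append((name, args, "error"))
        return {"ok": False, "error": str(exc)}
    audit_log.append((name, args, "ok"))
    return {"ok": True, "result": result}


print(execute({"name": "search_docs", "args": {"query": "attention"}}))
print(execute({"name": "send_email", "args": {"to": "a@example.com", "body": "hi"}}))
print(execute({"name": "delete_all", "args": {}}))
```

The model only ever produces the `call` dictionary. Whether anything happens is decided by code the system owner controls.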

Agents are loops over this pattern:

Observe state.
Decide next action.
Call a tool or ask a human.
Read the result.
Continue or stop.
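The five steps above can be sketched as a short loop with a hard step budget. This is a toy under stated assumptions: `decide` is a hand-written stand-in for the model, `lookup` is a dictionary pretending to be a tool, and `max_steps` is an arbitrary limit.

```python
# Hypothetical sketch: decide() stands in for the model and lookup() for a tool.
FACTS = {"rag": "RAG retrieves context before generation."}


def lookup(term):
    return FACTS.get(term, "no entry")


def decide(state):
    """Stand-in policy: look up the goal once, then stop with the answer."""
    if not state["observations"]:
        return {"action": "lookup", "arg": state["goal"]}
    return {"action": "stop", "answer": state["observations"][-1]}


def run_agent(goal, max_steps=5):
    state = {"goal": goal, "observations": []}    # observe state
    for step in range(max_steps):                 # budget: hard step limit
        choice = decide(state)                    # decide next action
        if choice["action"] == "stop":            # stopping rule
            return {"answer": choice["answer"], "steps": step}
        result = lookup(choice["arg"])            # call a tool
        state["observations"].append(result)      # read the result
    return {"answer": None, "steps": max_steps}   # budget exhausted, no answer


print(run_agent("rag"))
print(run_agent("rag", max_steps=1))
```

The second call shows why the budget matters: with only one step allowed, the loop ends before the policy can stop on its own, and the system returns an explicit non-answer instead of running on.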

Lab - Tiny Retrieval

Retrieval is not memory. It is search over external text, followed by a synthesis step that should stay grounded in what was retrieved.

Because the facts live in external text, the model does not need to store every one of them in its parameters. The fragile part is the handoff: when retrieval returns the wrong passages, the generator still writes a confident answer, now grounded in the wrong context.
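One hedge against a bad handoff is to check retrieval quality before synthesis and abstain when nothing matches well. The sketch below assumes a crude word-overlap score and an arbitrary threshold, `min_overlap`. Both are illustrative choices, and `select_evidence` is a hypothetical helper.

```python
# Hypothetical guard: min_overlap is an assumed threshold, and word overlap
# is only a rough proxy for retrieval quality.

def overlap(query, passage):
    return len(set(query.lower().split()) & set(passage.lower().split()))


def select_evidence(query, passages, min_overlap=2):
    """Return the best passage, or None when nothing matches well enough."""
    best = max(passages, key=lambda p: overlap(query, p))
    return best if overlap(query, best) >= min_overlap else None


passages = [
    "attention compares queries and keys",
    "a loss function measures prediction error",
]

for query in ["attention queries and keys", "capital of france"]:
    evidence = select_evidence(query, passages)
    if evidence is None:
        print(f"{query!r}: no supporting passage, abstain or ask a follow-up")
    else:
        print(f"{query!r}: answer from -> {evidence}")
```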

Code version

Keyword retrieval toy (Python)

docs = [
  ("loss", "A loss function measures prediction error."),
  ("split", "A test set estimates performance on fresh examples."),
  ("attention", "Attention retrieves information by comparing queries and keys."),
  ("tool", "A tool call lets an AI system take a structured external action."),
]

question = "How does a model use attention to find context?"

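STOPWORDS = {"a", "an", "the", "to", "how", "does", "by", "on", "and"}

def tokenize(text):
  # Split into lowercase whole words, trim punctuation, drop stopwords.
  words = (w.strip(".,?!") for w in text.lower().split())
  return {w for w in words if w and w not in STOPWORDS}

def score(doc, query):
  # Count distinct query words that appear as whole words in the document.
  # Whole-word matching keeps "to" from matching inside "tool".
  return len(tokenize(query) & tokenize(doc))

# Rank by score; drop documents that share no content word with the question.
scored = [(score(text, question), key, text) for key, text in docs]
ranked = sorted((s for s in scored if s[0] > 0), key=lambda s: s[0], reverse=True)

print("question:", question)
print("retrieved:")
for points, key, text in ranked[:2]:
  print(f"- {key} (score {points}): {text}")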

For the advanced reader → Embeddings are not knowledge bases

Vector search uses embeddings to retrieve semantically similar chunks. Similarity is useful, but it is not proof. A passage can be similar but irrelevant, relevant but incomplete, or correct but outdated. Serious RAG systems need a chunking strategy, metadata filters, reranking, citations, freshness checks, and evals.
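A toy example shows how the most similar chunk can still be the wrong one. The three-dimensional vectors below are hand-written stand-ins for real embeddings, the texts and years are made up, and `search` with its `min_year` filter is a hypothetical helper, so treat this as an illustration of the failure mode rather than a retrieval recipe.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


# Hand-written vectors stand in for embeddings; texts and years are invented.
chunks = [
    {"text": "Attention notes, 2019 edition", "vec": [0.9, 0.1, 0.0], "year": 2019},
    {"text": "Attention notes, 2024 edition", "vec": [0.8, 0.2, 0.1], "year": 2024},
    {"text": "Optimizer notes, 2024 edition", "vec": [0.0, 0.1, 0.9], "year": 2024},
]
query_vec = [1.0, 0.0, 0.0]


def search(query_vec, chunks, min_year=None):
    """Rank by cosine similarity, optionally dropping chunks older than min_year."""
    pool = [c for c in chunks if min_year is None or c["year"] >= min_year]
    return sorted(pool, key=lambda c: cosine(query_vec, c["vec"]), reverse=True)


print(search(query_vec, chunks)[0]["text"])                 # most similar, but outdated
print(search(query_vec, chunks, min_year=2023)[0]["text"])  # metadata filter applied
```

Similarity alone ranks the 2019 chunk first. The metadata filter, not the embedding, is what brings the current edition to the top.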

Work this

System design sketch

Design a course-assistant agent for this ML book. Specify:

What documents it may retrieve.
Which tools it may call.
Which actions require human confirmation.
How you would evaluate whether it is helping students learn.

Key takeaways

Foundation models become more useful when embedded in systems.
RAG retrieves context at inference time but does not guarantee truth.
Tool calls need validation, permissions, and error handling.
Agents are loops, so they need budgets and stopping rules.
System design is part of modern machine learning.

The model is not the whole artifact. The artifact is the loop around it: what it sees, what it can do, what is checked, what is logged, and where humans remain in control.

The final phase turns that loop toward action, reward, alignment, and evaluation.