A few years ago there was one common way to give an AI agent knowledge it did not have in its weights. You stuffed snippets into context with RAG.
That is now one of four.
Memory, Skills, and agentic search joined the list in the last eighteen months. They are not interchangeable with RAG. They are not drop-in upgrades. They demand capabilities the model did not have when RAG shipped.
RAG was robust because the retrieval decision was removed from the model. The retrieval system found the snippets. The model just had to read them. The newer primitives all hand the decision back. The model decides when to retrieve, what is worth remembering, which skill applies, when it is about to answer something it does not know.
The trade is robustness for capability.
RAG, Briefly
A user asks a question. A retrieval system runs an embedding match against a corpus. The top-K snippets are appended to the prompt. The model answers from what it sees.
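A minimal sketch of that pipeline, assuming a sentence-transformers embedder and an in-memory index (both illustrative; production systems swap in a vector database):

```python
import numpy as np
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")

corpus = ["Refunds are processed within 5 business days.",
          "API keys rotate every 90 days.",
          "Support hours are 9am-5pm ET."]
# Index once, ahead of time. Re-index when the corpus changes.
corpus_vecs = embedder.encode(corpus, normalize_embeddings=True)

def retrieve(query: str, k: int = 2) -> list[str]:
    q = embedder.encode([query], normalize_embeddings=True)[0]
    scores = corpus_vecs @ q  # cosine similarity (vectors are normalized)
    return [corpus[i] for i in np.argsort(scores)[::-1][:k]]

def build_prompt(question: str) -> str:
    snippets = "\n\n".join(retrieve(question))
    # The model never decides to retrieve; by the time it runs,
    # the snippets are already in the prompt.
    return f"Answer from these snippets:\n\n{snippets}\n\nQuestion: {question}"
```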
Two things are true about RAG that are easy to forget. The model is not retrieving anything. The retrieval system is. And the model can answer well even when it does not reason well, because the snippets are doing the work. This is why RAG still ships on Llama 3.1 8B and on every model class above it. It does not require frontier capability. It also gives you a clean audit trail, predictable latency, and a deterministic re-index path when the data changes.
The cost is what RAG cannot do. It cannot remember the user across sessions. It cannot invoke a packaged capability with its own scripts and assets. It cannot recognize that the question is structured and should hit a SQL tool instead. Embeddings are a single shape. Not every problem is shaped like a passage of text.
Memory: When the Knowledge Accumulates
Anthropic's Memory tool is a filesystem the model can read and write across sessions. The API tool is `memory_20250818`. The system prompt instructs the model to view its memory directory before doing anything else. It is client-side: the model emits tool calls, and your application executes them against `/memories`.
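A sketch of that client side, assuming the tool emits text-editor-style commands (`view`, `create`, `str_replace`; the command names here reflect Anthropic's docs at the time of writing, so check the current API reference) and showing the path validation that the failure modes below make non-optional:

```python
from pathlib import Path

MEMORY_ROOT = Path("./memories").resolve()

def safe_path(p: str) -> Path:
    # The model addresses files as /memories/...; map that into a sandbox
    # and reject traversal attempts like /memories/../../etc/passwd.
    rel = p.removeprefix("/memories").lstrip("/")
    full = (MEMORY_ROOT / rel).resolve()
    if not full.is_relative_to(MEMORY_ROOT):
        raise ValueError(f"path escapes memory root: {p}")
    return full

def handle_memory_call(call: dict) -> str:
    """Execute one memory tool call from the model; return the result text."""
    path = safe_path(call["path"])
    match call["command"]:
        case "view":
            if path.is_dir():
                return "\n".join(sorted(x.name for x in path.iterdir()))
            return path.read_text()
        case "create":
            path.parent.mkdir(parents=True, exist_ok=True)
            path.write_text(call["file_text"])
            return f"created {call['path']}"
        case "str_replace":
            text = path.read_text()
            path.write_text(text.replace(call["old_str"], call["new_str"], 1))
            return f"edited {call['path']}"
        case _:
            raise ValueError(f"unsupported command: {call['command']}")
```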
What gets stored is whatever the model writes. The contract is between the model and your storage layer, not between embeddings and a vector index. Anthropic reports a 39% improvement on internal agentic-search evaluations when memory is combined with context editing, and an 84% token reduction across a 100-turn web search.
The shape is different from RAG. Memory is keyed on whatever scope the application defines: user, project, agent, or session. It accumulates over time. It updates implicitly. The model decides what to write.
That is where the model dependence shows up. A weaker model writes worse memories. It misses the important fact, persists noise, or confuses a stated preference with a passing aside. Memory works because the model is doing reflection, not because vectors are doing matching.
The failure modes match the architecture. Path traversal if you do not validate. Cross-scope contamination if your storage layer leaks. Conflicting memories the model resolves "arbitrarily." Two adversarial proofs of concept were published in April 2026: a memory hijack on Opus 4.7 that persisted false biographical facts via an injected image, and a Claude Cowork file exfiltration that bypassed VM egress controls. Memory expands the attack surface.
Memory is the right primitive when the knowledge is scoped and accumulates. It is the wrong primitive when the knowledge is corpus-general or strictly structured.
Skills: When the Knowledge Is Procedural
A Skill is a directory with a `SKILL.md` file at the root, optional scripts, and optional assets. The SKILL.md has YAML frontmatter (name and description) and a Markdown body. The model sees only the name and description in its system prompt at startup. When the description matches the user's request, the model reads the SKILL.md body. When the body references a script, the model executes it via bash without putting the script's source in context.
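A hypothetical skill, to make the shape concrete (the name, description, and file references are all invented):

```markdown
---
name: quarterly-report
description: Generates the quarterly finance report as a formatted Excel
  workbook. Use whenever the user asks for quarterly numbers, financial
  summaries, or report exports.
---

# Quarterly Report

1. Run `scripts/pull_numbers.py` to fetch raw figures for the quarter.
2. Fill the summary sheet in `assets/template.xlsx` with the results.
3. Cross-check totals against the raw figures before returning the file.
```

Only the frontmatter costs tokens at startup. The body, the script, and the template stay on disk until the description matches a request.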
Is Skills really a new primitive, or just packaging on top of system prompts and tools? Mostly packaging. What makes it primitive-shaped is that the loading decision moves to the model and the context cost stays at zero until invoked. Anthropic calls this progressive disclosure: roughly 100 tokens always loaded, under 5,000 loaded when triggered, and effectively unlimited available on demand. Skills became an open standard in December 2025 with `agentskills.io` as the spec home.
The rest follows from the loading model. Skills are versioned. They package instructions, code, and assets together. They are model-invoked, not user-invoked.
The model dependence is the invocation decision. Anthropic's own skill-creator guidance warns that "Claude has a tendency to undertrigger skills" and recommends "pushy" descriptions. A weaker model misses skills. A more capable model finds them.
Anthropic ships built-in skills for PDF, Excel, PowerPoint, and Word. Hedgineer has a public writeup of building a four-domain Skills knowledge layer (AI, Data, Infra, UI) that travels across their team. Simon Willison's framing on release: "I expect we'll see a Cambrian explosion in Skills which will make this year's MCP rush look pedestrian."
Skills are the right primitive when the knowledge is procedural and version-controlled. They are the wrong primitive when the knowledge is dynamic or per-user.
Agentic Search: When the Knowledge Is Structured
Agentic search is what Claude Code does. The model has tools (grep, glob, read) and decides when to use them. There is no embedding index. The model writes a query, reads the result, and decides what to do next.
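The loop is the standard tool-use pattern. A sketch against the Anthropic Messages API, with a `grep` and a `read` tool standing in for Claude Code's richer set (the tool schemas, truncation limits, and model name are illustrative):

```python
import subprocess
from anthropic import Anthropic

client = Anthropic()

TOOLS = [
    {"name": "grep",
     "description": "Search file contents recursively for a regex.",
     "input_schema": {"type": "object",
                      "properties": {"pattern": {"type": "string"}},
                      "required": ["pattern"]}},
    {"name": "read",
     "description": "Read a file by path.",
     "input_schema": {"type": "object",
                      "properties": {"path": {"type": "string"}},
                      "required": ["path"]}},
]

def run_tool(name: str, args: dict) -> str:
    if name == "grep":
        out = subprocess.run(["grep", "-rn", args["pattern"], "."],
                             capture_output=True, text=True)
        return out.stdout[:5000]  # truncate; the model judges relevance
    if name == "read":
        return open(args["path"]).read()[:5000]
    raise ValueError(f"unknown tool: {name}")

messages = [{"role": "user", "content": "Where is the retry logic configured?"}]
while True:
    resp = client.messages.create(model="claude-sonnet-4-5", max_tokens=1024,
                                  tools=TOOLS, messages=messages)
    if resp.stop_reason != "tool_use":
        break  # the model decided it has enough; no index was ever built
    messages.append({"role": "assistant", "content": resp.content})
    messages.append({"role": "user", "content": [
        {"type": "tool_result", "tool_use_id": b.id,
         "content": run_tool(b.name, b.input)}
        for b in resp.content if b.type == "tool_use"]})
```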
Boris Cherny, who built Claude Code, says publicly that early versions used RAG with a local vector database, but agentic search "outperformed everything. By a lot." His framing: "It is also simpler and doesn't have the same issues around security, privacy, staleness, and reliability."
This is not the universal pattern. Cursor still indexes embeddings, then layers agentic tools on top. Sourcegraph and GitHub Copilot Workspace run hybrids. Agentic search is the shift, not the replacement.
The model dependence is sharpest here. Agentic search asks the model to formulate queries, recognize that an answer requires retrieval at all, evaluate snippets without losing the thread, and stop when it has enough. The Berkeley Function Calling Leaderboard tracks this as a separate capability from generation. Top frontier models exceed 88% on multi-turn function calling. Smaller models drop sharply.
The Search-R1 paper measured this directly. RL-trained agentic search improved performance by 26% on Qwen2.5-7B, 21% on Qwen2.5-3B, and 10% on LLaMA3.2-3B. The gains scale with model size in this setup. The slope is suggestive, not universal, but the direction is consistent across the agentic-search literature: smaller models cannot make the search-or-not-search decision well enough. RAG never had this scaling problem because the search-or-not-search decision was not the model's to make.
Agentic search is the right primitive when the corpus is structured (a filesystem, a code repository, a queryable database) and you have a model that reasons about its own knowledge gaps. It is the wrong primitive when latency is critical and the answer space is well-bounded.
The Trade
Four shapes of the same problem: how does the agent get knowledge it does not already have?
- RAG wins on stable corpora, factual queries, audit trails, and graceful degradation on weaker models.
- Memory wins on scoped knowledge that accumulates over sessions.
- Skills wins on procedural, versioned capability with zero context cost until invoked.
- Agentic search wins on structured corpora the model can navigate.
An honorable mention: long context. With million-token windows and prompt caching, corpora under roughly 800K tokens can be loaded directly without retrieval. The trade is cost-per-call versus retrieval complexity. For stable, bounded knowledge bases this often beats all four.
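A sketch of that alternative using Anthropic's prompt caching, where `cache_control` marks the corpus block so repeat calls read it at the cached rate (the file path and model name are placeholders):

```python
from anthropic import Anthropic

client = Anthropic()
corpus = open("docs/knowledge_base.md").read()  # assumption: fits in the window

def ask(question: str) -> str:
    resp = client.messages.create(
        model="claude-sonnet-4-5",
        max_tokens=1024,
        system=[{
            "type": "text",
            "text": corpus,                          # the whole corpus, every call
            "cache_control": {"type": "ephemeral"},  # reused across calls
        }],
        messages=[{"role": "user", "content": question}],
    )
    return resp.content[0].text
```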
These are not exclusive. A production support agent typically uses RAG for the docs, memory for the customer, skills for refund procedures, and agentic search for the ticket database, in one turn. The framework is for picking the right primitive per knowledge type, not per system.
The trade across all of them is robustness for capability. RAG works no matter how the model behaves. The other three do not. Most "RAG systems" shipped in 2024 were doing memory or knowledge-base work badly because RAG was the only available primitive. In 2026 there is a choice. Pick the shape that matches the problem.
Closing
The pattern across all four is the same. Knowledge lives somewhere. The agent has to find it. The question is who decides what to load, when, and how.
RAG decides for you with embeddings. The newer primitives ask the model to decide.
A weaker model still lets you ship RAG. Only a stronger model lets you ship the other three.