AI agent architecture explained: context engineering, model selection, and the coming price hike
A solo newsletter essay by Nacho de Gregorio of TheWhiteBox, explaining how AI agents actually work and how to use them effectively.
Summary
Nacho de Gregorio of TheWhiteBox publishes the first installment of what he calls "The Agent Bible," a practical guide to working seriously with AI agents. He argues that the single most important skill in AI right now is context engineering — structuring the information you feed an agent — and that most people and startups wildly overcomplicate it.
He makes a pointed case that large frontier models are unnecessary and financially reckless for most agentic tasks, recommending instead "Pareto frontier" mid-tier models such as Gemini Flash, GPT-5.4 mini, or distilled Chinese open-source models like Qwen 3.5 27B. He warns that AI Labs are currently absorbing enormous losses to subsidize user subscriptions — losses that cannot continue indefinitely. A price hike, he argues, is not a matter of if but when, and readers should be training themselves now to work with smaller, cheaper models before that moment arrives.
On context engineering, he explains that an AI agent is just three things: a model, a context harness, and tools. The agent does not truly "execute" actions; it declares intent, and the surrounding system carries out the tool calls and feeds results back into the agent's context. He argues the best context harnesses — like Claude Code or Hermes Agent — use simple markdown files rather than complex vector databases. He introduces a three-file system (user.md, memory.md, boot.md) for structuring personal agent context, and notes that the "source side" of context engineering — how you prepare your documents before ingestion — is more important than the "AI side," yet almost nobody discusses it.
He also advocates for eventually running personal agents on open-source models, noting this is not yet practical today but is his stated long-term personal goal, both for cost reasons and to avoid privacy risks from routing personal data through commercial labs. The episode is the first in a multi-part series; the transcript covers through the user.md section of the context engineering discussion, with memory.md and boot.md to be addressed in subsequent installments.
Key Takeaways
FULL TRANSCRIPT
What this guide covers
Nacho de Gregorio: Nothing in AI is hotter today than OpenAI-style agents — these AIs that seem to run forever, know everything about you, and evolve with you.
I ignored them for a while, until I couldn't. We talked about them here in healthy detail. But over the last week or so, I've been dabbling in agents more seriously, stretching them to their limits, trying to find what is real and what is not.
I now have the answer to give you. And I have to say it's actually pretty cool. So today I'm going to explain how I use AIs, including answers to the following key elements in agentic AI:
One, how to optimize model selection — and the hint is, it's not Opus 4.6. Two, how to understand context engineering, key context architecture and management ideas, and overall best practices for anyone wanting to get serious with agents — including how I structure prompts, with examples you can use for yourself. Three, why I believe we are soon going to see price hikes, and why you should be preparing for them. And four, tips, tricks, and recommendations I use to avoid the ugly side of these things, including both my own takes and, perhaps more interestingly, leaked lessons from a top closed AI lab.
If context engineering is the most important skill you can learn in AI currently, by the end of this post, you're going to be an expert yourself.
Agents in first principles
Agents are AI models that execute actions. And, crucially, the quality of the action is a function of two things.
First, the quality of the context we provide the model with. If the agent doesn't have the correct context, it can't execute well. Second, the quality of the model's knowledge and intelligence. If the agent doesn't know what it doesn't know, or isn't smart enough to suggest good actions, the outcome won't be good. As I always say, you can't teach a dog to read, no matter how hard you or they try.
Handling the latter depends on choosing the right model for the task. You can, of course, choose to always use the smartest model possible, but as we're going to see, AIs are not precisely cheap, so you're going to bankrupt yourself eventually if that's your approach. The harsh reality is that you need to learn to adjust model choice to the task complexity. We'll tackle this in more detail below.
The former is much more under your control. As I always tell my clients, the only question you need to ask yourself whenever you interact with an agent is: am I providing the model with the right context?
Besides having the right context at the right time, the other thing you must consider is whether the agent has the right tools to execute your request. Crucially, when we say agents execute actions, in reality, they just "declare intent."
All the model does is process a sequence of text and previous actions and decide what to predict next — whether that's a new word, or a declaration of the need to execute a tool, such as a Google search to make an internet search, or Stripe to create an invoice.
Then, the system takes in that declaration, executes the chosen tool, and provides the tool's response to the agent — the execution trace — which guides the model's next steps.
So, to summarize, an agent is just a combination of three things: an AI model, a context harness, and tools.
First decision: model selection
Ironically, although not the most important thing here, choosing the right model for the task is more crucial for your wallet than for performance.
Here, you need to search for models with the right price while still offering the characteristics an AI model must have to execute well as an agent.
Those are: first, planning — the model must plan how to execute the task. This may read as obvious, but it is one of the hardest capabilities to achieve from AI models. Second, excellent tool-calling capabilities — the model needs to be able to see what the task is and what tools it has available, and correctly decide which tool is the right one, if any. This is surprisingly hard for models, and it automatically discards many of them. Third, long-horizon capability — the model has to be able to execute very long execution traces, sometimes executing dozens or hundreds of tools in a sequence. Fourth, long context windows — agentic workloads are extremely long, so a model with a short context window, meaning limited working memory, is useless. Fifth, reliability — the model must be resistant to hallucinations. Sixth, cost-effectiveness — the model must not bankrupt you in the process.
The list of models that meet all these criteria is surprisingly short: basically, Pareto frontier models. Notice that the keyword here is Pareto. You have to be out of your mind to be using frontier models for most agentic tasks, as most do not — and I can't overstate this — require frontier-level intelligence.
That is, you should aim for middle-sized models with just-enough performance that don't waste money.
What models are these? Models like Gemini 3.1 Flash or GPT-5.4 mini for US models, and GLM-5, Kimi K2.5, Qwen 3.5 27B — the Opus distilled version — as Chinese options.
What all these models have in common is that they are small enough to be reasonably priced, while being distilled directly from the frontier models, making them by far the best bang for your buck.
Frontier models should be used rarely — mostly for research and particularly complex tasks like coding, which are most often not agentic.
Be that as it may, most agentic tasks do not require the spiky, borderline-savant capabilities that these models offer in areas like coding or math. In other words, calling a tool to create an invoice does not require frontier-level intelligence, period.
You could push back on the planning side, though. For very ambitious agentic tasks — a model thinking for days on a task, or truly automating a considerable portion of our daily tasks — you would certainly need frontier-level models.
But here's the thing, and one of the key takeaways from this piece: today's models plan things they actually cannot execute. Most of the tasks you can think of that would need frontier models are not viable today.
Although we'll talk about this in more detail next week, ironically, the reason is not that they can't, but that the tools they need to execute aren't ready. This will be a common theme over the next months and years — the digital world needs to become agent-ready.
The inevitable price hike
But the biggest reason you probably want to avoid frontier models for agents is that their true costs are hidden from us, and you should definitely be getting used to frugality.
One of the biggest lies we're told is that AI is cheap. It's not. You could, of course, argue that the value these tools provide is worth every cent, but that's a position of faith that is by no means represented in reality — that is, in revenues.
Uncontrolled use of AI is incredibly expensive, and that's even considering the fact that we are being extremely subsidized.
A viral image circulating recently shows the extent to which some people are milking AI subscriptions to levels beyond what is foreseeable: 9,200 deployed agents, 17,000 files touched, 1.1 billion processed tokens, a total estimated expenditure of $27,000 — with a single Claude Max subscription.
That value is calculated using API prices, which are $5 and $25 per million input and output tokens respectively for the most expensive model, Opus.
Nonetheless, as noted by Anthropic itself, the average subscription costs them $180 per month to serve, even though the average subscription is the Pro one at $20 per month.
In other words, AI is way, way more expensive than we realize. We are just being spoiled by the AI labs burning cash to maintain their skyrocketing growth and market share.
But this begs the question: for how long? One thing is to subsidize to some extent; another is for Anthropic to take a 135x loss on a single subscription, as with the previous user.
That won't last long.
In other words, eventually, once they have us all completely hooked, they'll start raising prices, and we'll realize the true extent of the cost. This could happen in a year or tomorrow.
Thus, you need to get your act together and start choosing your models wisely, getting used to talking to "worse" models that are "good enough" for the task.
You need to get out of your comfort zone, assuming not every request you have can or should be handled by GPT-5.4 or Opus 4.6. The reason is that this affects how you prompt them, what you share with them, and your patience. Using "worse" models requires skills that have to be trained.
If you get used to using the best of the best, you'll soon face a decision: more frugality or bankruptcy?
Open source: the answer to agents?
This leads us to what I believe will be a common theme amongst agents: open source.
I have a clear plan with agents: although it's not a reality today, I will eventually make sure my personal agent runs on open-source models. Not only is that cheaper, but it is also way more secure than trusting your personal data to these labs.
Besides, I don't want to fear the day when they decide to double the cost of each subscription.
As I was saying before, most agentic tasks — things like managing your email inbox, researching news, or creating invoices for your company — are completely doable with moderately intelligent models.
The context problem, clarified
But enough about models, because the biggest lessons I can teach you today have nothing to do with the AI itself, but the system around it.
Regarding context, it's without a doubt the one component in the agent trifecta — model, context, tools — that you have the most control over, and the one you should put the most care into.
This is called context engineering, and it represents a very interesting dichotomy: it's pretty simple to understand, and more often than not overcomplicated by everyone, but it's nightmarish to implement correctly.
You would be surprised how stupidly complex people make context engineering out to be. Vector databases with hybrid BM25 implementations coupled with fuzzy matching and whatnot — all implementations competing to see who can put more jargon into a single sentence in hopes of appearing sophisticated enough for investors to pour money into their startup.
But in reality, all that complexity is actually performance-degrading. Instead, the most successful context harnesses — examples like Claude Code or Hermes Agent — make it much simpler: markdown files and a context file management system.
Context engineering is just adding the relevant context to the model's prompt. What your AI agent actually sees is a prompt that looks like this:
```
The set of rules and instructions for the agent's behavior.
The list of tools that the agent can decide to use. This can (and should) be dynamic.
Set of rules that define how the agent should respond.
Explanations of "who the user is"
AI's scratchpad to add memories for future reference
references to past conversations
The user's prompt.
```
Every prompt you send to an agent should look in some way similar to what you're seeing above. It's not a rigid structure — you can decide what clauses to use and how to structure it — but it should definitely be structured.
Of course, the key part here is what's inside the Context clause. Hence, context engineering is nothing but making sure that what goes in there is good context. Simple as that.
Therefore, everything we are going to discuss below is just this: writing good prompts and markdown files for our agent, coupled with separation of concerns. This is the simplest form of context engineering and precisely the one that works best.
Context engineering has two sides: the AI side — how we ingest that context in a digestible way — and, more importantly, the source side — how we prepare our sources. Interestingly, nobody mentions the second one, even though it's way more important. But first, let's tackle the former, which everyone thinks makes them a context engineer.
The AI side: three files
The AI side is what most people think of as context engineering: how you mold the context once it's available. And while people love to complicate this, you actually only need three files: user, memory, and boot.
As for the first one, user.md, this is how I think about what to put into it.
Emotional context. This is who I am, and this is how I want to be treated. There's nothing sweet about my interactions with agents. For all the love and care I yearn for from my close family and friends, I couldn't care less about how sweet my models are to me. To me, rigor is the most important thing, and I'm very clear about it — although I know some of you don't feel the same way, and that's okay. But please be aware that asking models to indulge in human-like conversations turns them into hallucination machines, as you're forcing them into a part of their response distribution that's valued not for accuracy but for sycophancy.
Furthermore, avoiding an overly sweet demeanor plays a double role for me: it helps me refrain from anthropomorphizing these agents, because that's what your mind will be tempted to do all the time.
Recent relevant context. It's great to see agents figuring out stuff about you by themselves, but it's a lot easier to just tell them and keep an up-to-date state of what matters to you. Use user.md to explain the things that matter right now.
You can also make it discoverable by the AIs. This is tricky today because every token of context matters — it's certainly finite — so agents will struggle. For example, it may take the model several emails to realize you're a regular Huel customer and that it should track Huel shipments, if that matters a lot to you. Several emails mean the agent will spend a lot of money figuring this out, despite it being literally a sentence in a markdown file.
Work context. This one is pretty straightforward. Describe what your current job life looks like, as well as what your aspirations and ongoing projects are.
Financial context. Here's where you give models access to your financial data. These models are increasingly capable of handling Excel files, so you can send them directly. In my case, however, I prefer not to do so, for two reasons.
First, handling Excel does require pretty advanced capabilities — you're going to have to use the top models. Second, they still hallucinate considerably. Even the top models make many mistakes. Just like frontier models rarely hallucinate in chatbot assistant formats at this point, they are still considerably prone to making mistakes on spreadsheets.
Instead, although I haven't yet fully implemented this, I'm planning to expose my financial data to these models via terminal commands, with background code handling the responses and having the model directly receive the actual answers. In other words, I'm planning to create a financial tool for my agent.
This goes to the core of one of the key recommendations I'm giving you today: you should not force models to live in your world. Instead, adapt your interfaces to them. This will make sense in a second.
This all ends in a user.md prompt — a written description of all I've explained above.
But things get much more interesting once we discuss the other files, because it's here that we start to see agents' real powers.