The Agent Bible, 2nd Part: Software 3.0 — TheWhiteBox by Nacho de Gregorio Transcript

Newsletter on AI agents, agentic software, and the future of software architecture

A solo written newsletter by Nacho de Gregorio of TheWhiteBox, arguing that most agent limitations stem from the ecosystem around AI rather than from AI itself.

Summary

Nacho de Gregorio, writing in his TheWhiteBox newsletter, opens by declaring that he has finally arrived at a clear understanding of what AI agents are and what the future of software looks like — and that this understanding differs from the mainstream view, including that of top AI labs. He explains the mechanics of tool calling, arguing that agents do not actually execute actions themselves but merely declare intent, with a surrounding harness doing the real work. This distinction, he argues, has major implications for hardware (specifically the renewed importance of CPUs), for how tools should be designed, and for why most agent failures today are not the AI's fault but the fault of an ecosystem not yet built for agents.

He also addresses AI's genuine limitations — reliability, context rot, and cost — and points out that AI labs are currently subsidizing users, citing Anthropic's move to prohibit high-volume agent use on consumer subscriptions as evidence that serving agents at scale is economically unsustainable at current pricing.

On multi-agent design, he warns that subagents work well for parallelization but not for collaboration, as models were not trained to work together and tend to conflict rather than cooperate.

He shares his own use of agents — for context gathering, report generation, and Excel modeling — and argues from personal experience that AI's real impact so far is in iterative, human-in-the-loop workflows, not in full automation. He concludes by introducing his own finance app, built agent-first, as a demonstration of what he calls 'software 3.0.'

Key Takeaways

Agents declare, not execute. AI agents based on Large Language Models do not run tools themselves — they predict a tool name and arguments, and a surrounding harness executes the actual code. This is a fundamental architectural reality with large downstream implications for hardware, software design, and agent reliability.

CPUs matter more than most realize in agentic workloads. Because tool execution happens on CPUs rather than GPUs, CPUs become the bottleneck in agent pipelines. No matter how powerful the AI accelerator, a slow CPU limits overall agent response time — making the CPU renaissance a direct consequence of the agentic shift.

Skills outperform MCPs for tool integration. Model Context Protocol (MCP) lets agents use tools without knowing how; agent skills (markdown files plus code) explain how to use a tool at runtime. Google DeepMind's research showed that adding skills dramatically improved model performance, in some cases from 0% to 52% task completion.

Most agent failures are ecosystem failures, not AI failures. The author's central argument is that agents are being deployed into a world not designed for them — with poor tool definitions, bad sources, and inadequate harnesses — and that this, more than AI capability limits, explains why agents underperform.

AI's proven impact is iterative, not automative. From financial modeling to coding to creative work, the areas where AI demonstrably adds value are those where humans and AI work side by side. Full-automation use cases — customer service, radiology — have consistently failed to materialise at the scale predicted.

Jevons paradox applies to AI productivity. The common assumption that AI productivity gains will simply reduce headcount ignores historical demand dynamics. When productivity rises and costs fall, demand typically expands — meaning AI is more likely to unlock new categories of work than to eliminate existing ones.

The agentic revolution is not evenly distributed. Making agents work requires the right sources and tools. Those who build their own agent-ready infrastructure will benefit far sooner than those waiting for the broader digital world to repurpose itself.

Software 3.0 is agent-first, not human-first. The author's finance app is presented as a demonstration of a new software paradigm — fully built with AI, context-engineering enabled, user-programmable in natural language, with the personal agent as a first-class citizen rather than an add-on.

FULL TRANSCRIPT

Introduction

Nacho de Gregorio: Rarely, if ever in my long writing career, have I been more excited about something than the thing I'm writing about today. Because, finally, I get it. I get agents, and I get software — its future, to be more exact. And I can assure you that, after reading, you will see things the same way I do. That is, different from the rest.

Last week, we took a look at an agent's first leg: how to provide the agent with the right context. This week, we're looking at the other side — how to enable action.

But I've decided to take it a step further and show you, quite literally, what the future of software looks like.

I'm building a finance app for myself, but I've also made it "agentic," meaning my personal agent is a first-class citizen of this app — once again proving that only when you get your hands dirty do you truly get to understand how things really are.

The app has a little bit of everything I've learned in AI over the years. Fully built with AI, context-engineering enabled, fully programmable by the user, using embeddings search for some pretty cool use cases, AI-powered with a fully AI backend, and more. The app adapts to my constraints and evolves with me, creating a fully customized experience.

It's too early to understand what I mean, but let me put it this way: software as we know it is dead.

At this point, I'm comfortable telling you that what you're about to see is nothing you've ever seen before.

So the goal of today is two-fold. First, to clarify — to explain why most agent limitations today are not the AI's fault. Second, to open your eyes — to settle for good the concept of "agentic software" and what it actually means, because even top AI labs are thinking about this wrong.

What we're seeing today has many implications for markets, for startups, for the AI industry as a whole, and, of course, most importantly, for you. I'm going to make you a builder, and that journey starts today.

The Tools Side — More Than It Seems

Recalling our vision of what an agent is: an AI model that has some knowledge of the world and the context you provide, and, based on all of that, takes action on our behalf. It's literally a math equation: f(knowledge + context) = action.

And while today the knowledge component is largely out of our control — because it's dependent on AIs we don't train (in the future, everyone will retrain their agents) — the context component is definitely under our control, which is why context engineering has become such a popular term and why I dedicated an entire article last week to it.

In fact, agents are not so different in spirit from a standard chatbot. The difference between ChatGPT and your personal secretary is really nothing beyond the fact that the latter can take action via the use of "tools."

But what is an agent tool? No, but seriously — what is an agent tool really, not what people tell you?

The Key Concept Known as Tool Calling

What do we actually mean when we say an agent executes an action? In reality, they don't execute anything. The picture is more nuanced: the agent just declares what it wants to use.

This may sound like something with zero implications, but it's actually a $100 billion factor.

At risk of scaring you, you can actually think of an agent as the Marvel villain Supreme Intelligence — this massive head from the Marvel franchise. I know this makes no sense right now, but bear with me.

Why? Because AI agents — at least the popular ones based on Large Language Models — don't have a body or a way to execute actions themselves. They just declare what needs to be done.

They receive a set of context and instructions, coupled with their knowledge, take up all that information, and predict one of two things: either a token (a new word, image, etc.) or a tool call.

But what is a tool call? The meaning is in the name. A tool call is the agent saying, "I want to use x or y tool." This means tools are, ironically, tokens too — the LLM predicts a tool name rather than a word.

That is why I was drawing the analogy with the big-headed villain: an agent declares what needs to be executed, but lacks the body to do it.

For example, say we ask our agent what the weather is like in Madrid, Spain, today. You do this because you know your agent has access to a `get_weather` tool. The model then predicts something like:

```

tool_name = "get_weather"

arguments = {"city": "Madrid"}

```

Then the harness — the system around the AI — maps `get_weather` to real code, and that code may internally call an API endpoint like `GET /weather?city=Madrid`. In that case, the model does not even know the raw HTTP details. It only knows the abstract tool interface.

Alternatively, you can expose the API more directly and let the model produce something closer to:

```

endpoint: "/users/123"

method: "PATCH"

body: {"status": "active"}

```

But even there, the model is still not making the network call itself. It is generating the request specification, and your runtime executes it.

This is important for two reasons. It helps us understand how agents actually look under the fancy naming. And it beautifully explains why CPUs are suddenly important again — yes, that's the $100 billion decision I was alluding to.

On the former: what this means is that an agent, by itself, can't do anything. It requires a harness on top that identifies which tool the agent calls, executes it, and returns the response to the agent for it to interpret.

Robotic AIs are agents too, but real ones. The AIs acting as brains of humanoids are VLA models — Vision-Language-Action models — which are similar to LLMs, but unlike an LLM, which can only make tool calls, the VLA model is actually outputting actions for the robot. This means that, unlike LLM agents, the entire agent trajectory runs on GPUs in a humanoid.

The CPU Renaissance

Before we continue with agents, I can't help but note that agents are a blessing for CPUs precisely because of this. The tool world is a CPU world; the AI world is a GPU world.

Therefore, if agents aren't the ones actually running the tools but just declare them, those declarations are sent to CPUs, which process the tools and send the tool responses back to the GPU.

This is why CPUs are so important — they are the bottleneck in agentic workloads. Without powerful CPUs, you can have the most powerful accelerator the world has ever seen, but the overall agent response will be slow.

And now, back to tools: how do we make AIs use tools effectively?

Skills and MCP

As AIs are nothing but predictions of tools based on their knowledge, the provided context, tool definitions, and instructions, it's vital that tools are thoroughly explained. Because how is an agent supposed to choose a tool to be called if the agent doesn't understand what the tool does?

This is where skills come in. This agent primitive, popularized by Anthropic, is again marketing on steroids — in pure agentic fashion — for something really, really simple: folders and text.

An "agent skill" is nothing but a markdown file (i.e., text) and code scripts (the execution harness) that help the agent understand what a tool does and, importantly, how to use it.

This is materially different from MCPs — the Model Context Protocol — the hottest way to integrate tools into agent settings for a long time, which allows agents to use tools via natural language. The difference is that an MCP lets the agent use a tool without knowing how, while a skill explains how to use it at runtime, like an instruction manual for your dishwasher while you're using it.

The reason skills draw so much interest these days is that they are actually pretty powerful — more than MCPs, even — as they provide the necessary explanations for the agent to get the most out of the tool, as Google proved.

In an interesting blog post, Google DeepMind engineers show that adding a simple skill that explains how to build apps with Gemini libraries dramatically improves the model's capacity — from less than 30% of their top model to near saturation. In some cases going from literally 0% to 52%, and from an okay-ish 28% to basically perfection with the most powerful Gemini model.

It's the realization that skills matter that takes us to the big realization with agents: issues with agents are not that much about AI, but more so about deploying agents in a world not meant for them.

But before we get to the weeds of today's message, let's quickly cover what is definitely to blame on AI.

AI Limitations

As I've said multiple times, we are usually setting up agents for failure, but that doesn't mean all blame is on us. Agents do have some unsolved problems.

First, reliability. We've talked far and wide about this. The "long horizon" nature agents have been attributed is more an illusion than anything else, and they struggle to execute a task consistently. If you increase the accuracy requirements from 50% to 80%, model performance collapses — and 80% is still unacceptable. You want at least 99.99% accuracy.

Second, context rot. We feed them more context than they can meaningfully handle. We are promised one-million-token context windows, but in reality, performance falls off a cliff.

Third, cost — probably the most important of them all. Serving AIs is much more expensive than we realize, and the worst thing is that we are being subsidized by investor money. Once AI labs start charging for AI's true costs, things will look very different.

No better proof that we are getting subsidized than Anthropic prohibiting the use of its subscriptions for open-claw-style agents. You can pay for the API, but that's where you're the one getting bled dry. This shows they were, indeed, getting bled dry — they wouldn't be destroying demand if they were getting paid their fair share.

But I'm not here to tell you that agents are young, because you already know that. The overarching point I'm making today is that the ecosystem around them is younger, and I'm going to show you today how I think it will look in a few months.

The Act: How Do I Use Agents?

Before I explain the app I'm building to put an end to my agent's financial context and action issues, let me recap how I use agents.

As you may have guessed from last week's skepticism, I'm not fully agent-pilled yet as respects to action — what the agent can do for me. Interestingly, though, it's not the agent's fault.

What Agents Do for Me

First, I use them for context and search gathering, and for generating reports that get me up to speed on what's going on in the world without me having a backlog of 100 sources every day.

My agent reviews my key sources — mostly my email accounts — and generates a report that lands in my inbox. Two things I want to say on this.

One: my sourcing has become vetted. In a world where most digital content is low quality, I deliberately choose the sources I trust and want to use. Most applications of agents generating reports for users are toys and not really useful — not because the agents are bad, but because the sources are terrible.

Two: if your agent has to go through a lot of context, as in my case, subagents are a good option. I do not recommend the use of subagents in collaboration situations — models tend to fight with each other and collaborate far less than you would imagine, because they were not trained for it. But here, each agent is working alone, so it's not a collaboration, it's a parallelization, and it works great.

My other big use of agents is in Excel. This is particularly interesting because it has made me the perfect example of why AIs, regularly seen as job destroyers, can actually be demand unlockers.

I rarely used Excel before agents, at least not at the rate I do now. My work is more about getting a handle on trends and deep-diving into very hard concepts that people with decision-making capabilities need to know how to respond to. I would love to do more analytical work — model scenarios, picture futures — because I have the statistical knowledge for it. However, keeping up with everything going on was already a full-time job, so Excel modeling wasn't on the menu.

With ChatGPT for Excel — and there are other options, of course — I can actually elevate my work by testing my own hypotheses. For instance, my study on the war in Iran and how it affects AI was possible thanks to ChatGPT.

To be very clear on what I'm implying: this is an expansion of what I can do, not an improvement of what I was already doing. I didn't have the time to set up an entire predictive cost model in a few hours. Now, I can chat with ChatGPT, explain what I want, discuss how we're going to do it, and it takes care of the doing. I do the thinking.

To me, this is undeniable proof against the usual approach to predicting AI's future. We forget that Jevons paradox exists. We picture a future where everyone is suddenly so much more productive while, for magical reasons, demand stays constant.

Yet history tells us again and again that demand is not constant — it's a function of price, which in turn is driven by increases in productivity. Productivity goes up, costs go down, demand goes up. The idea that AI is just going to make everything so much more productive and that new demand won't be created is a vision at odds with history.

Of course, demand growth is not infinite, and "too much productivity" eventually leads to much more supply than there is demand, and prices fall off a cliff — as happened in agriculture.

With all that said, my personal experience is that most, if not all, use cases where I've come to the conclusion that AI is really worth it are iterative, not automative.

Financial modeling. Coding. Creative work. Maths theorem proving. Consulting due diligence.

All those areas where AI is already making a huge impact are iterative workflows — areas where AI and humans work side by side on the problem.

Instead, in automation workflows — areas where AIs handle everything and humans are out of the loop — its impact is negligible. And yet: every six months, radiologists are six months away from disappearing, and there they are. Customer service should have been automated two years ago, allegedly, according to your average AI enthusiast prognosticator. And yet India's exports of customer representatives continue to grow, unchallenged, quarter after quarter.

Everyone thinks of AI as an automation tool. But currently, it has shown zero evidence of that. I hope I haven't ruined your day by telling you that, from my experience, this idea that agents are automating everything is a barefaced lie.

At this point, you may be thinking I'm a hypocrite, considering I've been predicting the arrival of agents for months now. But, for what it's worth, they are here — it's we who aren't ready for them.

This has led me to realize that the agentic revolution doesn't reach everyone equally. To make agents work, you need the right sources and tools. So you can either wait until the entire digital world has repurposed itself toward agents — or just build it yourself.

I'm in the latter group. Today, I'm going to show you how I see the agentic future actually looking — or how I call it: software 3.0.

Software 3.0 — Agent-First Architecture

Behind the paywall, I explain my approach to vibe coding, including the tools and strategies I use.

I show the finance app I'm building for my agent and explain in full detail why it is not an ordinary finance app and why it's actually built for agents first, not for humans. I explain its structure, what makes it materially different from every single piece of software you've ever seen, and why English is the most powerful tool in anyone's pocket — yes, no lines of code were written here. Fully programmable by the power of language.

Most software will be built this way. So if you're curious about how I pitch executives what that future looks like, this is it.

The Agent Bible, 2nd Part: Software 3.0 | TheWhiteBox by Nacho de Gregorio Transcript