Prompting has split into four distinct skills — most people only know the first one
Nate B Jones of AI News & Strategy Daily argues that the emergence of long-running autonomous AI agents in early 2026 has made traditional chat-based prompting insufficient, and lays out a four-level framework for what effective prompting now requires.
Summary
Nate B Jones argues that the release of models like Claude Opus 4.6, Gemini 3.1 Pro, and GPT-5.3 — capable of running autonomously for hours or days — has fundamentally changed what "prompting" means. The single skill most people practice, writing instructions in a chat window and iterating in real time, is now just the first of four distinct disciplines. Jones presents a full framework — Prompt Craft, Context Engineering, Intent Engineering, and Specification Engineering — and argues that each operates at a different altitude and time horizon, with each layer making the one above it possible. He draws on Shopify CEO Tobi Lütke's concept of "context engineering" and the Klarna AI case study to illustrate how misaligned intent at scale produces real organizational damage. His central claim is that the gap between people practicing 2025 prompting skills and those practicing 2026 prompting skills is already a 10x productivity difference — and widening.
Key Takeaways
FULL TRANSCRIPT
Why Prompting Has Changed Fundamentally in Early 2026
Nate B Jones: If you're prompting like it's last month, you're already too late. And I'm not just saying that for clickbait. If you haven't updated how you think about prompting since January 2026, you're already behind. Claude Opus 4.6, Gemini 3.1 Pro, and GPT-5.3 have all shipped in the past few weeks with autonomous agent capabilities that make the chat-based prompting most people are practicing functionally obsolete for serious work. These models don't just answer better. They work autonomously for a long time — for hours, for days — against specs without really checking in. That changes what "good at prompting" means on a fundamental level. And it's time to revisit how we think about prompting as a result.
Not because prompting stopped mattering — it actually matters more than ever — but because the word "prompting" is now hiding four completely different skill sets, and most people are only practicing one of them. The gap between the people who see all four and the people who don't is already 10x and widening.
In this piece, I'm going to lay out what those four skills are, why the distinction matters now, and exactly how to build the skills you're missing. This builds on my earlier work on intent engineering, but it goes way beyond it to lay out a full framework for how to think about prompting post-February 2026. Intent engineering is just one layer in a larger stack. This is the full stack for prompting post-February, post these new autonomous models.
What Changed: From Chat Partners to Autonomous Workers
First, what changed? The prompting skill that mattered since 2024 has been conversational. You sit in a chat window, you type a request, you read the output, you iterate, you get better at phrasing things, you provide examples, you structure instructions. If you're good at that — and if you've been following this video series, you probably are — you've been building real skills. They work. You're faster than you were a year ago.
But that fundamental chat-based skill has a ceiling. And in early 2026, a lot of people are hitting it, because the models have stopped being chat partners and started being workers. Workers that run for a long time. I'm not kidding when I say days and sometimes weeks.
The thing about a worker that runs for a long time is that everything you relied on in a conversation — your ability to catch mistakes in real time, your ability to provide missing context when the model asks, your ability to course-correct when things drift — all of that must be encoded before the agent starts. Not during the course of a conversation, but at the top. This is a fundamentally different skill. It's not a harder version of the same skill. It's actually different.
I've talked before about the importance of thinking about prompting — even in the chat window — as providing the relevant context for the LLM to give you an accurate response. But this goes way beyond that. If you're giving an agent a long-running task, which is where most of AI is going even if you're not a coder, then you have to think not about how to build for a chat response, but how to build for economically real work that this agent will do — and provide the agent the relevant context for that.
This shift is happening really quickly. Between October 2025 and January 2026, in just three months, the longest autonomous Claude Code sessions nearly doubled, and they've doubled again since then. Agents are running in the hundreds and thousands in production systems at major companies — and this is just from publicly available information. We have Telus reporting 13,000 custom AI solutions internally. We have Zapier reporting over 800 agents internally. Whenever a company releases a press release like that, you have to assume they feel behind and are releasing something to help themselves feel better. The companies that are really serious about AI don't feel the need for press releases and have an order of magnitude more agents. This is not about a world that is coming. This is about a world that has landed.
The 10x Gap: A Tale of Two Prompters on a Tuesday Morning
But this still might not feel concrete enough. So let me talk about a random Tuesday. Two people sit down with the same model, same subscription, same context window. The only difference is that one of them is using 2025 prompting skills and one of them is using 2026 prompting skills.
The 2025 person types a request — they're asking for a PowerPoint deck. They get back something that's about 80% correct. Maybe there are some formatting issues, some font collisions, some styling problems. They spend about 40 minutes cleaning it up, but they're pretty happy because this deck would have taken two or three hours. That's a 2025 prompting skill application. It would have been good in 2025.
Person B sits down with 2026 prompting skills. They write a structured specification in 11 minutes — they take longer to prompt. Then they hand it off to the same model, but they're thinking of it and using it as an autonomous agent. They go to make coffee. They come back to a completed PowerPoint that hits every quality bar defined up front. And they're able to do this for five other decks before lunch. In other words, they are now doing a week's worth of work in a morning easily. Same model, same Tuesday, 10x gap.
To replicate this experiment, run it directly in Claude Opus 4.6 using Cowork mode, which is available on Windows and Mac, and you can see exactly how this plays out.
This did not happen because the person with 2026 prompting skills is smarter or more technical. It's because she's practicing a different skill than the first person, and the first person doesn't know that kind of prompting skill exists.
Tobi Lütke and the Discipline of Context Engineering
I think it's worth paying attention to Shopify CEO Tobi Lütke here. Unlike most CEOs, Tobi is a technical person and he does not just engage with AI from a LinkedIn perspective. He has a folder of prompts that he runs against every new model release and he really deeply thinks about how new model releases change his workflow. He uses the term "context engineering" because he believes the fundamental skill we're all facing is the ability to state a problem with enough context that, without any additional pieces of information, the task becomes plausibly solvable.
I think that's a really elegant way to describe what person B did in the example I just showed you. Person B put all of the information the model needed to build a deck into one clearly defined task, and the model could just go to work.
This isn't about clever prompt tricks. This isn't about magical words that make AI produce better output. It's about a communication discipline. Can you state a problem so completely — with so much relevant surrounding information — that a capable system can solve it without going out and fetching more context? Can you make your request as self-contained as possible?
This is a really big deal because it demands a much higher bar for communication from us humans than we're used to. And that's something Tobi called out when he reflected on the impact of AI on his own leadership style. One of the things he mentioned is that by being forced to provide AI with complete context, he is now better at communicating as a CEO. His emails are tighter, his memos are better, his decision-making frameworks are stronger.
Tobi has gone farther than most people in thinking about the implications of context engineering. One of his most provocative assessments is that a lot of what people in big companies call politics is actually bad context engineering for humans. What he suggests is that good context engineering would surface disagreements about assumptions that are never surfaced explicitly but play out as politics and grudges in large companies. He says that happens because humans tend to be sloppy communicators who rely on shared context that doesn't actually exist.
I think that's a really interesting thesis. One of the implications of getting this February 2026 prompting lesson deeply ingrained in ourselves is that our human-to-human communication is likely to improve, and our organizations are likely to have cleaner decision-making and cleaner communication even between humans as a result.
The Four Disciplines: A Framework for Prompting in 2026
So here is the framework I would lay out to describe what prompting should be in February 2026. I've built it to be future-proof — looking at the direction agents are going and how they're developing this year, these four disciplines are going to matter even as agents continue to scale. This represents a significant update on how I've taught prompting before 2026. The way we prompted before 2026 was helpful as a foundation — you're not losing something by having learned it — but it's not enough as agents get more capable. I think we're due for a reset.
Fundamentally, prompting is the broad skill of providing input to AI systems so that they can do useful work. Prompting has diverged into four distinct disciplines, but it's not taught that way. Each of these disciplines operates at a different altitude and time horizon, and you need to understand them all to prompt well.
What I'm doing is taking intuitive knowledge that I see in excellent prompters and distilling it into four key disciplines that you can practice and learn from. These build on each other. If you skip one (and I'm presenting them in order), you're creating the kind of failures we tend to see at scale in the enterprise, but you're creating them for yourself in your own prompting.
Discipline One: Prompt Craft
Discipline one is Prompt Craft. This is the original skill. It's synchronous, session-based, and individual. You sit in front of a chat window, you write an instruction, you evaluate the output, then you iterate. The skill here is knowing how to structure a query.
You must have clear instructions. You must include relevant examples and counter-examples. You need to include appropriate guardrails. You need to include an explicit output format. And you should be very clear about how you resolve ambiguity and conflicts, so the model doesn't have to make it up on the fly.
This is what Anthropic's prompt engineering documentation covers. OpenAI talks about this. Google talks about this. It's on a thousand blog posts and LinkedIn courses. Prompt Craft has not become irrelevant; don't hear that. It's just become table stakes. It's like touch-typing: knowing how to type with ten fingers was once a professional differentiator, and now it's just assumed. If you can't write a clear, well-structured prompt in 2026, you're the person in 1998 who couldn't send an email. Is it important? Yes. Is it going to differentiate you in the workforce? Not really.
The key shift is that Prompt Craft was the whole game when AI interactions were synchronous and session-based. You wrote something, you got something back, and you refined it in real time. As a human interacting with that model, you were acting as the intent layer, the context layer, and the quality layer. That model of prompting broke the moment agents started running for hours without checking in.
Discipline Two: Context Engineering
Discipline two is Context Engineering. Anthropic published the foundational piece on this back in September 2025, but there's a lot of other good material out there as well. I define context engineering as the set of strategies for curating and maintaining the optimal set of tokens during an LLM task. That's a pretty commonly held definition.
Harrison Chase of LangChain was even blunter about what context engineering is during a recent Sequoia Capital interview when he said everything is context engineering — it actually describes everything they've done at LangChain without knowing the term existed. That's actually somewhat dangerous, because context engineering is only one of four levels, and people have misunderstood it to mean everything. One of the things I'm trying to move us toward is understanding context engineering as a specific skill: providing relevant tokens to the LLM for inference.
It is certainly foundational. It is certainly significant. It is where the industry's attention is focused today. It is the shift from crafting a single instruction to curating the entire information environment an agent operates within — all of the system prompts, all of the tool definitions, all of the retrieved documents, all of the message history, all of the memory systems, the MCP connections. The prompt you write might be 200 tokens. The context window it lands in might be a million. Your 200 tokens are 0.02% of what the model sees. The other 99.98% — that's context engineering.
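The arithmetic behind that split is easy to check. A minimal sketch, using the illustrative figures from the passage above (a 200-token prompt landing in a one-million-token context window); the function name is hypothetical:

```python
def prompt_share(prompt_tokens: int, context_tokens: int) -> float:
    """Return the prompt's share of the full context window, as a percentage."""
    if context_tokens <= 0:
        raise ValueError("context_tokens must be positive")
    return 100.0 * prompt_tokens / context_tokens


share = prompt_share(200, 1_000_000)
print(f"prompt: {share:.2f}%  everything else: {100 - share:.2f}%")
# prints "prompt: 0.02%  everything else: 99.98%"
```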
This is the discipline that produces claude.md files, agent specifications, RAG pipeline design, memory architectures. It's the discipline that determines whether a coding agent understands your project's conventions, whether a research agent has access to the right documents, whether a customer service agent can retrieve relevant account history.
Anthropic's engineering team identified the core challenge precisely: LLM performance degrades as you give the model more information. The point is therefore to include only the relevant tokens. The issue is not that models can't hold the tokens; it's that retrieval quality drops as context grows.
The practical implication is that people who are 10x more effective with AI than their peers are not writing 10x better prompts. They're building 10x better context infrastructure. Their agents start each session with the right project files, the right conventions, the right constraints already loaded. The prompt itself can be relatively simple because the context does the heavy lifting.
Discipline Three: Intent Engineering
Discipline three is Intent Engineering. Context engineering tells agents what to know. Intent engineering tells agents what to want. It's the practice of encoding organizational purpose — your goals, your values, your trade-off hierarchies, your decision boundaries — into infrastructure that agents can act against.
The Klarna story is the proof case. Their AI agent resolved 2.3 million customer conversations in the first month, but it optimized for the wrong thing. It slashed resolution times but didn't optimize for customer satisfaction. As a result, Klarna got into big trouble, had to rehire a bunch of human agents, and is still dealing with the customer trust aftermath.
Intent engineering sits above context engineering the way strategy sits above tactics. You can have perfect context and terrible intent alignment. You cannot have good intent alignment without good context, though, because the agent needs information to act on the intent. These disciplines are cumulative.
Another thing worth noticing is that failure, as we progress up this hierarchy, gets more and more serious. When you as an individual screw up a prompt, it might waste your morning at worst. When you screw up context engineering or intent engineering, you are screwing up for the entire team, your entire org, your entire company. The stakes get higher. And because the stakes get higher, our attention to detail matters — and the value of the work we do increases commensurately. What I'm talking about when I talk about context engineering and intent engineering can be a full-time role at a big company. And if it's not, it is a high-stakes human skill that has a lot of transferable value.
Discipline Four: Specification Engineering
Discipline four is Specification Engineering. We're just starting to talk about this now, even though the best practitioners are already doing it. Specification engineering is the practice of writing documents across your organization that autonomous agents can execute against over extended time horizons without human intervention.
This is a level above everything I've described because all of the first three levels focused on how you prepare work directly for an agent. Specification engineering is really about thinking about your entire informational corpus in your organization as agent-fungible, agent-readable. Everything you write has to be something the agent can access and do something with. It's not really about prompting per se. It's not about an individual agent's context window. It's not even about the intent you've given agents. Specifications are complete, structured, internally consistent descriptions of what an output should be for a given task. They address how quality is measured. Specification engineering is a mindset you bring to your documents that allows you to apply agents across large swaths of your company's context with the confidence that what the agent reads is going to be relevant.
An interesting example from Anthropic comes from the team's struggles with the Opus 4.5 agent — one generation ago now. They were trying to build a production-quality web app. But if you give the agent only a high-level prompt like "build a clone of claude.ai," the agent tries to do too much at once, runs out of context mid-implementation, and leaves the next session guessing at what happened. The fix turned out not to be a better model. It was specification engineering — a pattern where an initial planner agent sets up the environment, a progress log documents what's been done, and a coding agent then makes incremental progress against a structured plan every session. The specification became the scaffolding that let multiple agents produce coherent output over days.
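The planner, progress log, and incremental session pattern described above can be sketched roughly as follows. This is not Anthropic's actual harness: the JSON log format, the file name, and both function names are assumptions made for illustration, and the "work" is a stub that only marks a step as done.

```python
import json
import tempfile
from pathlib import Path
from typing import Optional


def plan_project(log_path: Path, steps: list[str]) -> None:
    """Planner session: write the structured plan once, with nothing done yet."""
    log = {"steps": [{"name": s, "done": False} for s in steps]}
    log_path.write_text(json.dumps(log, indent=2))


def run_session(log_path: Path) -> Optional[str]:
    """Coding session: read the log, do the next open step, record it."""
    log = json.loads(log_path.read_text())
    for step in log["steps"]:
        if not step["done"]:
            # A real agent would do the work here; this sketch only records it.
            step["done"] = True
            log_path.write_text(json.dumps(log, indent=2))
            return step["name"]
    return None  # nothing left to do


with tempfile.TemporaryDirectory() as tmp:
    log_file = Path(tmp) / "progress.json"
    plan_project(log_file, ["set up repo", "build login", "build chat view"])
    # Four independent sessions; each one starts only from what the log says.
    completed = [run_session(log_file) for _ in range(4)]

print(completed)
# prints ['set up repo', 'build login', 'build chat view', None]
```

The point of the design is that no session depends on remembering the previous one. Everything a new session needs to know is in the log.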
The shift from prompt to specification mirrors a transition that happened in human engineering decades ago. When you're building something small, verbal instructions and conversations work really well. When you're building something large enough to require a team or span multiple sessions, you need blueprints. Anthropic needed blueprints in the Opus 4.5 example. And even though we've now moved to Opus 4.6, the need for specification engineering has not gone down — it's gone up, because Opus 4.6 can do even more work. That's true for Codex 5.3. It's true for Gemini 3.1 Pro as well. The smarter models get, the better you need to get at specification engineering.
Which is why I deliberately started this section by zooming out and saying the entire org's document corpus should be viewed as a form of specification engineering. And yes, this is a fractal insight. You can also think about specification engineering for your individual agent task — what is the log that the agent has, how do we assign tasks across this agent build, how do we make sure the agent has a clearly specified requirements list to work from? But all of that gets way easier to put together if you think of your entire organizational document corpus as specifications that are agent-readable.
Your corporate strategy is a specification. Your product strategy is a specification. Your OKRs are a specification. Everything ends up being a specification that your agent can use. And that's different from context engineering, because the art of context engineering is really about shaping the context window in a way that's relevant for the agent.
If you look at these four levels: the prompt is you and the agent, crafting clear instructions. The context window is how you shape relevant tokens. Intent engineering is how you communicate goals and objectives to the agent that allow it to work autonomously for long periods in a direction consistent with company strategy. Specification engineering is how you think about your entire corporate document structure — the knowledge, the context that makes the corporation work — as a form of specification.
If you write a good spec, if you have a good task log, if the agent understands what the spec is from the broader organizational context, the agent is less likely to go off the rails because of intent engineering conflicts. It's less likely to bloat out with bad context. All of these start to interplay. But the highest level is to think about specification as the way your organization does business. You specify the outputs you want. The agent does the work. The outputs are produced. That is the highest-level description of what business is going to look like in the next couple of years, and it starts with understanding how to specify.
This is where Anthropic's best practices documentation for Claude Code becomes really revealing. The recommended workflow for complex features is relatively simple. You ask the agent: "Interview me in detail. Ask about technical implementation, UI/UX, edge cases, concerns, and trade-offs. Don't ask obvious questions; dig into the hard parts." The agent then writes the spec with the human. I think that is an artifact of this moment in time. I think we will get to a point where the agent will only be asking us about places where the broader specification corpus is in conflict or ambiguous, and we have to talk about what it means for this task to be accomplished, because the entire organizational infrastructure is going to be agent-readable.
The practical skill going forward is not writing code. It's not crafting prompts. It's the ability to describe an outcome with enough precision and completeness that an autonomous system can execute against it for days or weeks. That is a fundamentally different skill from writing a good prompt in a chat window.
The people who are excellent at one of these layers are not automatically excellent at all of them. Context engineers spend a lot of time thinking about how to compress tokens and get good tokens into context windows and keep bad tokens out. That is a different mindset from thinking about your information environment as agent-translatable, agent-readable, agent-fungible. We have to have all of these skills in order to effectively bring AI into the enterprise — or even into a small business — in 2026.
One-person businesses have the greatest advantage right now. If you are a one-person business and you can just convert your Notion workspace to be agent-readable, you're off to the races today. There's no gigantic effort required to make all of your SharePoint agent-readable. It's simple. You just get it into Notion and you're done.
Why Speed Matters and What's Coming
This comes back to the core idea that in 2026, speed is going to matter because agents are going to keep getting better quickly. What we have now as days and weeks is going to become weeks and months by the end of the year. The corresponding impact of getting specification engineering correct is going to be even higher. The corresponding impact of getting all four levels translated into specific roles — people who are responsible, DRIs, teams who handle this — that's going to be even more valuable in 2026.
If you are at a large company, you should have people who are doing context engineering and that's all they're doing. You should have people who are doing specification engineering and thinking about how agents can read the enterprise. You should have people who are thinking about intent engineering and how you translate goals into a set of objectives that an agent can read and value, and a set of verifiable guardrails the agent can follow.
The mental model most people carry is that prompting is good instructions for the AI — and that fails for a very specific reason. That entire model assumes synchronous interaction. In the synchronous AI-human partnership model, you're always there at the computer. You see the output in real time. You correct mistakes right away. You provide additional context when the model asks or when you notice it going off track. Long-running agents break every single assumption in that model.
If you've relied on the assumptions of synchronous prompting, you have a structural vulnerability in the way you think about AI. You need to start thinking about AI as if your real-time oversight is embedded in the specification before the agent begins to work.
The planner-worker architecture that's dominating production agent deployments reflects this reality. A capable model plans the work, decomposes it into subtasks, defines the acceptance criteria, and assigns work out — then cheaper, faster models do the work. The planning phase, which you could call the specification phase, determines the quality ceiling. Taking your specification and expanding it, enriching it, breaking it out, and planning against it — that's what determines the quality of the overall system. Execution without that specification step produces broken work that requires extensive human rework to be of any value at all.
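A minimal sketch of that planner-worker split, under loud assumptions: `plan` and `work` are plain functions standing in for a capable planner model and a cheaper worker model, and the acceptance check is a trivial string test. The names and the semicolon-separated spec format are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Subtask:
    description: str
    accept: Callable[[str], bool]  # acceptance criterion fixed at planning time


def plan(spec: str) -> list[Subtask]:
    """Stand-in for the planner model: split a spec into checkable subtasks."""
    parts = [p.strip() for p in spec.split(";") if p.strip()]
    # Each subtask is accepted only if the output mentions what was asked for.
    return [Subtask(p, accept=lambda out, p=p: p in out) for p in parts]


def work(task: Subtask) -> str:
    """Stand-in for a cheaper, faster worker model."""
    return f"done: {task.description}"


def execute(spec: str) -> dict[str, bool]:
    """Run every planned subtask and check it against its own criterion."""
    results = {}
    for task in plan(spec):
        output = work(task)
        results[task.description] = task.accept(output)
    return results


report = execute("draft outline; write section one; write section two")
print(report)
# prints {'draft outline': True, 'write section one': True, 'write section two': True}
```

Note that the acceptance criteria are created in `plan`, before any worker runs, which is the sense in which the planning phase sets the quality ceiling.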
The shift from fixing it in real time — which is what we do with a lot of prompting in the chat window — to getting the spec right up front changes your bottleneck skill. Real-time prompting rewards verbal fluency, quick iteration, and a good eye for output quality. Specification engineering rewards completeness of thinking, anticipation of edge cases, clear articulation of acceptance criteria, and the ability to decompose complicated outcomes into independently executable components.
Different people are good at these things in different ways. Some people are going to be naturally exceptional at synchronous prompting and will struggle with specification work. Some people will be mediocre at chat-based interaction but might actually be excellent spec engineers. My challenge to you is that you don't tolerate whatever your natural propensity is as your ceiling. Think of this as a learnable skill and go after it.
The Five Primitives of Good Specification
Now, if we're going after it, what are the foundational elements to learn? Specification is ironically very vague as a concept. So I want to define the primitives that go into good specifications in ways that are useful for learning. These are the foundation we need if we want to get better at specifying — and at the prompting skills that will matter in 2026 and beyond.
Primitive one: Self-contained problem statements. This is Tobi's insight, but it's not only his. Can you state a problem with enough context that the task is plausibly solvable without the agent going out and getting more information? The discipline of self-containment forces you to be clear. It surfaces hidden assumptions. It makes you articulate constraints you normally leave implicit because you trust the human on the other end to fill in the gaps. AI doesn't fill in gaps reliably. It fills them with statistical plausibility — which is a polite way of saying it guesses in ways that are often subtly wrong.
If you're trying to train this primitive, take a request you would normally make conversationally — like "update the dashboard to show the Q3 numbers" — and rewrite it as if the person receiving it has never seen your dashboard, doesn't know what Q3 means in your org context, doesn't know what database to query, and has no access to any information other than what you include. That is the level of self-containment you should be challenging yourself with.
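One way to practice this is to treat the request as a form with required fields and see what is missing. A rough sketch, assuming a hypothetical set of field names chosen to match the dashboard example above; a real checklist would depend on your own work:

```python
# Hypothetical fields a reader with zero shared context would need.
REQUIRED_FIELDS = ("goal", "definitions", "data_source", "location", "output_format")


def missing_context(request: dict[str, str]) -> list[str]:
    """List the required fields that are absent or blank in a request."""
    return [f for f in REQUIRED_FIELDS if not request.get(f, "").strip()]


conversational = {"goal": "Update the dashboard to show the Q3 numbers"}
print(missing_context(conversational))
# prints ['definitions', 'data_source', 'location', 'output_format']
```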
Primitive two: Acceptance criteria. If you can't describe what done looks like, an agent can't know when to stop. More precisely, it will stop at whatever point its internal heuristics say the task is complete, which may bear no relationship to what you needed. This is why the 80% problem, where output comes back roughly 80% correct and a human has to clean up the rest, is a big issue for agent system design.
A specification that says "build a login page" should instead say "build a login page that handles email and password, social OAuth via Google and GitHub, progressive disclosure of 2FA, session persistence for 30 days, and rate limiting after five failed attempts." For every task you delegate, write three sentences that an independent observer could use to verify the output without asking you any questions whatsoever. If you can't write those sentences, you probably do not understand the task well enough to give it to an agent. I have had that happen — I've been in a conversation with an AI agent and realized I don't know enough to delegate the task and have to come back later. That's okay. It's good to realize that before you assign the work.
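The login page example can be written down as criteria an independent observer could check without asking questions. This is only a sketch: the key names and the shape of the `observed` report are assumptions, and in practice the observations would come from tests or a reviewer rather than a hand-written dictionary.

```python
# Acceptance criteria taken from the login page example; key names are hypothetical.
LOGIN_PAGE_CRITERIA = {
    "email_password_login": True,
    "oauth_providers": {"google", "github"},
    "progressive_2fa": True,
    "session_days": 30,
    "max_failed_attempts": 5,
}


def unmet_criteria(observed: dict) -> list[str]:
    """Return the criteria whose observed value differs from the specified one."""
    return [
        name
        for name, expected in LOGIN_PAGE_CRITERIA.items()
        if observed.get(name) != expected
    ]


observed = {
    "email_password_login": True,
    "oauth_providers": {"google"},  # GitHub sign-in was never built
    "progressive_2fa": True,
    "session_days": 30,
    "max_failed_attempts": 5,
}
print(unmet_criteria(observed))
# prints ['oauth_providers']
```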
Primitive three: Constraint architecture. What the agent has to do, what the agent cannot do, what the agent should prefer when multiple valid approaches exist, and what the agent should escalate rather than decide autonomously. These four categories — the musts, the must-nots, the preferences, and the escalation triggers — form the constraint architecture that turns a loose specification into a reliable one.
The claude.md pattern emerging in the coding community is a practical implementation of constraint architecture. The best claude.md files are not long lists of rules. They're concise, extremely high-signal constraint documents. Use these build commands. Follow these code conventions. Run these tests before marking a task complete. Never modify these files without explicit instructions. The community consensus is very strongly that every line in a claude.md file needs to earn its place. If you ask "would removing this line cause the AI to make mistakes?" and the answer is no, then kill the line.
If you want to train this primitive, before delegating a task write down what a smart, well-intentioned person might do that would technically satisfy the request but produce the wrong outcome. Those failure modes end up being your constraint architecture. Encode them.
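The four categories can be kept as a small data structure and rendered into a short constraint document. A minimal sketch: the class name, the labels, and the example rules are all hypothetical, and the output is not any official claude.md format.

```python
from dataclasses import dataclass, field


@dataclass
class ConstraintArchitecture:
    musts: list[str] = field(default_factory=list)
    must_nots: list[str] = field(default_factory=list)
    preferences: list[str] = field(default_factory=list)
    escalate: list[str] = field(default_factory=list)

    def render(self) -> str:
        """Render one labeled line per constraint, grouped by category."""
        sections = [
            ("MUST", self.musts),
            ("MUST NOT", self.must_nots),
            ("PREFER", self.preferences),
            ("ESCALATE", self.escalate),
        ]
        lines = []
        for label, items in sections:
            lines.extend(f"{label}: {item}" for item in items)
        return "\n".join(lines)


rules = ConstraintArchitecture(
    musts=["Run the test suite before marking a task complete"],
    must_nots=["Modify generated files without explicit instructions"],
    preferences=["Reuse existing helpers over adding new dependencies"],
    escalate=["Any change to the database schema"],
)
doc = rules.render()
print(doc)
```

Keeping one line per constraint makes it easy to apply the removal test from above: delete a line, and ask whether the agent would now make a mistake.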
Primitive four: Decomposition. Large tasks need to be broken into components that can be executed independently, tested independently, and integrated predictably. This is software engineering's oldest lesson — modularity — but applied to AI task delegation. Anthropic's long-running agent harness splits every complex project into an environment setup phase, a progress documentation phase, and an incremental coding session, each independently verifiable. You get similar task decomposition automatically inside Codex.
A marketing content audit requires the same decomposition as a coding task — I'm not just talking to engineers here. You would decompose your marketing content into quality scoring, gap analysis, recommendation generation, and so on. If you want to train on this primitive, take any project you would estimate at a few days of work and decompose it into subtasks that each take less than two hours, have clear input-output boundaries, and can be verified independently of the other tasks. That is the granularity at which agents work best and at which specification engineering tends to operate.
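The three tests for a well-sized subtask (under two hours, clear input and output boundaries, independently verifiable) can be checked mechanically. A sketch using the marketing audit example; the class, the field names, and the time estimates are made up for illustration.

```python
from dataclasses import dataclass

MAX_HOURS = 2.0  # the "less than two hours" granularity from the text


@dataclass
class Subtask:
    name: str
    est_hours: float
    inputs: str
    outputs: str
    verification: str


def problems(task: Subtask) -> list[str]:
    """Return the reasons a subtask is not yet ready to hand to an agent."""
    issues = []
    if task.est_hours >= MAX_HOURS:
        issues.append("takes two hours or more; split it further")
    if not task.inputs or not task.outputs:
        issues.append("missing input or output boundary")
    if not task.verification:
        issues.append("no independent verification step")
    return issues


audit = [
    Subtask("quality scoring", 1.5, "content inventory", "score per page",
            "spot-check ten pages by hand"),
    Subtask("gap analysis", 3.0, "page scores", "list of gaps", ""),
    Subtask("recommendations", 1.0, "list of gaps", "ranked actions",
            "each action cites a gap"),
]
flagged = {t.name: problems(t) for t in audit if problems(t)}
print(flagged)
# prints {'gap analysis': ['takes two hours or more; split it further', 'no independent verification step']}
```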
In 2026, you do not have to pre-specify all of those two-hour tasks when writing a prompt, but you do have to understand what all of those tasks are. You have to understand how to describe for a planner agent what done looks like and what decomposable pieces look like, in such a way that the planner agent can reliably break the work into 50 or 60 subtasks. Your job increasingly is not to manually write the subtasks for the agent. Your job is to provide the break patterns that a planner agent can use to break up larger work in a reliable, executable fashion. That's a level of abstraction even above decomposition, and that is a lot of where we're going as agents start to run.
Primitive five: Evaluation design. This is critical not just at an individual level but at an org level. Organizations need to think about every level of AI deployment in terms of eval. How do you know the output is good? Not "does it look reasonable" — which is how most people evaluate AI output — but can you prove, measurably and consistently, that this is good?
If Prompt Craft is the art of the input, evaluation design is the art of knowing whether that input worked. In a world where agents can run for a really long time, eval design is the only thing standing between AI-generated output we can't use and AI-generated output we can use as-is. For every recurring AI task in your world, build an eval: three to five test cases with known good outputs, run periodically and especially after model updates. This will catch regressions. It will build your intuition for where models fail. It will create institutional knowledge about what good looks like for your specific use cases, your team, your org. You need to be doing this systematically.
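Editor's illustration (not part of the talk): a recurring eval of the kind described above could look roughly like this. Everything named here is an assumption made for the sketch. `EvalCase`, `run_eval`, the phrase-matching pass rule, and `fake_model` (a stand-in for a real model call) are hypothetical, and the test cases are invented. A real eval would likely need a richer comparison against known good outputs than substring matching.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalCase:
    prompt: str
    must_contain: list[str]  # phrases that any known-good output includes


def run_eval(task: Callable[[str], str], cases: list[EvalCase]) -> float:
    """Return the fraction of cases whose output contains every required phrase."""
    if not cases:
        raise ValueError("an eval needs at least one case")
    passed = 0
    for case in cases:
        output = task(case.prompt).lower()
        if all(phrase.lower() in output for phrase in case.must_contain):
            passed += 1
    return passed / len(cases)


def fake_model(prompt: str) -> str:
    """Hypothetical stand-in for a model call, with canned answers."""
    canned = {
        "Summarize the refund policy": "Refunds are accepted within 30 days with a receipt.",
        "Classify ticket: card declined": "Category: Billing",
        "Extract the due date from invoice 17": "The due date is not stated.",
    }
    return canned.get(prompt, "")


cases = [
    EvalCase("Summarize the refund policy", ["30 days", "receipt"]),
    EvalCase("Classify ticket: card declined", ["billing"]),
    EvalCase("Extract the due date from invoice 17", ["2026-03-01"]),
]

BASELINE = 1.0  # score recorded on the last known-good run (assumed)
score = run_eval(fake_model, cases)
regressed = score < BASELINE
print(f"pass rate {score:.2f}, regressed: {regressed}")
```

Rerunning the same cases after a model update and comparing against the stored baseline is what turns "does it look reasonable" into a measurable check.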
Where to Start: A Practical Learning Path
If you're wondering where to start, I gave you those four layers in order for a reason.
Start by closing the Prompt Craft gap. Most people are worse at basic prompting than they think. You should be rereading prompting documentation. You should do interactive tutorials. You should be building a folder of tasks you do regularly, writing your best prompt against each one, saving the outputs as your baseline, and revisiting them over time. Take Prompt Craft seriously.
Second, once you start to have a handle on that, build your personal context layer. Write a claude.md equivalent for your work. I don't care if you use Claude — you still need to have your goals, your constraints, your communication preferences, your quality standards, and the institutional context that a new team member would need six months to absorb, written down. Start AI sessions by loading this context. The difference in output quality should be immediate and obvious.
Then get into specification engineering. Take a real project — not a toy problem — and write a specification for it.
Then start to get into intent infrastructure. This is an organizational layer. If you manage people or systems, start encoding the decision frameworks your team uses implicitly. If you are an individual contributor, encode the decision frameworks you understand and try to be a champion that pushes for this at the organizational level. A lot of teams like to talk about adopting AI. Talk about it in terms of building intent infrastructure. Talk about what "good enough" looks like for each category of work. Talk about what gets escalated by AI versus what AI can decide. Write it down, structure it, make it available to agents.
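Editor's illustration (not part of the talk): "write it down, structure it, make it available to agents" could start as something as small as a lookup table of decision rules. This is a sketch under assumptions. The `DecisionRule` fields, the `route` function, and the example categories are hypothetical, and defaulting unknown categories to escalation is a design choice made here, not something the speaker prescribes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DecisionRule:
    agent_may_decide: bool
    good_enough: str  # what "good enough" means for this category of work


# Hypothetical decision framework for a small support team.
RULES = {
    "internal summary": DecisionRule(True, "accurate, under 200 words"),
    "customer refund": DecisionRule(False, "approved by a human lead"),
}


def route(category: str) -> str:
    """Return 'decide' if the agent may act alone, otherwise 'escalate'."""
    rule = RULES.get(category)
    if rule is None or not rule.agent_may_decide:
        return "escalate"  # unknown work is escalated by default
    return "decide"


print(route("internal summary"))
print(route("customer refund"))
print(route("press statement"))
```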
And practice specification engineering. Take a real project, not a toy problem. Write a spec for it before touching AI. Address acceptance criteria, constraint architectures, decomposition. Hand that spec to an agent and see what comes back. And from an org perspective, start to think about every document you touch as a spec that the agent will need to read and operate against. Your org is a system of business processes, even if you're a team of one. Those business processes should be agent-readable and they should be specifiable.
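Editor's illustration (not part of the talk): one way to practice writing the spec before touching AI is to give it a fixed shape and check that no section is empty. The `Spec` class, its field names, and the rendered text layout are assumptions for this sketch; the three sections mirror the acceptance criteria, constraint architecture, and decomposition named above.

```python
from dataclasses import dataclass, field


@dataclass
class Spec:
    goal: str
    acceptance_criteria: list[str] = field(default_factory=list)
    constraints: list[str] = field(default_factory=list)
    subtasks: list[str] = field(default_factory=list)

    def missing_sections(self) -> list[str]:
        """Name every section that has been left empty."""
        sections = {
            "acceptance_criteria": self.acceptance_criteria,
            "constraints": self.constraints,
            "subtasks": self.subtasks,
        }
        return [name for name, items in sections.items() if not items]

    def render(self) -> str:
        """Produce plain text that could be handed to an agent."""
        lines = [f"GOAL: {self.goal}"]
        for title, items in [
            ("ACCEPTANCE CRITERIA", self.acceptance_criteria),
            ("CONSTRAINTS", self.constraints),
            ("SUBTASKS", self.subtasks),
        ]:
            lines.append(f"{title}:")
            lines.extend(f"- {item}" for item in items)
        return "\n".join(lines)


# Hypothetical draft spec with the decomposition still missing.
spec = Spec(
    goal="Audit the help center articles",
    acceptance_criteria=["Every article has a quality score from 1 to 5"],
    constraints=["Do not edit or delete any article"],
)
print(spec.missing_sections())
print(spec.render())
```

The check is deliberately shallow: it only catches a section that was skipped, not one that was filled in badly.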
This has some of the downstream implications that Tobi talked about: a lot of organizational politics is just bad context engineering at the org level. If we practice better specification engineering for our documents, we will expose a lot of the implicit assumptions that we end up being political about inside organizations, and we will start to make those agent-readable. We will start to have fewer issues. Practicing specification engineering is a way for us to clearly describe intent at organizational scale and translate that intent into a form that agents can read. And yes, it nests down to individual agent runs and it ladders up to the full organizational context. That's why it's the last and most difficult skill to learn.
The Stack, Not the Ladder
The progression from Prompt Craft all the way up to Specification Engineering is not a ladder where you can abandon lower rungs. It's a stack where each layer makes the layers above it possible. You cannot write a good spec if you can't write good prompts. You can't build effective agent systems if you don't understand context engineering. You can't align agent behavior with organizational goals without understanding how intent works and how that plays into context engineering. They all go together.
Beyond AI: What This Means for Human Communication
There's a final dimension to this that goes beyond AI, and I want to spend some time on it. I hinted at it when I talked about Tobi finding that he communicated better when he got better at prompting. The best human managers I've worked with already operate with that degree of clarity. They give complete context when they delegate. They specify acceptance criteria to their team members. They articulate constraints. They're effectively following the four disciplines of AI input with their people. And that makes for effective leadership.
What's happening right now, if we step back, is that AI is enforcing a communication discipline that the best leaders have always practiced intuitively — and now everyone needs it in order to be effective. You cannot just rely on shared context with the machine. You cannot just assume that AI will know. And that is something that is a gift to us, because so many of our colleagues don't know what we mean either. How many times have you sat in a meeting where someone is referring to a document and you don't know what that document is and you're afraid to ask? That is a wonderful example of the kind of poor communication quality that goes into human meetings.
This is not a framing you'll see in a lot of how-to-prompt courses. I think it should be. The skill of providing high-quality input to intelligent systems turns out to be translatable for AIs and for humans alike. It turns out to be a fundamental skill of the agent age that benefits us as humans and how we work together.
The people who develop this collection of skills around prompting for 2026 are going to end up being the leaders who run organizations where agents and humans both perform at their ceilings. The people who are stuck in 2025 prompting skills are going to wonder why their AI investments keep producing partial value — and meanwhile, their human teams keep having alignment issues.
The prompt by itself is dead. The specification, the context, the organizational intent — that is where the value in prompting is moving, because agents are starting to work for longer and longer periods and look, in a lot of ways, like junior employees. The specification done right turns out to be just what clear thinking has always looked like, really made explicit — because machines don't let us be lazy about it. And I'm really excited for the way that kind of communication clarity can clean up our organizations and our human-to-human communication as well.