Manus AI: What Manus Tells Us About the Future of AI Agents | AI News & Strategy Daily | Nate B Jones Transcript

Polished transcript · AI News & Strategy Daily | Nate B Jones · 2 Sept 2025 · 27m · @maverick

Manus AI as a window into the future of autonomous AI agents

Nate B Jones analyzes Manus AI's development, proposes a framework for evaluating AI agents, and identifies where autonomous multi-agent tools fit in the current landscape.

Summary

Nate B Jones explains why he delayed covering Manus AI after its March 2025 launch — early reliability issues, unpredictable costs, and token consumption problems made it premature to recommend. With the platform now stabilizing, he uses Manus as a lens to examine the broader challenge of categorizing and evaluating AI agents, proposing an original framework he calls MACE (Modality, Autonomy, Complexity, Environment). He argues that Manus represents a distinct and technically demanding category — the autonomous multi-agent orchestrator — and that most comparisons to other "agents" are inappropriate because the underlying architectures differ so significantly. He identifies five practical use cases where Manus delivers clear ROI today, and argues that Manus is effectively a preview of where major model makers are heading.

Key Takeaways

  • The MACE framework offers a new vocabulary for evaluating AI agents across four dimensions — Modality, Autonomy, Complexity, and Environment — because the field currently lacks precise language for comparing tools that are called "agents" but operate very differently.
  • Manus belongs in the autonomous execution agent category, alongside Devin, and should not be compared to ChatGPT's agent mode, which Jones argues is a fundamentally different architecture operating in a different capability tier entirely.
  • The classic engineering trilemma applies directly to Manus: it has optimized for reliability and capability, which means cost remains high and unpredictable — a deliberate trade-off, not a failure, and one that explains its current positioning as a specialist tool.
  • Enterprise scaling of multi-agent orchestration involves a set of hard, underappreciated problems — including state management across sub-agents, context bleed between modalities, error propagation, memory management at scale, and interpreting ambiguous user intent — none of which have obvious solutions.
  • Manus's current sweet spot is high-value, complex, multi-domain tasks worth $500–$5,000 if done manually: industry research reports, content marketing pipelines, data analysis for non-technical teams, process documentation, and technical proof-of-concept development.
  • The success pattern for autonomous agents in late 2025 requires clear economic justification, complex workflows with 5–25 distinct actions, human review built into the process, and an expectation of an excellent first draft rather than a finished product.
  • Manus is following a classic startup platform evolution — demo phase, early access with edge-case discovery, stabilization, then optimization — and is currently transitioning from the stabilization to the optimization stage, targeting indie builders and small teams before the enterprise.
  • Jones predicts a major model maker will launch a Manus-equivalent within months, driven by the economics of specialist high-value tasks that justify premium pricing on top of existing subscriptions, and notes that Manus is currently showing the industry what that product category looks like.
Full Transcript

    Why Manus AI Was Worth Waiting to Cover

    Nate B Jones: Manus AI launched in March of 2025 and I didn't talk about it very much. The reason is that it was another one of those cases — like Devin — where the hype video ran way ahead of what people were actually able to do in practice. Reddit forums filled up, Twitter complaint conversations started, and the long and short of it was that after the March launch through about June or July, there were a lot of issues with reliability, with cost, and with token consumption clarity. That is starting to shift. It is shifting enough, and the platform is stabilizing enough, that I think it's worth having a wider conversation.

    But before we do that, I want to talk about what I think is actually one of the key challenges when we think and talk about AI — and agents in particular: naming things. It is really hard to name an AI capability because AI is such a slippery technology. It's general purpose. It can do anything. And so naming and categorizing what these different tools do becomes both really important to get work done, and also not at all obvious. It's not clear.

    Introducing the MACE Framework

    Before we dive into the capabilities of Manus itself — and why I think the platform is stabilizing, and what use cases you can apply it to — I want to take a moment to talk about a proposed framework for how we assess agentic AI tools. As far as I know, we haven't really had a good framework for this. That's why I'm proposing one. I want to go through it. Tell me where it's wrong. Tell me where it's better. Let's dive in.

    I'm calling this the MACE framework. MACE stands for Modality, Autonomy, Complexity, and Environment. I think those four dimensions are all things we need to assess agentic AI tools on, and that we've really lacked the language for previously.

    Modality — what is the primary modality of this tool? There are at least five things you can look at. First, text agents: Claude, ChatGPT, Gemini — they generate and analyze text. Second, coding agents: Cursor, GitHub Copilot, Claude Artifacts. Third, workflow agents: n8n, Zapier, Make, LangChain, and so on. Fourth, research agents like Deep Research or Perplexity. And fifth, multimodal agents — Manus falls into that category. There are probably other primary modalities, but you get the idea. What is the primary mode of this agent? That becomes a relevant thing to establish.

    Autonomy — what is the degree of proactive autonomy this agent brings? It can be reactive, responding to individual prompts like Claude or ChatGPT in a text window. It can be interactive — multi-turn with human guidance, which you sometimes see when Deep Research comes back and asks you a clarifying question. It can be semi-autonomous, executing plans with checkpoints — GitHub Copilot Workspace is an example, where it will check in with you along the way. Or it can be fully autonomous, end-to-end execution with minimal intervention. Manus and Devin are both in that category.

    Complexity: what level of complexity can this agent handle? Some non-reasoning models in Claude and ChatGPT handle simple tasks one step at a time. Claude Code is a good example of sequential multi-step execution. More capable systems handle branching, as good n8n workflows do. And then there is dynamic replanning based on results. Manus does that, and more advanced agent configurations can do it as well. You can set up Claude with multiple agents to do that in Claude Code, for example.

    Environment — what is the execution environment? Is it cloud-contained, running in the provider's sandbox? Both Claude and Manus do that in their application interfaces. Is it integrated into your IDE, like Cursor? Is it platform-hosted with a dedicated agent runtime — n8n can be configured that way? Or is it infrastructure-spanning, able to deploy or access different external systems and use complex tools? Manus can do that. You can configure Claude Code to do that as well.
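
    One way to make the four dimensions concrete is to encode them as enumerations and describe each tool as a profile. What follows is a minimal sketch, not something shown in the episode: the class names, member names, and the `comparable` rule are assumptions chosen for illustration, and the two profiles use only the placements described above.

```python
from dataclasses import dataclass
from enum import Enum


class Modality(Enum):
    TEXT = "text"
    CODING = "coding"
    WORKFLOW = "workflow"
    RESEARCH = "research"
    MULTIMODAL = "multimodal"


class Autonomy(Enum):
    # Ordered from least to most autonomous.
    REACTIVE = 1
    INTERACTIVE = 2
    SEMI_AUTONOMOUS = 3
    FULLY_AUTONOMOUS = 4


class Complexity(Enum):
    # Ordered from simplest to most demanding.
    SINGLE_STEP = 1
    SEQUENTIAL_MULTI_STEP = 2
    BRANCHING = 3
    DYNAMIC_REPLANNING = 4


class Environment(Enum):
    CLOUD_SANDBOX = "cloud-contained"
    IDE_INTEGRATED = "ide-integrated"
    PLATFORM_HOSTED = "platform-hosted"
    INFRASTRUCTURE_SPANNING = "infrastructure-spanning"


@dataclass(frozen=True)
class MaceProfile:
    name: str
    modality: Modality
    autonomy: Autonomy
    complexity: Complexity
    environment: Environment


def comparable(a: MaceProfile, b: MaceProfile) -> bool:
    """Assumed rule: only compare tools in the same autonomy and complexity tier."""
    return a.autonomy == b.autonomy and a.complexity == b.complexity


manus = MaceProfile(
    "Manus",
    Modality.MULTIMODAL,
    Autonomy.FULLY_AUTONOMOUS,
    Complexity.DYNAMIC_REPLANNING,
    Environment.INFRASTRUCTURE_SPANNING,
)

# A generic chat assistant used in a text window (illustrative profile).
chat_window = MaceProfile(
    "chat assistant in a text window",
    Modality.TEXT,
    Autonomy.REACTIVE,
    Complexity.SINGLE_STEP,
    Environment.CLOUD_SANDBOX,
)

print(comparable(manus, chat_window))
```

    Writing profiles down this way makes it harder to compare two tools just because both are called agents: the two profiles here differ on every dimension.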

    Six Practical Categories of AI Agents Today

    When you look across the MACE dimensions, it becomes possible to identify at least six practical categories of AI agents that exist today and fit within this broader spectrum.

    The first and simplest is conversational generators — ChatGPT, Claude, Gemini, DeepSeek. You use them when you need high-quality text generation fundamentally.

    The second class is coding assistants. When you need to write code and you have a feedback loop for it, Claude Code is a great example. Cursor does this, Windsurf does this. You can't use these when you need broader system orchestration unless you configure them specially. Claude Code is something of an exception because it's such a malleable tool — that's why it appears in more than one of these categories — but code assistant is the good vanilla use case for it.

    Third, workflow orchestrators. n8n, Zapier, and Make all fall here. You're connecting known systems, you have predictable data flows. These systems can struggle with ambiguous inputs and tend to be somewhat brittle.

    Fourth, research synthesizer agents. Deep Research works here. Perplexity has a deep research function, and you can also use Claude in a deep research configuration — put Opus 4.1 on it, have it search the web and think hard. You need current information compiled, analyzed, and acted on. Typically I find that the acting part is the problem. If you need to actually take an action rather than just read, don't use these. But if you need to develop very high-quality information, research synthesizers are really, really good, and people are using them heavily for those use cases.

    Fifth, autonomous execution agents. Manus and Devin obviously go here, along with custom agents that work the same way. There are people running Claude Code continuously in configurations that make it an autonomous execution agent. More and more energy is going into this category. That's part of why I've called out Manus — because I think it is a flagship pointing toward a wider future of autonomous AI execution, and it is worth paying attention to on that basis. The world is going to look more like Manus in the future. The challenge is managing the cost and managing the complexity. You have to know what kinds of tasks you want to entrust to an agent that complex.

    Sixth, hybrid collaboration. There are a lot of tools where you want the agent to come back and talk to you, to engage with you. Cursor Composer is a great example — there's some degree of human judgment alongside AI capability. Andrej Karpathy has done a great job talking about that nuanced human collaboration piece that happens with good agent workflows. One of the things he emphasized in a tweet a few weeks ago is that as we build these AI agents, probably too much focus right now is going into bucket five — autonomous execution — and we are sometimes missing the realization that we need to create the right moment for the human to touch the model or touch the work. Humans can bring tremendous value, especially seasoned, experienced humans with domain knowledge, and it is critical to give humans the space to do that.

    Why Naming and Categorization Matter

    Those are six examples. You've got the MACE framework in your head. We've talked about how these different agents bucket together. I hope you've gotten a better sense of the landscape.

    I think we need to have more of these conversations around how we bucket these intelligences. One of the things that really needs to happen is that we need some degree of tagging that goes with these names. Claude Code is a great example — it doesn't just code. It does a lot more than code, but it was named Claude Code. Manus happens to write code. It also runs it. It also continues the workflow. Calling it a general-purpose agent is fine, but it would be more precise to describe it as a multi-agent orchestrator. I know that's a bit of a mouthful, but the precise wording helps us know what agents to compare things to — because otherwise we end up making inappropriate comparisons.

    I would not compare the agent mode that ChatGPT shipped with Manus. Those are different architectures with different capabilities. Manus is a whole lot better than the agent mode ChatGPT shipped, and it's not close. I'm not even sure they're playing in the same ballpark, even though both are called agents.

    The Hard Engineering Challenges of Scaling Multi-Agent Orchestration

    So let's think about the challenges associated with stabilizing these technologies into reliable forms that companies can actually access. Part of why I've done all this naming work before getting into Manus specifically is that organizations need some predictability before they purchase, and delivering that predictability with a technology like AI is actually quite challenging.

    You have to solve complexity of orchestration. You have to solve state management across modalities. If you have different sub-agents and you're trying to sell this as a bundle the way Manus is, you have to be able to show that each sub-agent can maintain its own state, but the orchestrator needs to have global coherence, because the enterprise will expect that. You have to show that state stays coherent as tasks get longer and span more modalities.
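
    The split between per-agent state and global coherence can be sketched in a few lines. This is an illustrative structure under stated assumptions, not a description of how Manus is built: `SubAgent`, `Orchestrator`, `register`, and `publish` are hypothetical names, and the only idea shown is that scratch state stays private while finished results are published to shared state with their source recorded.

```python
from dataclasses import dataclass, field


@dataclass
class SubAgent:
    """A sub-agent with private working state (hypothetical structure)."""
    name: str
    local_state: dict = field(default_factory=dict)


@dataclass
class Orchestrator:
    """Holds the only state that is shared across sub-agents."""
    agents: dict = field(default_factory=dict)
    global_state: dict = field(default_factory=dict)

    def register(self, name: str) -> SubAgent:
        agent = SubAgent(name)
        self.agents[name] = agent
        return agent

    def publish(self, agent_name: str, key: str, value: object) -> None:
        # Record who produced each shared value so the result can be traced later.
        if agent_name not in self.agents:
            raise KeyError(f"unknown sub-agent: {agent_name}")
        self.global_state[key] = {"value": value, "source": agent_name}


orchestrator = Orchestrator()
researcher = orchestrator.register("researcher")
coder = orchestrator.register("coder")

# Scratch work stays private to the sub-agent that produced it.
researcher.local_state["draft_notes"] = "unverified market figures"

# Only explicit, finished results enter the shared state.
orchestrator.publish("researcher", "market_size_summary", "summary text")
```

    Keeping the shared state small and attributed is one plausible way to give an enterprise buyer something to inspect when a long task goes wrong.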

    Tool selection is another example. When the agent is uncertain, how can you show the enterprise what tool choice it will execute on? What does the fallback look like? What does the error handling look like?

    Memory management and context is another big piece. How do you handle long workflows that accumulate enormous context? One of the biggest challenges in AI right now is that enterprise businesses bring enterprise-scale context, and it's very difficult to bring that to AI in a way that's reliable and scalable. You can't just truncate — you might lose dependencies. You have to figure out how you handle external memory, how you handle summarization, and it's not entirely intuitive how to do that at enterprise scale.

    Cross-modal context and avoiding context bleed is another challenge. Code outputs might need to inform text generation for a complex task, but the two have different context requirements and different token economics. You have to make sure you're not spending code tokens on text generation if code is more expensive, and that you're not leaking requirements back and forth between the two.

    Error propagation is another hard problem. How do you avoid an error loop when one sub-agent fails? What does an error recovery decision tree look like that an enterprise can audit and understand?
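
    An auditable recovery decision tree can be very small. The sketch below is one assumed shape for it, not Manus's actual logic: the `RecoveryPolicy` class, the three actions, and the retry limit are all hypothetical. Bounding retries is what prevents the error loop, and logging every decision is what makes the tree reviewable after the fact.

```python
from dataclasses import dataclass, field
from enum import Enum


class Action(Enum):
    RETRY = "retry"
    FALLBACK = "fallback"
    ESCALATE = "escalate to human"


@dataclass
class RecoveryPolicy:
    max_retries: int = 2
    audit_log: list = field(default_factory=list)

    def decide(self, agent: str, attempts: int, transient: bool, has_fallback: bool) -> Action:
        """Pick the next step after a sub-agent failure and record the decision."""
        if transient and attempts < self.max_retries:
            action = Action.RETRY      # bounded, so a failing agent cannot loop forever
        elif has_fallback:
            action = Action.FALLBACK   # switch to an alternative tool or sub-agent
        else:
            action = Action.ESCALATE   # hand the decision to a person
        self.audit_log.append((agent, attempts, action.value))
        return action


policy = RecoveryPolicy()
first = policy.decide("coder", attempts=0, transient=True, has_fallback=False)
exhausted = policy.decide("coder", attempts=2, transient=True, has_fallback=False)
permanent = policy.decide("coder", attempts=0, transient=False, has_fallback=True)
```

    Here `first` is a retry, `exhausted` escalates because the retry budget is spent and no fallback exists, and `permanent` goes straight to the fallback. The `audit_log` then holds one entry per decision.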

    Resource predictability is another major one, and this has been one of the chief complaints about Manus. How do you predict what it's going to cost when you're paying in credits? When a credit is burned, is it the same value for every action? People have complained that some days Manus seems to burn more credits and some days less, and it's not predictable. Overall it has gotten much better since March — that's part of why I'm talking about it now — but it isn't yet at the degree of enterprise predictability it needs to be.

    QA is another massive challenge. How do you validate code — and not just code, but engineering configurations — when the LLM designs all of it with multi-agent orchestration? That is really hard. It's one of the reasons Manus is more popular right now with consultants and independent builders than with enterprises.

    Last but not least, user intent and model coordination. It is really difficult to handle different model results consistently over time when you have different sub-agents and some of them are from different models. That is not an easy task, but it's one that many of these builders are trying to handle under the covers because of the unit economics associated with token burn. There was an article — I believe it mentioned Notion — where it said roughly ten percentage points of Notion's margin had been eaten up in the last year simply because Notion is using AI models. AI model makers are starting to eat SaaS margins. If you want to combat that, you need a multi-agent configuration — but your multi-agent configuration needs to actually work, and that gets harder when you factor in user intent.

    How do you handle user intent when users are not intentful? When they aren't clear about what they want? At the enterprise level, you're going to have engineers who are very precise on one end, and on the other end you're going to have people who just say "make it good" or "make a dashboard." How do you interpret that in a way that is compliant with privacy, able to handle all the challenges that come with building a fully fledged product, and in line with the user's presumed intent? How do you handle pushback and questions?

    Everything I've just described — all of these scaling challenges associated with multi-agent orchestrators like Manus — explains why they're hard to scale to the enterprise. And I didn't even get to the technical scaling part: actually scaling out the system so it serves enterprise workloads. That's another challenge entirely.

    Where Manus Sits Today and Why It Makes Sense

    Why am I going over all the hard things? All of this explains the challenge that Manus is trying to solve, why I believe it's important to talk about, and why Manus's current position makes sense.

    At the end of the day, Manus is trying to get to a point where they can scale multi-agent orchestration for the enterprise. But to do that, they're running the classic startup playbook — starting with indie builders, starting with small startups, gaining the experience they need, and then moving into the enterprise space. They are trying to solve all of these problems in ways that are transparent to the user and that enable the user to deliver value specifically where Manus is good.

    Manus's current position is pretty simple. They've chosen to optimize for reliability and capability. That explains the cost issue. The old engineering trilemma is that you can't optimize for reliability, capability, and cost all at once. You pick two out of three. You can be reliable and capable, but you're not going to be cheap. You can be reliable and cheap, but you're not going to be capable. You can't have all three.

    In a sense, Manus has one of the most transparent pricing systems in the business — when the tokens run out, you just buy more, and they can allocate compute to the people willing to pay.

    They're also following a very typical platform evolution pattern. There was a demo phase in March. There was early access roughly from April to June, during which people found edge cases and reliability issues — right on schedule. They're now stabilizing. They've fixed a number of those problems. It's not perfect, but it's good enough to start talking about. From here, they'll be optimizing and scaling into the second half of this year.

    The fundamental tension they're operating within is this: users want ChatGPT simplicity, autonomous execution, and predictable costs. They can't have all three. What you're getting is complicated workflows, autonomous execution, and variable costs. This core tension explains why Manus remains in the expensive specialist tool category rather than a mainstream app. Solving for the engineering challenges that would enable ChatGPT simplicity, predictable cost, and autonomous execution simultaneously is non-trivial.

    Want an example of why it's non-trivial? None of the major model makers has launched a real competitor to Manus. ChatGPT's agent mode doesn't match it. Claude Code is in a separate category, and I'd argue it's not the same thing. Google hasn't launched something equivalent. Manus is its own thing. Part of why is that the engineering challenges they're solving are really, really tough.

    Practical Use Cases Where Manus Delivers Today

    So — we've spent a lot of time on the framework, on the categories, on the scaling challenges. Now let's talk about where Manus is actually useful in September 2025.

    Use case one: high-value research and analysis. Monthly or quarterly industry analysis for executives, competitive intelligence briefings, due diligence research packages. Manus wins here because the cost is justifiable. If it costs a hundred dollars to develop that report, it's a lot cheaper than two thousand dollars for a consultant. It combines web research, nicely formatted output, and data analysis. Human review is expected anyway before strategic decisions are made, so it's not too risky. And the time savings can be enormous — if it takes two hours to produce a report that would otherwise take days, the ROI is obvious.

    Use case two: content marketing production pipelines. If you're managing multiple clients as a small agency, or you're a SaaS company with regular content needs, Manus can scale content production without a linear cost increase. It handles research, analysis, creation, and formatting. The quality bar is "excellent first draft" rather than "publication ready," and the ROI is clear because the alternative is hiring content writers.

    Use case three: data analysis and visualization for non-technical teams. Business analysts without coding skills, marketing teams analyzing campaign performance, small businesses that need ad hoc analysis. Manus eliminates the need to learn Python or R, or to hire a data scientist. It handles messy data, the analysis, and the visualization. I can think of other tools that handle parts of that, but I don't know of any tool that handles all of those parts besides Manus. Output quality will often exceed Excel-based manual analysis, and time to insight is reduced. With truly large enterprise data sets this won't work, and I'm not going to pretend it will — which is why I emphasize the small business use case.

    Use case four: process documentation. Operations teams documenting workflows, consultants analyzing client processes, creating training materials. Manus can map existing processes, identify opportunities, and create documentation on the fly — very fast. It can save weeks of manual process scraping and provide immediately actionable recommendations in a nicely visualized format. Not a risky use case, and it saves a ton of time.

    Use case five: technical proof-of-concept development. Validating a product idea, exploring a new integration, creating technical specs as a PM. This goes beyond what you'd get with a tool like Lovable because Manus can create the prototype, the documentation, and the deployment in one big workflow. It can handle multiple technical domains, and the goal is speed to a working proof of concept rather than production-ready code.

    The Success Pattern for Autonomous Agents in Late 2025

    You see the same pattern across all of these use cases, and I want to call it out explicitly.

    There needs to be economic justification. All of the tasks I've described cost $500 to $5,000 if done manually, often in the thousands. The Manus cost is going to be a fraction of that, a tenth or less. The time savings typically stretch into days, and the quality expectation is "make a fantastic first draft," not a perfect final product. Those are the sweet spots for autonomous agents in the fall of 2025.

    These are also technically complex workflows, anywhere from five to twenty-five distinct actions, combining research with creation with formatting. They have human review and refinement built in, and they have very clear deliverables.
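
    The economic justification is simple arithmetic, and it is worth running before committing a task to an agent. The function below is a minimal sketch: `compare_costs` and its parameters are hypothetical names, the $100 and $2,000 figures come from the research-report example earlier, and the $300 review cost is an invented number included only to show how built-in human review changes the ratio.

```python
def compare_costs(manual_cost: float, agent_cost: float, review_cost: float = 0.0) -> dict:
    """Compare doing a task manually with an agent run plus human review."""
    if manual_cost <= 0:
        raise ValueError("manual_cost must be positive")
    total_agent_cost = agent_cost + review_cost
    return {
        "total_agent_cost": total_agent_cost,
        "savings": manual_cost - total_agent_cost,
        "cost_ratio": total_agent_cost / manual_cost,
    }


# Figures from the research-report example: about $100 with the agent
# versus about $2,000 for a consultant.
report = compare_costs(manual_cost=2000, agent_cost=100)

# Hypothetical variant: the same report with $300 of human review time added.
reviewed = compare_costs(manual_cost=2000, agent_cost=100, review_cost=300)

print(report["savings"], reviewed["savings"])
```

    Without review, the agent run costs about a twentieth of the manual price. With the assumed review cost it rises to about a fifth, which is still a large saving but a reminder that review time belongs in the calculation.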

    The thing to call out is that if you have an agent that excels at complex multi-domain workflows where the alternative is hiring expensive specialists, what you have is still a premium automation tool rather than a general productivity app. We keep coming back to this idea that certain buckets in the agentic landscape are more specialized than others, and that you pay more for them as a result.

    AI agents should not be viewed as a singular bucket anymore. You have agents positioned as general productivity tools and agents positioned as specialist tools for specialist tasks. Manus, as it stabilizes, is looking more and more like a specialist tool — a surgeon's scalpel versus a Swiss Army knife. I'm sure they would like to be a Swiss Army knife from an economics perspective, but because of the engineering challenges I've identified, it's a really hard position for a multi-agent orchestrator to occupy.

    Why Manus Points to Where the Market Is Going

    That said, I want to end by talking about why this matters beyond Manus itself — because I think that is where the market is going, and that is why it's worth talking about Manus at all.

    Manus is like the canary in the coal mine. They're the ones showing us the way forward on multi-agent orchestration — what it looks like independently, how you can start to stabilize a product even for smaller businesses and independent builders, or for teams within larger businesses that don't have heavy data needs. But they don't have the scale of the major model makers, and they haven't been able to build out the kind of footprint that would enable them to really harvest unit economic gains and bring down the cost curve.

    That's where I think we're going. I think it is likely that we will see a version of Manus from a major model maker in the next few months — maybe from Google, maybe from Anthropic, maybe from OpenAI. The value that people see with these complex use cases is very high. If you were spending three, four, five thousand dollars on this kind of work, you're going to be willing to pay whatever it costs to get it done with an AI agent, because it's so much cheaper — a tenth, a fifth of the price.

    And if you are a major model maker looking to recover some of the cost associated with these models, you want more reasons for people to pay you more. Maybe this is the reason you have a $200 ad hoc task on top of a $200 subscription. Some people will pay for that because it's so good. You're going to see a lot more of the economists at these major model makers — and yes, they have economists — looking for exactly these kinds of specialist tasks that enable them to scale margin. That's where Manus is showing the way.

    So if you have specialized tasks — the kind of $500 to $5,000 work I've described — and you know you need to get that task done no matter what, maybe try Manus. It will probably save you a fair bit of money, and you'll be willing to pay the cost because the ROI is so clear, especially if you're looking for an excellent first draft.

    That's my verdict on Manus. I waited to talk about it until it started to stabilize. I feel excited to talk about it now. I think it's a great example of how AI agents are developing specialized use cases in fall 2025, and I'm excited to see where Manus goes next.


    Polished transcript of AI News & Strategy Daily | Nate B Jones. All views are those of the original speakers.
    Published by @maverick