Podcast transcripts, polished for reading

Your Claude Limit Burns In 90 Minutes Because Of One ChatGPT Habit. | AI News & Strategy Daily | Nate B Jones Transcript

Polished transcript · AI News & Strategy Daily | Nate B Jones · 2 Apr 2026 · 26m · @maverick

Nate B Jones explains how to stop wasting AI tokens and reduce costs dramatically

A solo presentation by Nate B Jones on token efficiency across all major AI platforms.

Summary

Nate B Jones argues that most AI users — from beginners to advanced developers — are burning far more tokens than necessary, not because the models are expensive, but because of specific, fixable habits. He walks through the most common token-wasting behaviors he has observed across skill levels: raw PDF ingestion, conversation sprawl, indiscriminate use of the most expensive models, unaudited plugin loading, and inefficient web search. He illustrates the cost difference with a concrete example showing an 8–10x reduction in compute cost for the same output when clean habits are applied. He also warns that upcoming frontier models — including Claude Mythos — are expected to be significantly more expensive, meaning today's sloppy habits will become far more costly. The video closes with five "commandments" for token-efficient agent design and an introduction to a diagnostic tool he calls the "stupid button."

Key Takeaways

  • Raw PDF ingestion is the most common beginner mistake. A 4,500-word document fed as a raw PDF can consume over 100,000 tokens due to formatting overhead, compared to 4,000–6,000 tokens when converted to markdown first — a 20x difference that compounds with every conversation turn.
  • Conversation sprawl silently inflates costs. Every reply in a long conversation causes the entire conversation history to be re-sent to the model. After 20–40 turns, the original instructions are diluted and token waste is severe. Starting fresh every 10–15 turns and summarizing before moving on is a simple fix.
  • Unaudited plugins are a hidden tax. One user Jones knows loads over 50,000 tokens into their context window before typing a single word, purely from accumulated plugins and connectors. Auditing and removing unused integrations is a high-impact, low-effort optimization.
  • Model selection should match the task. Using the most expensive model — Opus, GPT-4.5, or equivalent — for formatting, proofreading, or simple tasks is wasteful. Jones recommends using premium models for reasoning, mid-tier models for execution, and lightweight models for polish.
  • The cost difference between sloppy and clean usage is 8–10x. Jones calculates that a messy five-hour session on Opus can cost $8–$10 in compute, while the same output achieved with clean habits costs around $1. Across a ten-person team on an API, that translates to $2,000 per month versus $250 per month.
  • Prompt caching offers a 90% discount on repeated content. Cache hits on Opus cost $0.50 per million tokens versus $5.00 standard. System prompts, tool definitions, and stable reference documents should always be cached — Jones describes ignoring this as "pouring money down the drain."
  • Upcoming models like Claude Mythos are expected to be significantly more expensive. Jones anticipates a new pricing tier potentially 10x above current Opus rates by April or May. Habits that are tolerable today will become serious cost liabilities at that price point.
  • Agent pipelines require architectural discipline. Jones outlines five principles for agents: index references rather than dumping full documents, pre-process context before it enters the window, cache stable context, scope each agent's context to the minimum it needs, and instrument every call to measure actual token cost.
  • Token efficiency is becoming a professional skill. As Jensen Huang has cited $250,000 per year in token costs per developer as a realistic figure, Jones argues that managing token usage intelligently is no longer optional — it is a job-level competency that cannot be delegated.

  • FULL TRANSCRIPT

    The coming cost increase and why token efficiency matters now

    Nate B Jones: The next generation of models is likely to drop in the next one to two months. I'm talking about Claude Mythos. I'm talking about whatever ChatGPT drops next. I'm talking about the next Gemini model. They will be more expensive — a lot more expensive — because they're all trained on much more expensive chips, the GB300 series from Nvidia, and it's just going to get more expensive from there.

    The intelligence we're going to get, the ambient compute all around us that is essentially free intelligence, is going to be the dumber models. That's just how it is. If you want to use cutting-edge models, you have got to stop burning tokens and blaming the model. And that is the theme for this video.

    If you're in a position where you're wondering how much token usage you have, or how expensive your AI is, or whether you're using too many tokens, or how you can even measure that — how you can get better at it — that is what this is. And that is going to be one of the most valuable skills on the planet, by the way, because you do not want to be in a position where you are putting $250,000 a year — a real number that Jensen Huang gave in a real interview for what he expects an actual individual engineer to spend in a year on tokens — you don't want to be the person spending $250,000 on tokens you don't have to be spending on. You want to be smart.

    I am going to give you a specific example. This is a real-life example. A real person I know gave me permission to use this. I recently saw a production AI pipeline that ingests multiple long-form conversations per user, runs an analysis across dozens of dimensions, and generates a fully personalized output — all on the most expensive models that money can buy. Not because the person wants to use expensive models, but because he tested it and what he found was that the better models produce the results he needs for this business. The cost per user: less than a quarter. Less than 25 cents per user for that.

    Most of us are spending more than we need to on AI, and this is a video about that. You can be really smart, use really good cutting-edge AI, and you can be intelligent with your token usage and not spend a ton of money. If you want to know what that's like, keep watching, because we're going to get into specific strategies, and I'm going to show you what I built so that we can actually make this easier for everybody so it's not just a guessing game anymore.

    The takeaway is that frontier AI can be absurdly cheap when you know what you're doing. Essentially, the models are not expensive. It's your habits that cost a lot. And with Claude usage limits dominating everything in the last week, I think it's worth having that conversation. So let's get to it.

    Beginner mistake: raw document ingestion

    Nate B Jones: I've made the case we can use our models better. What are the specific habits we can change? I want to name specific habits that I have seen in conversations with others, looking over shoulders, reading GitHub repos, listening to conversations online. These are specific examples that are patterns I see over and over again.

    The first one is for the rookies — the folks who are new to cutting-edge AI. You know what you bleed out on in tokens? You bleed out on document ingestion. This one drives me crazy because it's so easy to fix. A brand new Claude Desktop user might drag in three PDFs into a conversation — maybe 1,500 words each, which is just 4,500 words of text. It's not that long. And they say, "Summarize these." Claude processes the raw PDFs with all the formatting overhead that goes with that: the headers, the footers, the embedded fonts, the layout metadata, and the entire binary structure gets encoded as tokens. So the 4,500 words of content can become 100-plus thousand tokens if you're not careful.

    All you have to do to avoid that is just think in terms of markdown. If you just ask Claude — or frankly go to any of a number of free services on the internet — and say, "Please convert to markdown," it will just do it. It will take 10 seconds and convert to markdown. And then you have a very clean set of content that's between 4,000 and 6,000 tokens. That's like saving you 20x on the memory.

    And this waste just compounds. Because once those 100,000 tokens are in your conversation history, they bounce back and forth and back and forth. And this is how you fill up your token window, and you wonder how other people get so much done.

    Please, if you're new to AI or if you've never thought about it, think about the file formats you're throwing in, because so many of these file formats are designed to be human-readable. They're not designed to be AI-readable. Think about the token efficiency of these file formats. And if you're wondering how to convert to markdown — I built something for you. All you have to do is ingest a file, hit transform, and it converts it to markdown. That's it. We have a number of file types and we're adding more from the community all the time. It's part of the Open Brain ecosystem. It's just a plugin you can put in and it will convert it to markdown.

    But that's not the only way. You can tell Claude to do it directly. You can also just do it on the internet with any of a number of free web services. Markdown conversion should not be gated. It's super easy to do.

    Tokens are designed to preserve everything in an original text. If you wanted to reason about the style of the PDF, fine, keep it. But 99% of the time, all you care about is the text. You just want it in markdown. Please think about your file formats.

    Intermediate mistake: conversation sprawl

    Nate B Jones: The next big mistake that people make — and this one comes a little bit after people tend to convert to markdown and start to understand how some of these initial documents work — is conversation sprawl. Please do not sprawl your conversations.

    If you are doing 20, 30, 40 turns on a conversation, no AI was reinforcement-learned, trained, or designed to handle that kind of sprawl. All you're doing is compressing the ratio of the conversation where the original instructions happened. And yes, the models are getting better and better at anchoring on and remembering those original instructions even when they go through compression. But why make them suffer? Why make yourself suffer by filling up the context window with cruft? Why waste tokens?

    Why not just ask for what you want upfront? And if you're going to have an evolving exchange or evolving conversation, clearly mark it at the top: "Our goal here is to evolve and reach a conclusion together." Then you have a light conversation that goes 20 or 30 turns and say, "Thank you. I've got a conclusion. Please summarize this." And then you go and do real work.

    I see so many people trying to mix together modes. But AI is really designed for single-turn, do-a-lot-of-heavy-work interactions more and more. And in that context, you need to do the thinking in advance and bring that to the table. And if you need to think with AI, that should be in a separate chat, a separate conversation. It might even be a separate model. It might be three separate models, and you're bringing all of that in.

    I do that all the time. I'm like, okay, I want to look through what communities are thinking about AI on X — I'm going to go to Grok for that. Or I'm going to look at what earnings reports are saying about the state of AI and capital investment — I'm going to pipe that through ChatGPT thinking mode and get a bunch of reports out on that. Or I'm going to go through Perplexity research and get a bunch of reports out on that. Or I'm going to look at what some major blog posts have to say about a particular AI topic — I'll go to Claude Opus, do a targeted web search, go back through, and make sure I understand what I'm looking at.

    None of that is intended to be a single answer. These are all evolving conversations. Once I get what I want out of each of these individual threads, I can pull them together and say, "Okay, now I have a piece of work to do. Now I have something I actually need done, and I have all the context needed."

    So you should have two modes. You should have a mode where you are trying to gather information, and a mode where you are trying to focus and get work done. Do not mix the two together. That is how you burn tokens. That is how you confuse the AI.

    Your objective when you want the AI to do real work should be to be so clear that the AI needs to do nothing else — it just goes and gets the work done and comes back. It should be that clear.

    Advanced mistake: plugin overload

    Nate B Jones: If you are an intermediate user and you're like, "I know this stuff, Nate," well, let me give you another tip you may not know. The people who are adding lots of plugins to their ChatGPT or their Claude instances — you are paying a tax every time you start a conversation, because in the background those are going to be loaded in and they're going to start to fill the context window.

    I know someone who shared with me that they are over 50,000 tokens in on a context window before they type the first word, because they actually load that many plugins and connectors. You don't need that much. You know what that's like? That is like walking into a fully functional tool workshop, and the first thing you do instead of leaving the tools on the walls is you go and get all the tools off and lay them out on the workbench and say, "Okay, now we're going to make a bench." Do you need all 200 tools in the workshop to make the bench? No. You probably need the right five.

    Think about that the next time you have an approach to tooling. Because so many of us hear about a new plugin, hear about a new connector, someone hypes it up, we say we need to add it, and we don't realize it's a silent tax for the rest of time. Every time we have a conversation, it adds a little bit — it adds 1,000 tokens, it adds 2,000 tokens, whatever it does — and it just adds it always.

    Do you want to pay that for the model? Maybe you should think more strategically about which plugins and connectors are really adding value for you. Because they can be tremendously valuable, but make sure you know which ones you really want. Because if you don't, you're going to be looking at dozens of plugins that you don't really need, that are supposed to add value but just add a bunch of junk into your context window, confuse the model, and keep it from doing good work — and maybe confuse it as to which tools it's supposed to use.

    Expert-level mistakes: system prompt bloat and context management

    Nate B Jones: I'm saving the most expensive and the most advanced users for last, because this is where the leverage lies. If you are an advanced user — if you are someone who's like, "Send me to the GitHub repo, I can just do this myself, let me install Open Claude on my Mac Mini, I'm okay managing the gateway, I can be secure" — this is for you.

    You have the most leverage of anybody out there in terms of how many tokens you use. And typically speaking, your mistakes are the most expensive ones, because if you screw up, you're screwing up at a level of hundreds of thousands or millions of tokens, maybe more. The reason why is simple: you are doing bigger projects with AI. And when you do big projects with AI, your ability to leverage AI effectively becomes one of the most critical things you can do to manage ROI and cost on a particular project. It is a job skill at that level.

    If you're technical enough to go to a GitHub, you have a job skill to manage tokens efficiently. And you cannot pass that off to somebody else. That is not going to be somebody else's full-time job at an org. All of us are going to have to learn to manage our tokens well.

    If you are the person who is responsible for the system prompt on an agent and you haven't pruned it in the last couple of weeks, what are you doing? If you haven't sat there and gone line by line and said, "You know what, a hundred of these lines I don't need anymore because they've been here since 3.5 and I don't need them now" — if you're sitting there and you're like, "I don't know why we're loading this entire repo into the context window, we just do it all the time and it seemed to work two generations ago but we never tested it" — that's just irresponsible.

    You need to be in a position where you are actually allowing the gains in model intelligence to lean out your context window. If you want to look at the larger trend we see in AI today, it is that we needed to frontload and be really specific about a lot of context for dumber models in 2025. And now that it's 2026, as the models get more intelligent, we can lean out the context window initially because we can trust the model to retrieve better. So take that seriously. That is something you can do that is practical to get ready for Claude Mythos. Don't sleep on it.

    Again, if you're technical, these are million-token decisions we're talking about, especially if you're running this agent over and over again. It adds up.

    The real cost difference: a concrete example

    Nate B Jones: Let me give you a specific example that is based on the original beginner example with PDFs, to show you the tangible difference in cost. This should cascade all the way across. If you don't believe me, this is real.

    Let's say you feed raw PDFs into context — let's say it's 100,000 tokens versus 5,000 like we talked about. Let's say it's a conversation sprawl that takes 30 turns. I've seen these — this is very realistic. And let's say you use Opus for everything, including formatting, including proofreading, and you're making something over a five-hour session where you're talking back and forth. You might be spending roughly 800,000 to a million input tokens, with maybe 150,000 to 200,000 output tokens including thinking. At $5 in and $25 out per million, you're spending $8 to $10 worth of compute — which you might say, "I can tolerate that," or "I've got the unlimited plan," or "I don't care." Whatever. But I want you to look at the difference.

    Because anytime you start to get serious with AI, you need to see the difference. We talk about not being wasteful with artificial intelligence — this is being wasteful. You want to save water, you want to save energy? Don't waste your tokens.

    Clean session, same work: convert documents to markdown first, start fresh conversations every 10 to 15 turns, use Opus for reasoning and Sonnet for execution and Haiku for polish, scope the context to what's needed. Over the same period of time, you get the same result for 100,000 to 150,000 input tokens — a lot less — and maybe 50,000 to 80,000 output tokens. You blend that across both models, and instead of costing $8 to $10 in compute, you spend a dollar. You got the same amount of work done. In other words, you got an 8 to 10x reduction in cost.

    Now scale it. That sloppy user is burning $40 to $50 in compute a week, and the clean user is burning $5 to $7. Across a ten-person team on an API, that's $2,000 a month versus $250 a month for the exact same result. For subscription users, it's the difference between hitting your limit daily and forgetting that limits exist because you're just that productive.

    What Claude Mythos pricing could mean

    Nate B Jones: Now, if you think this isn't serious, I want you to think about the cost structure for Mythos for a minute. Mythos is rumored to be by far Anthropic's most expensive model. I think very strongly that by April or May we are going to have a new class of pricing well above the $5 to $25 range for tokens — into maybe 10x that. Imagine a world where you are at 10x what Opus costs now. $5 in, $25 out for Opus — what if it's $50 in, $250 out? Well, now things start to get serious. Now that 8 or 10x reduction on individual work for a day becomes something that you can actually measure and think about as a business. And imagine how big that gets when you start to work across a dev team.

    The mistakes you're making today were tolerable because models were priced cheaply. When cutting-edge intelligence that you want comes out more expensive, and I don't know the exact price — I'm not saying it's $50 and $250, I'm giving you a thought exercise, it might be $10 and $50 instead — it's still the same point. The point is the model that you want is going to cost more. And as models cost more, your mistakes scale. Your mistakes scale with the price of intelligence.

    And make no mistake, the models will keep getting better. Every quarter, every release, the trajectory is unambiguous. People who tell you the models are plateauing are lying. They are lying to you. The models are getting much faster. I do see occasionally that people are insisting the models aren't getting better. It's not true by any measure out there. And the people I see insisting on it, I think they're insisting on it partly because they don't want to face the world as it will exist when AI is this good and continuing to accelerate this fast. It's scary. But we should face it, and we can all work through it together.

    Introducing the "stupid button" diagnostic tool

    Nate B Jones: I have built a stupid button. That is my contribution to this discourse. I am building a stupid button so you can check and see if you are using your context incorrectly. I want to save you money. I want to save you hundreds of dollars. Please do not be stupid with your tokens.

    If you care about it, don't waste the water, don't waste the electricity. If you just care about the bottom line, also don't waste your bucks. We should probably care about all of those things.

    If you want to know what's in the stupid button, it's really simple. There are six questions that I'm helping you answer.

    Question one: Do you feed Claude raw PDFs and images when all you need is text? Is there something you are doing that is grossly inefficient as far as tokens go? By the way, screenshots are terribly inefficient. It would be much, much better to just copy and paste text. Convert to markdown always. Claude can do it really, really fast for you. Why not?

    Question two: When was the last time you started a fresh conversation? Are you one of those people that keeps a conversation going forever? I swear the number of people who keep their conversations going forever is highly correlated to the number of people who start experiencing symptoms of LLM psychosis. Why? Because models drift over time. They were never intended for that long a conversation. If you're having a long-running conversation, you're just in strange territory.

    When was the last time you started a fresh conversation? And why is that? Again, every time you take a turn in a conversation, you read it as sending one line back. But Claude or ChatGPT or Gemini reads it as sending the entire conversation back. And if you're wondering, is this something that's just for Claude? No, it's for ChatGPT, it's for Gemini, it's for Llama, it's for any LLM you're using. It's for Qwen. This is how LLMs work. Don't waste it.

    Question three: Are you using the most expensive model for everything? Are you using Opus? Are you using GPT-4.5 on pro mode? Whatever your choice is, are you picking the most expensive model and just blindly using it regardless, when the cheaper model may work better? This is especially important if you have production workloads, but it's also true for all of us. If you're doing something that's a simple formatting task, don't depend on Opus for it. Use the models for what they're designed for. Don't bring a Ferrari to the grocery store.

    Question four: Do you know what's loading in context before you even type? You can actually find this out. You can run `/context` in Claude Code. You can look at the number of things that are loading. If you're in Claude Code, or if you don't know what that means, you can go to your ChatGPT or your Claude and see how many connectors you have available, how many you've loaded up. You could be loading tens of thousands of tokens that you're not really aware of and not really using.

    If you enabled Google Drive months ago and you never ever use Google Drive — you just thought it was cool on the day it launched — why? Just drop it. There are so many examples like that where we see something cool, we add it, and we forget it's there. It's like a barnacle on a ship. It's going to slow you down. It's going to burn tokens. You don't need to have it. Audit your plugins. It matters.

    Question five: API builders — are you caching stable context so you don't reuse it? Prompt caching can give you a 90% discount on repeated content. Cache hits on Opus cost $0.50 per million versus $5 per million standard. It makes a difference. Do not sit there and ignore prompt caching. Take it seriously. If your system prompt, your tool definitions, your reference documents aren't cached, what are you doing? This is not advanced stuff in 2026. You should just be doing it.

    Question six: How are you handling web search? Are you letting Claude do web research the expensive way? People don't realize this, but if you call Perplexity for a search, it tends to be much more token-cheap than using Claude natively. Now, Claude is addressing this. There are lots of ways to do Claude search — you can actually use Claude to navigate through a browser, you can directly search in the terminal and it will spin up something in the background, and you can call something like an MCP connector for Perplexity. All different options you can use.

    This is broadly true — it's not just true for Claude, it's true for ChatGPT, it's true for Gemini, etc., because MCP is magic. But if you are trying to do search, the larger point is that you should be doing search as cheaply as possible. If you just want quick results that are token-efficient, it may be worth it to take the time to spin up an MCP and just have a dedicated service that returns the search results.

    What I have found experimentally with Perplexity and Claude is that Perplexity tends to burn something like 10,000 to 50,000 fewer tokens per search — which is not a small number if you're doing complex search — and it tends to be five times faster and it has structured citations. This is not meant to be a Perplexity plug. It's a token management plug. Try it for yourself. But I like faster, I like citations, I like fewer tokens. Over a research-heavy session, a plugin like that can save you a lot on the token side.

    And that's a larger call-out: if you have ways to look at your token usage and to diagnose it, you're going to be smarter about it. That's the whole point of the stupid button — let's not fly blind here. Let's look at our actual token usage and actually make some good choices and optimize it.

    What's inside the stupid button

    Nate B Jones: Now, what's in this stupid button? Number one, there is a prompt. If you've never done this, if you're like, "What is an MCP server?" — we've got a prompt for you. A prompt you can run against your recent conversations that actually identifies the specific things you specifically are doing wrong. It will see which documents you're feeding raw. It will see your conversation sprawl. It will look at model misuse. It will look at redundant context loading. It looks at your actual patterns and it will tell you what to fix first. So that's the easy version — anyone can use it, any plan, no setup required.

    Number two, a skill. This is an invocable skill that audits your Claude Code or your Desktop environment or any other environment — it could be ChatGPT, etc. Skills are also translatable, and it measures your per-session token overhead. It will flag system prompt load. It will check your plugin and skill loading. It will give you a before and after before you make changes. Think of it as a gas tank for your tokens — wouldn't it be nice to have one? So it's like the gas tank skill.

    Number three, we built some guardrails. Guardrails will sit directly on your knowledge store. So if you're an Open Brain person — which is something we've been doing as a community — it will sit right on your Open Brain, and you will stop burning tokens on input. Automatic markdown conversion for documents hitting the store. Index-first retrieval instead of dump-and-search. Context scoping that enables a sort of minimum viable context for the query. This is where token management stops being just a personal discipline and becomes infrastructure that starts to maintain itself.

    I'm really excited to see how the community continues to build on this, because Open Brain is open source and we'll keep evolving it and improving it. But I wanted to make sure we had rails that ensured responsible token usage for the Open Brain community.

    Five commandments for token-efficient agent design

    Nate B Jones: I'm going to close by talking briefly about agents and context, because agents burn hundreds of millions of tokens in some cases. We don't want to leave them out. How do we think about context management for agents? I'm going to give you five commandments. I call it the Keep It Simple Stupid commandments for agents.

    Commandment one: Index your references. If an agent is getting raw documents instead of relevant chunks, you've already failed. The entire point of retrieval is to scope what the model sees to what it needs. Dumping a full document set into the window on every agent call is wildly irresponsible. You can't do that just to give the agent context. Don't make the agent do work it doesn't need to do.

    Commandment two: Prepare your context for consumption. Pre-process, pre-summarize, pre-chunk it. A reference document should arrive in an agent's context ready to be used, not ready to be read or processed. If the model's first several thousand tokens of reasoning are just spent dealing with the crappy pre-processing you did, you're not being a responsible agent builder.

    Commandment three: Cache your stable context. System prompts, tool definitions, persona instructions, reference material — anything that is stable should be cached at a 90% discount on cache hits. This is the lowest-effort, highest-impact optimization you have on the table. If you're making thousands of agent calls a day and you're not caching, it's just pouring money down the drain.

    Commandment four: Scope every agent's context to the minimum it needs. A planning agent does not need your full codebase. Don't give it the full codebase. An editing agent doesn't need your project roadmap. Don't give it the project roadmap. Passing everything to every agent is architectural laziness, and it has real costs both in tokens burned and in degraded agent performance. Models perform worse when they're drowning in irrelevant context.

    And by the way, if you're like, "I'm not sure what the agent will need — aren't the smarter agents supposed to find it?" The answer is yes. But you will only do that efficiently if you give them a searchable repo that is pre-processed so they can go and get only the relevant slice of context. So take the time to do it right.

    Commandment five: Measure what you burn. If you don't know your per-call token cost, you're just optimizing without any information. Please instrument your agent calls. Track your input tokens. Track your output tokens. Track your overall model mix and your cost ratio. You cannot improve what you do not measure.

    Most teams building agentic systems are thinking a lot about whether they are semantically correct, not whether they're functionally correct — there's a big difference. And they're thinking a lot about optimizing their system prompt. They're not thinking a ton about their model cost, because most of the time the model cost is not what makes the project live or die. I get that in 2025 and early 2026, with the costs we have today and the urgency from executives to build, the $12 per run cost or whatever it's going to be is not going to make or break the ship. But plan for a world where the models are more expensive. Plan for a world where you have to scale up. Plan for a world where you have to be responsible and instrument.

    The cultural problem: burning tokens as a badge of honor

    Nate B Jones: Stepping back, there's a cultural problem we need to acknowledge behind all of this. At some point in the last few months, burning tokens has become a badge of honor. And I get it. There is a degree to which you need to be burning tokens in order to do meaningful work in the age of AI. None of this is to say that I expect token consumption to go down — it won't. You need to be ready to burn those tokens. This is not an ask that you not do that. This is an ask that you do it efficiently.

    And so when Jensen sits there on stage and says $250,000 in token costs per developer and everyone is shocked or rolls their eyes or whatever the reaction is — my reaction is: I hope it's $250,000 in smart token costs. It's not the individual dollar amount for Jensen, because he's got cash in the bank. It's whether the tokens were used well. It's whether it's smart tokens.

    So begin to think to yourself: yes, I need to be maxing out my Claude. There are people who go into withdrawal when they don't get to use their Claude. I know people like that who are like, "Ah, I went to a movie and I couldn't use my Claude for a few hours. I feel like I missed out on my token limit." Touch some grass. It's going to be okay.

    But use your tokens well. Be efficient with your token usage. Know what you're spending it on. Don't spend it on silly stuff. Don't spend it on PDFs that you have to convert. Actually spend it on meaningful work. And that is a human problem. We need to be bold and audacious. These models are really good at stuff. So let's get more bold, more audacious, and think bigger about what we can aim them at. Because if we can be more efficient, we can do a whole lot more cool and creative stuff with those tokens. That's why I built the stupid button.


    Polished transcript of AI News & Strategy Daily | Nate B Jones. All views are those of the original speakers. Watch on YouTube ↗
    Published by @maverick
    More from AI News & Strategy Daily | Nate B Jones
    More from @maverick
    Summary