podProse

Podcast transcripts, polished for reading

90% of AI Users Are Getting Mediocre Output. Don't Be One of Them (Stop Prompting, Do THIS Instead) | AI News & Strategy Daily | Nate B Jones Transcript

Nate B Jones explains how to escape AI's "averaging" effect using four customization levers

A solo presentation on why default AI outputs feel generic and how to fix it using memory, instructions, tools, and style controls.

Summary

Nate B Jones of AI News & Strategy Daily argues that the reason most people find AI output mediocre is not a flaw in the models themselves but a structural consequence of how they are trained. Through a process called reinforcement learning from human feedback (RLHF), AI models are optimized to satisfy the broadest possible pool of human raters — not any individual user — producing what Jones calls "median output." He identifies four distinct levers beyond prompting that users can deploy to escape this averaging effect: memory, custom instructions, apps and tools, and style and tone controls. Jones walks through how each of these levers works across ChatGPT, Claude, and Gemini, and argues that most users are either ignoring these levers entirely or using only one. He closes with a practical framework for encoding personal corrections back into AI settings over time, citing developer Boris Cherny's discipline of updating a Claude markdown file every time Claude makes a mistake as a model for how compounding personalization works in practice.

Key Takeaways

AI is trained to satisfy raters, not you. Through RLHF, models learn to produce outputs that a pool of human raters would score highly — not outputs tailored to any specific user's context, constraints, or preferences. This is a known, documented mechanism, not speculation.

Default output is median output. Every time you use AI with default settings, you receive an answer optimized for a hypothetical typical person. The farther your needs are from the average, the more default settings will fail you.

Memory works differently across platforms. ChatGPT uses layered memory including saved facts and conversation history, with project-scoped isolation available. Claude defaults to project-scoped memory with periodic summary updates and supports memory import/export. Gemini connects to your Google ecosystem (Gmail, Photos, YouTube) for personalization, with a privacy trade-off.

Custom instructions are severely underused — and vagueness is the main failure mode. Generic instructions like "be concise" or "be direct" do not meaningfully steer models. Specific behavioral instructions — such as "when I'm stuck on a problem, ask me diagnostic questions rather than immediately giving solutions" — produce dramatically better results.

Model Context Protocol (MCP) is the underlying standard for tool connectivity. With over 10,000 MCP servers available, users can connect AI to external tools and data sources. ChatGPT calls these "apps"; Claude supports a wide range of MCP servers with varying reliability. Gemini lags behind on tool use, which Jones identifies as a significant weakness of that ecosystem.

Style controls let you shape how AI communicates. ChatGPT offers eight personality presets with granular adjustments for warmth, enthusiasm, headers, and emojis. Claude offers three built-in presets plus a custom style feature that can generate a style profile from uploaded writing samples — which Jones describes as more powerful than trying to describe your style in words.

Corrections are steering inputs, not frustrations. Users getting the most from AI treat every unsatisfying output as information. They capture recurring corrections, encode them into instructions or memory, and update style settings — compounding personalization over time rather than starting from scratch each session.

Steering has real limits. Personalization does not fix hallucinations, which are a separate problem. In creative work, AI still pulls toward the center of its training distribution, and steering against that tendency requires ongoing effort.

The investment is only worth it for regular users. For occasional AI users, setting up and maintaining these levers may not be worth the time. For users who rely on AI multiple times a week for similar tasks, the compounding return on a few hours of setup is significant.

FULL TRANSCRIPT

Why Default AI Output Feels Generic

Nate B Jones: Nobody gets 10x results from default vanilla ChatGPT, vanilla Claude, vanilla Gemini. It just isn't how it works. But most of us have slept on the big levers that these model makers have released to help us customize these models and get the most out of them. This video is all about those levers — what you missed and how you can customize your AI to get the most from it. It's about how AI averages you out, and how you can stop it.

Lever number one is memory. ChatGPT has a way they handle memory. Claude has a way they handle memory. We'll get into it. Instructions is lever number two. Again, ChatGPT, Claude, Gemini — they all have their versions of instructions. We're going to hop into that. Style controls: ChatGPT has eight different personalities, Claude has different style summaries. We're going to get into that. And then apps and tools — what apps and tools do these models support?

Together, these are big changes, but most of the time we hear about them in clickbait articles and we're told to do one thing specifically. I don't want to look at doing one thing specifically, because that doesn't change the averaging function. You are being averaged out into a median AI user, and I'm interested in you understanding the levers so you can customize your AI into something that truly allows you to be transformative.

What "Being Averaged" Actually Means

So what does being averaged even mean here? The simplest way to understand it is to imagine a restaurant that wants to create one dish to satisfy the widest possible range of customers. Let's say it's pizza. They don't want to delight anyone in particular. They just want to avoid disappointing too many people. Papa John's, Pizza Hut — you get the idea. The chef studies what most diners order. They analyze which flavors get consistent approval across different demographics and they optimize for the middle. What do you get? It's edible. It's competent. It's technically fine. You can make the cheese look nice in an ad. But it's not your preference. It's not spicy enough if you like heat. It's not subtle enough if you like delicate flavors. It's not adventurous enough — or it's too adventurous if you're feeling mild.

This is exactly what AI does with answers. It's the Pizza Hut approach. It's not trying to give you the best response for your situation. It's trying to give the best response for everybody who might ask a similar question. It's the statistical middle. It's the median. When you ask for restaurant recommendations, you're going to get restaurants that would satisfy the most people who ask for restaurant recommendations. When you ask for career advice, you're getting advice that applies to the broadest set of people in roughly your situation. When you ask for code, you get code that follows the conventions most developers would expect.

This is why your output always feels just a little bit off. It's not wrong — you can't necessarily point to an error. It's just not yours. The recommendations hit tourist spots instead of the places you'd actually like. The advice applies generally, but not to your specific constraints. And most people experience this and think, "Well, the AI is just okay." They don't realize there's a mechanical reason, and they don't realize it's fixable.

How Models Learn to Be Average: RLHF Explained

So how do models learn to be average in the first place? This is not speculation — we know this. Modern AI assistants go through something called reinforcement learning from human feedback. Here's how it works. The model generates multiple responses to the exact same prompt. Human raters compare them and pick which one they prefer. The model learns to produce outputs that the raters would choose.

You catch the keyword there — raters, not you. A pool of people who rate outputs and judge which seems better. The raters are not experts in your field. They're not familiar with your constraints. They don't know your preferences about where you want to go in Paris when you travel there. They're looking at two responses and picking whichever one seems most helpful, most clear, and most appropriate. Hint: it's probably the one with the Eiffel Tower.

The model's optimization target is thus not "give the specific user what they need" — give Nate what he needs. It's "produce something a typical human would rate pretty highly." So when thousands of raters evaluate millions of outputs, the model learns to hit the middle of the preference distribution. It learns the answers that would satisfy most people. It learns the median.
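A toy sketch can make the averaging effect concrete. This is an illustration only, not the training pipeline of any lab: it assumes each rater has a preferred "spice level" on a 0 to 10 scale, each candidate response sits somewhere on that same scale, and a rater always picks the response closer to their own taste. All names and numbers are hypothetical.

```python
from itertools import combinations

# Hypothetical raters: each number is a preferred "spice level" from 0 to 10.
RATER_TASTES = [1, 3, 4, 5, 5, 6, 7, 9]

# Hypothetical candidate responses, each placed on the same 0-10 scale.
CANDIDATES = {"mild": 1, "middle": 5, "bold": 9}


def pairwise_wins(tastes, candidates):
    """Count how many pairwise comparisons each candidate wins across all raters."""
    wins = {name: 0 for name in candidates}
    for a, b in combinations(candidates, 2):
        for taste in tastes:
            dist_a = abs(candidates[a] - taste)
            dist_b = abs(candidates[b] - taste)
            if dist_a < dist_b:
                wins[a] += 1
            elif dist_b < dist_a:
                wins[b] += 1
            # Exact ties award no win to either side.
    return wins


wins = pairwise_wins(RATER_TASTES, CANDIDATES)
best = max(wins, key=wins.get)
print(wins, "->", best)
```

In this toy pool the "middle" response wins the most comparisons, even though the raters at either end of the scale would have been happier with "mild" or "bold". That is the sense in which optimizing for a pool of raters pulls toward the median.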

This is not a secret. Anthropic publishes papers describing this. So does OpenAI. Nobody's hiding it. And there's an irony here, because the training process that makes these models so helpful in general is exactly what makes them mediocre for you and me specifically. The same mechanism that prevents the AI from being weird or offensive or unhelpful also prevents it from being calibrated to your particular needs.

The implication is significant. Every time you use default settings, you're getting an answer optimized for a hypothetical typical person. The training literally encodes "what would most people want here" as the target. And you're not most people — you're you.

For the last couple of years, prompting was the only way to escape the average. You would frontload your context into your question, specify your constraints and your preferences, and steer the model to adjust. And every conversation had to start from scratch. That has now changed. There are now at least four distinct ways to steer AI away from the median — four levers beyond the prompt itself. Most people are using none of them, or only one. Here's what you need to know about each one.

Lever One: Memory

Memory is the AI retaining information about you across conversations. So instead of starting fresh every time, it remembers your context, your job, your projects, your preferences, and so on. The promise is very powerful — the AI knows you and builds on that. The reality is platform-specific.

ChatGPT's memory works in multiple layers. There are saved memories, which are facts you explicitly ask it to remember, and then there's something broader — a sense of chat history where ChatGPT references your entire conversation history to understand your preferences. This can be very general. When ChatGPT pulls from past conversations, it now pulls clickable citations that let you know exactly which chat it's pulling from. While this makes the system transparent, I can tell in context that it's still not a very good memory implementation — it misses stuff I would consider obvious.

ChatGPT also has project-only memory. When you create a project, you can isolate the memory from general ChatGPT use, and what you discuss in that project stays in that project. One recent change worth noting is that temporary chats now retain your memory, style, and personalization settings. They used to be very stripped down, and now they're less so.

So what's your key tactic with ChatGPT? Tell ChatGPT to remember specific preferences that you care about. "Remember that I prefer one-sentence answers to factual questions" is a great example. "Remember that my audience always has people who think they can build their own local models" — that's another example. The automatic system captures a lot, but intentional memory is very reliable if you're starting to cultivate it with that mindset.

How does Claude work differently? It has two components. Claude can search past conversations — sort of like a RAG-style retrieval — and it can also generate a memory summary that synthesizes key facts across your chat history. That summary will update periodically. The distinguishing feature is that Claude's memory is project-scoped by default. Every project has a very separate memory space, and your startup discussions don't bleed into your vacation planning. The isolation is very intentional. Claude keeps contexts very focused because it needs clean context to work. This is reflected in the way they build their agents.

Claude also supports memory import and export. You can bring in memories from ChatGPT or push them out to Claude memory in another account. The interoperability is limited — there's not a one-click import — but technically the capability is there.

My recommendation with Claude is to use your projects very deliberately. If you're working on something with a very distinct context — like a client engagement — just create a project for it. The project gets its own memory, its own instructions, and it works really well.
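One way to picture project-scoped memory is a store keyed by project, where a lookup in one project can never see facts saved in another. The sketch below is a mental model only. It is not how Claude or ChatGPT implement memory, and the class name, method names, and example facts are all hypothetical.

```python
from collections import defaultdict


class ProjectMemory:
    """Toy memory store in which each project has its own isolated list of facts."""

    def __init__(self):
        self._facts = defaultdict(list)

    def remember(self, project, fact):
        self._facts[project].append(fact)

    def recall(self, project):
        # Only facts saved under this project are visible; nothing leaks across.
        return list(self._facts.get(project, []))


memory = ProjectMemory()
memory.remember("startup", "Seed round closes in March")
memory.remember("vacation", "Prefers small hotels over resorts")

print(memory.recall("startup"))
print(memory.recall("vacation"))
```

The design choice worth noticing is that isolation is the default: nothing crosses from "startup" to "vacation" unless you deliberately save it in both places.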

Gemini has personal intelligence that connects to your Google apps — Gmail, Photos, YouTube, and so on. The pitch is that you can ask about tire options for your car and Gemini finds your car model from a Gmail receipt and gets the tire sizes right. The personalization settings let you connect or disconnect specific Google apps, so you can tune how much personalization it has.

The key tactic with the Google ecosystem is just to decide how much data you're willing to give Google. If you want to connect them all, you get immediate personalization. The trade-off is a larger privacy surface area, and you're going to have to make that call yourself.

Lever Two: Instructions

Instructions are persistent context about who you are and how you want your AI to behave. They are severely underused by most people.

ChatGPT has several instruction layers. It has custom instructions — multiple text fields where you can describe what it should know about you and how you would like ChatGPT to respond. It has project-specific workspaces that come with their own instructions. And it even has custom GPTs.

The key tactic here is that your biggest leverage is in being specific. "Be concise" is not super effective at steering the model. Instead, say: "For factual questions, please answer in a sentence. For analysis requests, I need you to walk through the reasoning step by step." When you are clear about what you're looking for, you are helping the model understand under what circumstances you want that behavioral response.

Claude splits instructions across multiple places — profile preferences, project instructions, and styles. The key tactic with Claude is that Claude's style feature is really underused. If you have a distinctive writing voice and you upload samples of your best work, Claude can generate a style profile from them. Every response, Claude will then be thinking about how to match your tone, your sentence structure, and so on. This is much more powerful than trying to describe your style in words. And even if Claude doesn't get all the way there, it gets you most of the way there on first drafts.

Claude markdown files deserve their own note. For developers using Claude Code, the instruction layer that actually matters is a `CLAUDE.md` file. Boris Cherny, who created Claude Code, described his team's practice: whenever Claude does something wrong, they add a rule to `CLAUDE.md` so it doesn't happen again. The file is checked into Git. The whole team contributes. Essentially, the file contains project architecture, coding standards, and common commands that everybody on the team can see and update at any time.

Treat this as a living document. Every time Claude does something you don't want, just add a note. The first version is going to feel sparse, but within a month it's going to be all filled out.
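The "add a note every time" habit can be reduced to a few lines of script. The sketch below is one possible helper, not a tool that Claude Code ships or that the team described: the function name `add_rule`, the bullet format, and the example rule are all assumptions. It appends a dated rule to an instruction file such as `CLAUDE.md` and skips rules that are already recorded. The demo writes to a temporary directory so it does not touch a real project.

```python
import tempfile
from datetime import date
from pathlib import Path


def add_rule(path, rule, today=None):
    """Append a dated rule to an instruction file unless that rule is already present."""
    file = Path(path)
    existing = file.read_text(encoding="utf-8") if file.exists() else ""
    if rule in existing:
        return False  # Already recorded, so do not duplicate it.
    stamp = (today or date.today()).isoformat()
    # Start on a fresh line if the file does not already end with a newline.
    prefix = "" if existing == "" or existing.endswith("\n") else "\n"
    with file.open("a", encoding="utf-8") as handle:
        handle.write(f"{prefix}- {rule} (added {stamp})\n")
    return True


# Demo in a throwaway directory; the rule text is a hypothetical example.
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "CLAUDE.md"
    rule = "Run the test suite before proposing a commit."
    first = add_rule(target, rule, date(2025, 1, 6))
    second = add_rule(target, rule, date(2025, 1, 7))
    contents = target.read_text(encoding="utf-8")

print(first, second)
print(contents)
```

Because the file is plain text checked into Git, each added rule also shows up in review like any other change.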

Lever Three: Apps and Tools

Tools are capabilities the AI can use — searching the web, running code, creating files, reading documents, and so on. If web search is enabled, the AI looks things up. If it's disabled, it works from training knowledge. Most people have default enablement and don't think about it. And that's the issue.

There are a lot of different ways to configure your apps and tools that will profoundly shape your experience. We should start with Model Context Protocol, because that underlies so much of the rest of this.

The MCP standard explains how most AI systems today connect to external tools. Think of it as USB-C for AI — a universal interface that lets any AI connect to any tool through the exact same protocol. Anthropic created it, but everyone has jumped on board. There are over 10,000 MCP servers out there, with lots more on the way.
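To show what "the exact same protocol" means in practice: MCP messages are JSON-RPC 2.0, and a client asks a server to run one of its tools with the `tools/call` method. The sketch below only builds such a request as a Python dictionary. It does not open a connection or talk to a real server, and the tool name `search_invoices` and its arguments are hypothetical.

```python
import json
from itertools import count

# Simple source of increasing request ids for this sketch.
_request_ids = count(1)


def tool_call_request(tool_name, arguments):
    """Build a JSON-RPC 2.0 request asking an MCP server to run one of its tools."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# "search_invoices" and its arguments are hypothetical, for illustration only.
request = tool_call_request("search_invoices", {"customer": "Acme", "limit": 5})
print(json.dumps(request, indent=2))
```

The envelope is identical whichever tool sits behind the server. Only the tool name and arguments change, which is the point of the USB-C comparison.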

So how do people use these connectors? ChatGPT calls them apps, and you can connect to Gmail, Calendar, and so on. Once connected, ChatGPT will automatically reference them where relevant. What I've found in practice is that "where relevant" is very ambiguous. You don't have to select them manually, but you may have to remind ChatGPT that it has the capability. It also doesn't have a super deep search capability.

On Claude, you have a much wider range of MCP servers, but the connectivity isn't always reliable. It is, for example, quite tricky to connect to Stripe, but very easy if you want to connect to Figma. And that changes all the time as people mature those MCP server implementations. So with Claude, you have to think intentionally about what your tool sets are, and then look regularly to see whether there are MCP connectors you can use. Asana was just added, for example.

Gemini is shorter on tools than it should be, and it's one of the big weaknesses of the Gemini ecosystem. While personal intelligence will connect to apps, Gemini itself is not big on tool use, and that is one of the reasons so many builders prefer ChatGPT or — increasingly — Claude.

So think about this lever carefully. Your tools are really steering inputs, not just features you add. If you want the AI to work with your real files, think about where they live and connect them. If you want verified code, think about how you enable code execution. Turning tools on and off changes the character of responses. A model may lean more on web search than you want if you enable internet access. These tools are not always good or bad. It's about being intentional about what you want.

Lever Four: Style and Tone Controls

Style controls let you adjust how AI communicates. ChatGPT has eight different personalities ranging from friendly to candid to nerdy all the way to cynical. On top of presets, they also have granular characteristics around warmth, enthusiasm, headers, and emojis — because apparently people complain about emojis. You can pick a personality and then dial it the way you want.

Your key tactic there is to decide on the default personality and then be very clear in your instructions and in your settings so that there is no conflict. If there is ambiguity or conflict, for example "be verbose" in your instructions and "concise" in your personality, you're just going to burn tokens and make ChatGPT sweat. So don't do that. Think about what you really want.

Meanwhile, Claude offers three built-in presets: formal, concise, and explanatory. The custom style feature is quite sophisticated and allows you to upload samples of your writing, which is what I've talked about. But fundamentally, if you don't want to create a custom style, you should be picking a style that reflects how you actually behave. If you are actually a very casual Claude user, don't select formal. Go with something like explanatory, where you can have longer conversations. Think about your actual usage, not your aspirational usage.

The Most Common Failure Mode: Being Too Vague

Across all four levers, I've observed a really common failure mode — being too vague to really steer the model. "Be concise" doesn't move you. "Be direct" doesn't move you. The instructions that work need to be specific enough to change the shape of the output.

Compare the difference between "be more helpful" and "when I'm stuck on a problem, please ask me diagnostic questions rather than immediately giving solutions. I learn better by being guided than by being told." That is so much better. You're going to get much better responses.

Compare "I'm a professional" — terrible — with "I've been doing product for 15 years. Please skip fundamentals and go straight to nuance." The specific versions tell the AI where you are and help it to help you, so it's not just delivering that averaged-out median answer that always feels off.

Compounding Personalization Over Time

This is where we start to separate people who get real value from AI and people who find it perpetually mediocre — because every interaction is generating information about what you need. And if you set your levers correctly, it starts to compound. Every time you think "that's not quite right," think of it as discovering a steering input, not just something to get frustrated about. Most people will correct in their head, get frustrated with AI, and move on. The people getting 10x results do something different. They capture the corrections, and when they notice a pattern, they encode it back into the AI — they add it to their instructions, they tell memory to retain it, they update their style settings.

Boris Cherny runs five Claude instances in parallel and another five to ten on Claude.ai, and ships roughly 100 PRs a week. His workflow is not magic. It's just the discipline to look at every mistake Claude makes and update a rule in `CLAUDE.md`. You don't need to be an engineer to do this. You can keep a notes file. You can find yourself making the same correction twice and write it down. You can review your instructions manually every month. This is actually not that hard, and the gap will widen over time as you start to invest in getting the levers right.
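For the notes-file version of this habit, a short script can flag which corrections keep recurring. This is a minimal sketch under stated assumptions: the function name `rules_to_encode`, the threshold of two repeats, and the sample notes are all hypothetical, and matching is by exact text after trimming and lowercasing.

```python
from collections import Counter


def rules_to_encode(corrections, threshold=2):
    """Return corrections seen at least `threshold` times, most frequent first."""
    counts = Counter(note.strip().lower() for note in corrections)
    return [note for note, seen in counts.most_common() if seen >= threshold]


# Hypothetical notes jotted down after unsatisfying outputs.
log = [
    "Skip the fundamentals",
    "Answer factual questions in one sentence",
    "skip the fundamentals",
    "Use fewer headers",
    "Answer factual questions in one sentence",
    "Skip the fundamentals ",
]

for rule in rules_to_encode(log):
    print("Add to instructions:", rule)
```

Anything that appears twice or more is a candidate for your custom instructions or memory. One-off corrections stay in the log until they repeat.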

What Steering Can't Fix

I want to be honest here. Steering fixes the personalization problem. It does not fix everything. When the model hallucinates, that's not an averaging problem — no amount of personal context fixes that. There's also a ceiling in creative work. When AI generates prose or images, its training data pulls toward the center of the distribution. You can steer against this, but you're still fighting gravity. Steering always takes effort. You're figuring out your position, encoding it, maintaining it — it's costing you time. And if you use AI only occasionally, to be honest, it's probably not worth it.

But if you use your AI multiple times a week for similar types of work, the math changes. A few hours of investment every now and then buys you permanently better output, and the compounding effect in saving you time is real. It gets better the more you use it. Know which kind of user you are.

Where to Start

If this feels like a lot, you can start very simple. Pick one task where you use your AI regularly and the output doesn't feel right. Over the next few sessions, notice the adjustments you're making and write them down. That's it. Then go in and find the custom instruction setting for your preferred AI, stick those in, notice the difference, and iterate. That's as simple as it gets.

The median isn't mandatory. The AI you're using is trained on everybody else's feedback. It learned to please everybody a little, which means it learned to please no one in particular. Default output really is median output — optimized for very typical users with typical needs. And you are not typical. Your constraints are specific to you. Your goals are specific to you. And the farther you are from the average, the more default settings will fail you.

You can go beyond prompting. You can use memory, instructions, tools, and style. Prompting is still useful — I didn't talk about it in this video, but it's still helpful for steering in conversation. Most people are going to ignore these levers or do only one. If you are starting to get averaged and you're tired of it, you can adjust more than one lever and very quickly start to compound toward a more personalized AI that actually fits you. The choice is yours — you can stay at the median, or you can steer the ship and get the AI you want.

Polished transcript of AI News & Strategy Daily | Nate B Jones. All views are those of the original speakers.

Published by @maverick
