Nate B Jones compares prompting strategies for ChatGPT 5.1 and Gemini 3
A solo breakdown of how to prompt ChatGPT 5.1 and Gemini 3 differently based on the type of input each model handles best.
Summary
Nate B Jones of AI News & Strategy Daily presents a tactical prompting guide comparing ChatGPT 5.1 and Gemini 3, both released within days of each other. His central argument is that the meaningful difference between the two models is not brand or benchmark performance, but the type of "entropy" each handles best: Gemini 3 excels at ingesting large, messy, multimodal inputs and imposing structure on them, while ChatGPT 5.1 excels at executing complex, multi-step tasks when given clean, well-organized inputs. He walks through a keep/stop/start prompting framework for each model, offering specific tactical advice on context placement, verbosity control, modality naming, and mode selection. His conclusion is that the two models are complementary tools rather than competitors, and that choosing between them should depend on whether the problem is one of chaotic input or complex reasoning.
FULL TRANSCRIPT
Introduction: Why the type of input matters more than the model
Nate B Jones: Most people talk about models, but very few people talk about the kind of mess you hand the model. This video is all about the differences in prompting between ChatGPT 5.1, which came out a week or so ago, and Gemini 3, which came out a couple of days ago. I'm going to get into the specifics. I'm going to explain how you prompt them differently, why it matters, and how your attention changes as a result. We're going to get very specific and tactical, because I think that is going to be a huge driver for you to be productive with, frankly, both of these models. The goal here is not to have you pick a model — it is to have you use the right tool for the right job.
So if I were to give you a summary of each of these after playing with them for the last few days: Gemini 3 is built to eat messy, high-entropy context — logs, PDFs, screenshots, video — and turn it into some kind of structure. ChatGPT 5.1 is built to take clean, relatively low-entropy, relatively organized inputs and do complex multi-step tasks with them: reasoning, coding, planning, narrative development.
This implies real shifts in your prompting habits, and you're better off asking "which model do I pick for which job?" than just assuming you can go with one or the other. So let's start and ground-set on how to think about ChatGPT 5.1, and then we'll get into Gemini 3 and the comparison.
ChatGPT 5.1: The baseline mental model
Your baseline mental model for ChatGPT 5.1 is that you should treat it as your operator, business writer, and coder. It loves clear roles. It loves audiences. It loves specifics on tone. Remember, they tuned this model to follow instructions, and they partly did that specifically to address complaints around ChatGPT 4 on writing.
ChatGPT 5.1 performs best with curated, relevant context — not just giant raw dumps. From a mode perspective, you get benefits both with speed and with depth, and you have to use them intelligently and intentionally. If you are doing a speed run with ChatGPT 5.1, you want to be thinking: what instruction set do I give the model that I don't want it to chew on and spend time thinking about — I just want it to follow exactly what I say and do it — versus an instruction set where you want those reasoning tokens. Everything else about ChatGPT 5.1 sits on top of that understanding of the model.
ChatGPT 5.1: Keep, Stop, Start prompting framework
If we transition into prompting for 5.1 and think about it in classical engineering terms — keep, stop, start — what do we keep from a prompting perspective that we may have used already? What do we stop doing? What do we start doing?
Keep: You want to keep defining role, audience, and tone. We've heard that advice for a long time. This is still a high-leverage pattern in 5.1. You want to continue to be explicit about the structure of output — ask about sections, headings, bullet count, JSON schemas. 5.1 is built to follow those structural instructions very reliably. You also want to keep using modes intentionally. If it's light edits or quick answers, you're going to go to instant. If it's hard reasoning or refactors, you're going to use thinking. Essentially, you want to keep letting it drive the narrative and use the context you give it to solve difficult tasks. This model likes to eat problems — executive memos, product narratives, internal explainer docs. These are things it's going to do really well at.
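As an illustration of the "keep" habits above, here is a minimal sketch of a prompt builder that states role, audience, tone, and output structure explicitly. The function name `build_prompt` and its field labels are hypothetical and not from the talk; it only assembles text and does not call any model.

```python
def build_prompt(role, audience, tone, task, sections, max_bullets):
    """Assemble a prompt that states role, audience, tone, and output structure."""
    lines = [
        f"Role: {role}",
        f"Audience: {audience}",
        f"Tone: {tone}",
        f"Task: {task}",
        "Output structure:",
    ]
    # One line per requested section, so the structure is explicit.
    lines += [f"- Section: {name}" for name in sections]
    lines.append(f"- At most {max_bullets} bullets per section")
    return "\n".join(lines)


prompt = build_prompt(
    role="business writer",
    audience="engineering team",
    tone="direct, no jargon",
    task="Write an internal explainer on the new release process.",
    sections=["Summary", "What changes", "Next steps"],
    max_bullets=5,
)
print(prompt)
```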
Stop: You want to stop dumping huge unfiltered context windows into 5.1. I don't find that to be super relevant — I think you pay more and you tend to dilute the value of the model. You want to stop hiding the task inside a wall of background. I see that in a lot of so-called big prompts. You don't necessarily need that page of company lore from the wiki page. Just ask specifically for what you want, and this aligns with ChatGPT 5.1's own documentation as well. You also want to stop packing four or five jobs into one prompt. Give the model the specific ask you're looking for, and then you can chain it into additional steps if you need to. Idea generation is a different model task from critique, which is different from selection. Those naturally feel like they should be broken apart with 5.1 because of the way the model prefers clean inputs.
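The advice to split idea generation, critique, and selection into separate prompts can be sketched as a small chain. This is an assumption-laden sketch: `call_model` is a hypothetical callable standing in for whatever client you use, and the prompt wording is illustrative only.

```python
from typing import Callable, Dict


def run_chain(call_model: Callable[[str], str], topic: str) -> Dict[str, str]:
    """Run three single-purpose prompts, feeding each result into the next."""
    # Step 1: idea generation only.
    ideas = call_model(f"Generate five ideas for: {topic}. One idea per line.")
    # Step 2: critique only, using the ideas from step 1.
    critique = call_model(f"Critique each idea below, one line per idea.\n\n{ideas}")
    # Step 3: selection only, using both earlier outputs.
    choice = call_model(
        "Pick the single strongest idea given the critique.\n\n"
        f"Ideas:\n{ideas}\n\nCritique:\n{critique}"
    )
    return {"ideas": ideas, "critique": critique, "choice": choice}


# Stand-in model for demonstration; replace with a real client call.
def fake_model(prompt: str) -> str:
    return f"[response to {len(prompt)} chars]"


result = run_chain(fake_model, "onboarding email campaign")
print(sorted(result))
```

Keeping each call to one job means each prompt stays short and unambiguous, which is the property the talk attributes to 5.1's preference for clean inputs.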
I would also call out that now that we have a model that is willing to follow instructions on writing style, use it. Start asking for different kinds of instructions. How do you ask for instruction on tone that matches marketing versus instruction that matches the boardroom versus instruction that matches the engineering team? 5.1 is still not quite as good as Claude at style, but it is much better at following instructions than it was.
Start: You should start treating 5.1 almost like an internal function library. This is a little bit of engineering talk, but you want to be able to define reusable patterns and call back to them with stable formats as much as you can. So: here is my stable pattern for drafting an internal memo to the team. I'm defining it explicitly. I'm asking ChatGPT 5.1 to remember it, or if I use it a ton, I'm putting it into project instructions or system instructions, and then I'm going to go back and invoke it deliberately by saying "draft an internal memo." You want to reuse those stable formats.
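One way to picture the "function library" idea is a small table of named prompt patterns that you invoke by name. This is only a sketch: the `PATTERNS` dictionary, the `invoke` helper, and the memo sections are hypothetical examples, not a format from the talk or from any product.

```python
# Named, reusable prompt patterns with stable formats.
PATTERNS = {
    "internal_memo": (
        "Draft an internal memo to the team.\n"
        "Sections: Context, Decision, Next steps.\n"
        "Topic: {topic}"
    ),
}


def invoke(pattern_name, **fields):
    """Fill a stored pattern with the fields for this particular request."""
    return PATTERNS[pattern_name].format(**fields)


print(invoke("internal_memo", topic="Q3 roadmap changes"))
```

In a chat product, the same pattern text could live in project or system instructions, with the short name serving as the deliberate invocation.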
You want to start giving step plans when you want deliberate thinking. So: first ask three clarifying questions, then propose three options, then choose one, then write the doc. That backfills the model into thinking carefully about the task. You also want to start being explicit about tools. Tell the model what tools are important. Give it those constraints. Is web search important? Say so. You also want to start constraining verbosity and register. If you say "this has got to be within five to seven bullets and it's for a VP audience," that is super helpful to the model because it constrains the register of language and helps the model know what to actually put out.
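The step plan and the verbosity constraint described above can be combined in one prompt. A minimal sketch, assuming a hypothetical helper named `step_plan_prompt`; the wording of the generated prompt is illustrative.

```python
def step_plan_prompt(task, steps, bullet_range, audience):
    """Build a prompt with an ordered step plan and explicit length/register limits."""
    low, high = bullet_range
    numbered = [f"{i}. {step}" for i, step in enumerate(steps, start=1)]
    return "\n".join(
        [
            f"Task: {task}",
            "Work through these steps in order:",
            *numbered,
            f"Constraints: {low} to {high} bullets, written for a {audience} audience.",
        ]
    )


print(
    step_plan_prompt(
        task="Recommend a rollout approach for the new billing system.",
        steps=[
            "Ask three clarifying questions",
            "Propose three options",
            "Choose one option",
            "Write the doc",
        ],
        bullet_range=(5, 7),
        audience="VP",
    )
)
```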
Gemini 3: The baseline mental model
That is 5.1. When you go to Gemini 3, it is a different world, because Gemini 3 is built for different kinds of tasks. So if we run the same keep, stop, start with Gemini 3, you get some overlap. I'm not going to pretend these are entirely different beasts, because they are all large language models — but you get some important differences that I want to call out.
Gemini 3: Keep, Stop, Start prompting framework
Keep: With Gemini 3, you want to continue to be precise and unambiguous. Gemini 3 responds best to clear goals and output formats — that's somewhat similar to 5.1. Having some structure on the output is also still helpful: JSON, table, standardized tags, whatever you use for structured output. I am not going to be the person who tells you JSON is magic, because it's not. Just have clear structure. You also want to keep using step-wise reasoning when tasks are complicated. If you're saying step one, step two, step three — as I talked about with 5.1 — that's still useful.
Stop: Number one, and this is so important: please stop treating Gemini 3 like it is ChatGPT from Google. It has different characteristics. Its real edge, as I called out, is being multimodal — ingesting video, images, text, and very lengthy context as first-class objects. If you only ever send it very short text prompts, you're not really using it for what it's differentiated for.
The other thing you need to stop doing is putting all your detailed instructions at the top when you use huge context. If you are using that million-token context window — and Google's docs say this very specifically — you want to put the context first and the instructions at the end. So with long docs, codebases, and videos, the better pattern is to put all of that at the top and then put the instructions at the bottom and say "anchored to the information above" or "based on the information above, do XYZ." That is your instruction placement.
Stop assuming Gemini 3 will be verbose or chatty by default. This is very different from ChatGPT 5.1. Gemini 3 is tuned to be concise. If you want a longer or more narrative answer, you are going to need to say so. I have wrestled with this already with the model. It covers everything, but it loves to be concise.
Stop referring to your multimodal inputs vaguely. Saying "screenshot above" is weak. Instead, say: "Use Image 1, funnel dashboard, for XYZ. Use Image 2, checkout screen, for ABC. Compare them by doing one, two, three." You want to be as specific as you can, because you have to assume the model needs that context to know what, within this lengthy context you're giving it, you're actually referring to. If it's sorting through multiple videos and screenshots and images, help it find what you're talking about in the instructions.
Start: Start using Gemini 3 as your entropy eater. Give it those giant messy bundles — the logs, the PDFs, the transcripts. Ask it to output structured, grounded artifacts for you: issues lists, timelines, hypotheses, tables.
You also want to start anchoring your long-context prompts really explicitly. One example of a pattern would be having role and global constraints, then big context blocks in the middle of the prompt for most of the prompt, and at the very end: "Based on the information above, do X in Y schema." This helps you anchor those long-context prompts so the model knows what to do once it's read the context.
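The long-context layout described above (role and global constraints, then the big context blocks, then the instruction at the very end) can be sketched as follows. The helper name `long_context_prompt` and the `===` block delimiters are assumptions made for illustration.

```python
def long_context_prompt(role, context_blocks, instruction, schema):
    """Place context first and the anchored instruction last."""
    parts = [f"Role and constraints: {role}"]
    # Bulk context goes in the middle, each block under its own label.
    for label, text in context_blocks:
        parts.append(f"=== {label} ===\n{text}")
    # The instruction comes last and points back at the context.
    parts.append(
        f"Based on the information above, {instruction} "
        f"Return the result as {schema}."
    )
    return "\n\n".join(parts)


prompt = long_context_prompt(
    role="You are an incident analyst. Cite the source block for every claim.",
    context_blocks=[
        ("Server logs", "2024-01-01 12:00 ERROR timeout ..."),
        ("Support transcript", "Customer reports checkout failing ..."),
    ],
    instruction="list the distinct issues and their likely causes.",
    schema="a table with columns issue, evidence, hypothesis",
)
print(prompt)
```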
You also want to start specifying verbosity and persona every time you prompt with Gemini 3. "Use a conversational tone. I need 800 to 1,000 words here. Return a four-bullet list." Whatever it is — don't assume the concise response from Gemini 3 is going to be fine. Decide what you want.
You should also start naming and indexing every single modality. That sounds complicated, but it doesn't have to be. "Image 1" — that's naming a modality. "Video 2, from minute 1:30 to 2:00." "CSV, columns 1 through 4." Tell it what you want it to use when you're giving the task, because you have to assume it needs to search the pile, and you get better retrieval if you're more precise.
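Naming and indexing inputs can be automated with a small labeling helper, so that instructions can refer to "Image 1" or "Video 2" consistently. This is a sketch under the assumption that inputs arrive as (kind, description) pairs; the function name `label_inputs` is hypothetical.

```python
def label_inputs(inputs):
    """Number each input within its own modality, e.g. 'Image 2 (checkout screen)'."""
    counters = {}
    labels = []
    for kind, description in inputs:
        counters[kind] = counters.get(kind, 0) + 1
        labels.append(f"{kind} {counters[kind]} ({description})")
    return labels


labels = label_inputs(
    [
        ("Image", "funnel dashboard"),
        ("Image", "checkout screen"),
        ("Video", "user session recording"),
    ]
)
for label in labels:
    print(label)
# Image 1 (funnel dashboard)
# Image 2 (checkout screen)
# Video 1 (user session recording)
```

The task instruction can then reference those exact labels, for example "Use Image 1 for drop-off rates and compare against Image 2."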
Also start using those reasoning controls on purpose when available. Raise the thinking level only when you truly need cross-document synthesis. Keep it low if you're just doing labeling. If you're doing pure retrieval and extraction, tune that on purpose — just like ChatGPT 5.1, be deliberate about when you use thinking.
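The rule of thumb above can be written down as a tiny lookup. This is a sketch only: the task-type names and the "low"/"high" labels are hypothetical placeholders, not parameter values from any specific API.

```python
def thinking_level(task_type):
    """Raise the thinking level only for cross-document synthesis; otherwise keep it low."""
    if task_type == "cross_document_synthesis":
        return "high"
    # Labeling, retrieval, and extraction stay low, as does anything unlisted.
    return "low"


for task in ["labeling", "extraction", "cross_document_synthesis"]:
    print(task, "->", thinking_level(task))
```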
The deep difference: Context entropy vs. task entropy
So if we step back — I've done a deep comparison here of 5.1 versus Gemini 3. What do we see overall?
The deep difference is not just Google versus OpenAI. It is what kind of entropy each model is best at handling. You are looking at a world of context entropy versus task entropy.
Context entropy is how messy, large, and multimodal your inputs can be. They could have lots of irrelevant details. They can have mixed formats. They can have timelines, screenshots, logs, videos. Sound like Gemini 3? It is.
Task entropy is how open-ended and multi-step the job is — vague objectives, competing constraints, multiple stakeholders, tool calls, planning and writing and coding. ChatGPT 5.1 is a little bit better there.
You get the best results when you align the model to the entropy it's dealing with. Gemini 3 does well with high context entropy. I think it does okay with task entropy, but I would grade it about moderate. "Here's everything — find the signal and structure it" is a great use for Gemini 3. ChatGPT 5.1 is very low to moderate on context entropy — you have to give it really clean signal — but then you can give it a complex task.
I want to be precise about this, because ChatGPT 5.1's docs do call out that if your context window is competing — if you're giving it instructions that are ambiguous and try to cancel each other out, like "be descriptive and concise" — 5.1 doesn't like that. It will burn tokens trying to fix that ambiguity. I have seen it push back on me when it feels my prompts are inaccurate, which I love. Assuming you have clean prompts, though, I do think it can handle a very high-complexity task and think it through. It is sort of like a brain in a jar in that regard. If you can give it a really clean input, it can process it, and it can be quite a complex task, and it will come back really thoughtfully.
I want to emphasize here that the differences I'm talking about are differences on top of very capable baseline LLM capacity. These models are all good at lots and lots of everyday tasks. They're good at writing emails. They're good at synthesizing PRDs. They're good at writing engineering requirements. The things I'm calling out are the nuances that help you make the most of these models.
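The entropy framing can be read as a simple routing heuristic. The sketch below is one interpretation, not a rule stated in the talk: the function name `suggest_model`, the three-level scale, and the "either" fallback for everyday tasks are all assumptions.

```python
def suggest_model(context_entropy, task_entropy):
    """Suggest a model from rough 'low' / 'moderate' / 'high' entropy ratings."""
    if context_entropy == "high":
        # Messy, large, multimodal input: structure it first.
        return "Gemini 3"
    if task_entropy == "high":
        # Clean input, open-ended multi-step job.
        return "ChatGPT 5.1"
    # Everyday tasks: both models are described as capable.
    return "either"


print(suggest_model("high", "moderate"))  # Gemini 3
print(suggest_model("low", "high"))       # ChatGPT 5.1
print(suggest_model("low", "low"))        # either
```

When both ratings are high, the sketch picks Gemini 3 first on the assumption that the inputs need structuring before the hard reasoning starts; the structured result could then be handed to ChatGPT 5.1.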
Prompting effort: Where you spend your attention with each model
So, prompting shifts in line with this insight around entropy. With Gemini 3 prompts, you are actually spending your effort on output structure, on task constraints, on how you anchor phrases and name and define which part of the context you're retrieving. You need to get comfortable feeding high-entropy multimodal context — which I normally shy away from — but you have to define what good synthesis and good analysis looks like across that context: schemas, ranking criteria, what you retrieve, and so on.
Whereas with ChatGPT 5.1, you're spending more of your time on task definition. Is it really clean? Is it unambiguous? You're making sure you insist on the tone you want. And you may pre-process your inputs so that well-structured context is available to the model so that it can think deeply and not wade through junk.
Conclusion: Use each model for what it does best
If you want all of this in one line: use Gemini 3 to tame the chaos of your inputs, and use ChatGPT 5.1 when you're tackling hard thinking and communication around more structured inputs. Once that chaos is structured, you can do some of both with both models. But that is the takeaway I am starting to come to.
I think both are very strong. I think with Gemini 3, we are still at the beginning of exploring those capabilities. I know it has capabilities on the coding side that I didn't discuss a lot in this video — I'll probably do a separate one on how it codes. I think a lot of the power you see in some of these general-purpose exams and tests around visualization, around understanding how code is structured, and around building things usefully in one go, comes down to the ability to deeply understand multimodal inputs and write clear, coherent responses to those inputs. That's why I focused a lot of this prompting guide for Gemini 3 there.
There will be other insights we come to in the future, but I wanted this initial guide to focus on where I see stable differences between the models so that we can build our understanding from there. Both models are great — it's about understanding the nuances, and that's why I'm giving you this prompting master class on ChatGPT 5.1 versus Gemini 3.