OpenAI's new Data Connectors feature reviewed and tested by Nate B Jones
Nate B Jones of AI News & Strategy Daily reviews and tests OpenAI's newly released Data Connectors feature.
Summary
Nate B Jones reviews OpenAI's newly released Data Connectors, which allow users to connect ChatGPT's deep research feature to personal data sources including Gmail, Google Calendar, Google Drive, GitHub, Linear, SharePoint, Outlook, and Zapier. Testing the feature across several real-world queries, Jones finds it largely falls short of its promise, particularly for broad, analytical tasks, because the data pipeline appears to cap API results at around 15 items per source. He identifies one narrow use case where it performs well: tightly scoped, keyword-defined queries such as researching a specific upcoming event. Jones frames the broader move as part of an ongoing competitive race between OpenAI and Anthropic for enterprise data and training data access, and draws a parallel to OpenAI's Operator feature, which launched poorly in January but improved significantly once OpenAI put its o3 model behind it; he expects Connectors to improve similarly over the next six months.
FULL TRANSCRIPT
OpenAI Data Connectors: What They Are and How They Work
Nate B Jones: OpenAI released Data Connectors yesterday. Data Connectors are basically OpenAI's answer to Claude connecting you to Gmail and Claude connecting you to Calendar. Being OpenAI, they added a bunch more that Claude had not previously added, because of course this is a competitive arms race. They added GitHub, they added Linear, they added Zapier, they added a bunch of things. And then they also had Gmail, they had Outlook, they had SharePoint, they had Google Calendar. Essentially they are saying that you can now search, in your Plus, Teams, and Pro accounts, across a lot of the personal information that you create as you do work.
To their credit, they are careful to call out that this is not a perfect search mechanism. They specifically note this shouldn't be used if you're searching, for example, in Google Drive for doing extended work across math on Sheets. So if you're doing a spreadsheet analysis, deep research is probably not best positioned for that. I'm sure they're right about that.
Testing the Feature: Where It Falls Down
But even when I gave it several queries — deep research queries that were not designed to test it against what OpenAI had warned against — it still didn't work. In other words, I tried to stay away from the warning spots. I tried to give it challenging queries similar to what I'd given deep research in the past when it only worked on the open web, and it really fell down on local information.
Because you can see the chain of thought for deep research, I was actually able to pull up that chain of thought during the queries I sent and take screenshots of what deep research said it was doing. And I learned some fascinating things.
It turns out that the API call it relies on to get results from Calendar and from Gmail tops out at 15 items. In other words, if you want to say, as you would to a hardworking executive assistant, "Please do a comprehensive analysis of last month's email volume. Cohort it out. Tell me who I need to be focused on. Tell me how I can use my time more efficiently. Give me a sense of the types of emails I need to respond to and the ones I don't," it can't do that. It has access, but because of the thinness of the data pipe it's working with, it is absolutely impossible.
I tried to do an email analysis of the last 100 emails. I tried to do a calendar analysis of the last 100 calendar entries. And this is me being kind — I would have said a thousand if I could, but I had a feeling. I tried to do an analysis of the last 100 Google Docs that I created. It does extremely limited searching. In my query for 100 docs, I found evidence of it calling back three. In my query for 100 emails, it could not produce exact counts — it just kind of waved its hand in the air and gave me approximate numbers. And as someone who gets the email every day, I knew that it had correctly guessed categories, but it had wildly incorrectly guessed numbers. The numbers it was guessing for whole groups of my email were completely off, and it just didn't do the groundwork of actually checking the email, even though the data connector was there.
So it just failed. We don't have to call it anything else.
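To make the effect of that 15-item ceiling concrete, here is a minimal Python sketch. It is not OpenAI's pipeline: the inbox, the category mix, the `capped_fetch` helper, and the `RESULT_CAP` constant are all hypothetical, chosen only to mirror the behavior described above, where a tool sees a small slice of the data and then scales it up.

```python
from collections import Counter

# Assumption: the per-call ceiling Jones observed in the chain of thought.
RESULT_CAP = 15


def make_inbox(n=100):
    """Build a hypothetical inbox: 60% newsletters, 30% client, 10% internal."""
    inbox = []
    for i in range(n):
        slot = i % 10
        if slot < 6:
            inbox.append("newsletter")
        elif slot < 9:
            inbox.append("client")
        else:
            inbox.append("internal")
    return inbox


def capped_fetch(items, cap=RESULT_CAP):
    """Hypothetical connector call that returns at most `cap` items and never pages."""
    return items[:cap]


def extrapolate(sample_counts, sample_size, total):
    """Scale counts from a small sample up to the full inbox size."""
    return {k: round(v * total / sample_size) for k, v in sample_counts.items()}


inbox = make_inbox()
true_counts = Counter(inbox)          # what an assistant who read everything would report
sample = capped_fetch(inbox)          # what the capped call actually sees
seen_counts = Counter(sample)
estimated = extrapolate(seen_counts, len(sample), len(inbox))

print("true:     ", dict(true_counts))
print("seen:     ", dict(seen_counts))
print("estimated:", estimated)
```

In this toy setup the sample contains every category, so the categories come out right, but the scaled-up numbers drift away from the true counts (73 versus 60 newsletters, 20 versus 30 client emails). That is the same shape of failure as correctly guessed categories with wildly incorrect numbers.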
Where Data Connectors Actually Work
Where did it succeed? Well, I gave it a specific topic to look into and it did much better there. If you give it, say, a webinar you're planning, or an event you want to do — something that has a defined time focus — and you say, "Look across the web, look across my email, my calendar, give me a comprehensive briefing for just this very tight topic that's clearly delineated by keyword," then it does pretty well. It can use that keyword as a guidepost across Gmail, across Calendar, across the open web, across your Google Docs, and come back with something that is actually a decently good comprehensive briefing.
How does it do that well? Because for a tightly scoped topic, each individual data source is not going to return a large number of items (it's not going to be more than 15 in many cases), and it can then assemble those into something really comprehensive by inferring and reasoning across all of them together, which o3 does very well. It also helps if that event has a public presence, because then it can do what deep research does best: reason across the entire web at scale.
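A rough way to picture why the narrow query survives the cap is sketched below in Python. Everything here is an assumption for illustration: the `search_source` and `build_briefing_inputs` helpers, the sample data, and the plain substring match standing in for whatever search the real connectors perform.

```python
# Assumption: the same 15-item ceiling per source described earlier.
CAP = 15


def search_source(items, keyword, cap=CAP):
    """Return up to `cap` keyword matches plus a flag saying whether any were dropped."""
    matches = [item for item in items if keyword.lower() in item.lower()]
    return matches[:cap], len(matches) > cap


def build_briefing_inputs(sources, keyword, cap=CAP):
    """Fan the same keyword out to every source and collect the capped results."""
    briefing = {}
    for name, items in sources.items():
        results, truncated = search_source(items, keyword, cap)
        briefing[name] = {"results": results, "truncated": truncated}
    return briefing


# Hypothetical personal data: a busy inbox, a short calendar, a few docs.
sources = {
    "gmail": [f"Re: thread {i}" for i in range(40)]
    + ["Re: Q3 webinar speaker list", "Re: Q3 webinar agenda draft"],
    "calendar": ["Q3 webinar dry run", "Weekly sync", "1:1"],
    "drive": ["Q3 webinar run of show", "Budget 2025"],
}

narrow = build_briefing_inputs(sources, "Q3 webinar")  # tight, keyword-defined topic
broad = build_briefing_inputs(sources, "re:")          # broad, volume-style question

for label, result in (("narrow", narrow), ("broad", broad)):
    for name, info in result.items():
        print(label, name, len(info["results"]), "truncated" if info["truncated"] else "complete")
```

With the narrow keyword, every source returns its full set of matches, so the reasoning step works from complete inputs. With the broad keyword, the Gmail results are cut off at the cap and anything built on them is working from a partial view.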
The Bigger Picture: Competition and Training Data
So when I step back and look at this, I see it in the context of ongoing competition: both between model makers, meaning Anthropic and OpenAI, as I laid out at the beginning, and between model makers and the specific verticals they want to go after.
One of the questions I got as this came out was: "Is anything safe? They keep going after these verticals. Who's next? Did they eat Granola?" — because one of the data sources here is that Teams will now record your calls.
To be honest, I think the common rationale across these recent moves by both Anthropic and OpenAI is all about tokens and data in. It's all about training data. Everybody's hungry for it. They're building connections to Gmail for training data, to Calendar for training data, anything they can get. They're building the meeting transcripts piece for training data. Anthropic is cutting off access to models in Windsurf — which was acquired by OpenAI — in order to cut OpenAI off from training data. Now, it doesn't really matter in practice because Windsurf can get third-party access instead of first-party access. But the intent is to cut their rival off from training data, to keep them from getting it.
So if you're building in this space, the question you should be asking yourself is: how easy is it for a model maker to get access to training data that they would find high value — data that would have real rewards if they got it right? My proposal is to look for places where that data would be hard to get, where they couldn't just add an MCP server and get it, because that is basically what OpenAI did here.
And I think the question is not whether they would add an MCP server to get the data if they could — it's whether collecting the data is in line with their larger stated vision for the company. Because, throwing elbows between OpenAI and Anthropic aside, this is right in line with what we would expect. OpenAI has been really clear about their plan to be the default OS for the enterprise. If you're going to be the default operating system for work, you've got to do meetings. Shouldn't have surprised anybody. You've got to do Gmail, Calendar, Outlook, SharePoint. This isn't that surprising.
The Operator Parallel: Expect Improvement Over Time
Now, are they doing it well enough that I would trust them with all of that at the enterprise level right now? No, they're not. In fact, this is reminding me a lot of the initial release of Operator back in January. When Operator came out, it was frankly pretty terrible. I used it about three or four times and it was inaccurate, it was slow, it took forever, it froze up on easy things like adding to cart. I just stopped using it.
I had a feeling they would make it better, because after all it was in beta. And last week they did. They added their o3 model as the driving model behind Operator. Operator got about ten times faster and ten times more accurate. It's actually a useful tool now. They didn't make as big a fanfare out of it, but I now find that I actually go to it and imagine specific use cases because it is fast enough. I used it for a real task just this week that I did not pre-plan. If you're wondering, it was flight planning. And yes, I know they trained for flight planning, but it wasn't even usable for flight planning in January. So it's not just that they trained for it — it's that they got the right model behind it. They took the time, they probably collected data anonymously from users as we were using it, and they were eventually able to make the model better at browsing the internet.
I expect the same thing to happen with this Connectors play. This is a long-term play. I don't consider it particularly usable today. I consider the move toward data for the enterprise and workplace something that is not surprising, and something where OpenAI is going to get much better over the next six months. They need more data to reason across.
The Deeper Challenge: Messy Data and Precise Prompting
Now, this doesn't mean they will immediately become ten times more intelligent at handling all of that data. I do think there are real questions about the ability of these models to reason well across very large data streams that are incredibly messy and created by humans. Have you looked in your Notion or your wiki at work? It's probably pretty dirty. Everyone kind of rolls their eyes and throws stuff in there. When you have a dirty repository of unstructured text data like that, it is inherently not a great place to ask the AI to start making meaning. And yet that is often the case for a lot of our text repositories.
So my challenge to you is: how good is your prompting, and how precisely are you asking for work to be done? One of the characteristics we are learning about AI in 2025 is that when you prompt well — when you prompt cleanly and clearly with a very specific task in mind — you often get surprisingly good results. But when you ask for something fuzzier, harder, more like you would ask a very senior researcher where you don't even fully know the question, you often get quite poor results.
That really explains how my experience went with Connectors today. I was able to get good results on my specific query around events and webinars — understanding how they assembled into a briefing, getting a sense of the emails and the calendar and the agenda and the public profile for a specific event. That went well. It was a very specific query. My more generalized discovery questions around patterns in my email and patterns in my calendar went terribly. It just did not go well at all, and it could not pull the volume of data needed to make sense of it.
So my takeaway is that part of the challenge for work here in 2025 is: how can we more effectively structure our queries so we precisely ask what we intend? That takes a lot of human work. It's not easy, but I think that's a big work skill that we can all stand to get better at. Clearly I could stand to get better at it, because I batted about one for three, one for four on my test queries today. I need to figure out ways to use this tool that are more like a scalpel and less like a chainsaw. This is not a chainsaw tool. It's a very precise tool right now. It may get higher bandwidth as they increase the scope of those connectors, but it's not that high bandwidth now.
So that is the skinny on what happened with Data Connectors. There's a lot more behind the scenes of this move that I want to dig deeper on in the next piece.