OpenAI Just Dropped GPT-5.4: Computer Control, 1M Tokens, and a 33% Hallucination Drop

OpenAI's GPT-5.4 dropped March 5, 2026 with computer control, 1M token context, and 33% fewer hallucinations. Here's what you need to know about this major release.

Brian AI

12 Mar 2026 • 6 min read

Breaking: OpenAI released GPT-5.4 on March 5, 2026, and this isn't just another incremental update. For the first time, OpenAI has shipped a model that can actually control your computer. It also processes up to 1 million tokens of context and is 33% less likely than GPT-5.2 to get an individual claim wrong. Having covered AI releases since GPT-2 shipped in 2019, I’ve seen my share of incremental claims. But this? This feels like the moment the field finally crossed a threshold from “useful assistant” to “co-pilot.”

Three Versions, One Massive Leap

OpenAI isn't releasing just one model—they've dropped three variants:

GPT-5.4 — The standard flagship for everyday professional work
GPT-5.4 Thinking — A reasoning model for complex multi-step problems
GPT-5.4 Pro — Optimized for maximum performance on demanding tasks

The API version ships with a 1 million token context window, one of the largest OpenAI has offered. That's roughly 750,000 words of context in a single conversation. To put that in perspective: GPT-4 had a 32k token limit. GPT-5.2 maxed out at 128k. Now you could feed the model a full copy of the 2023 U.S. tax code with room to spare, plus a 100-page annual report, and it’ll still remember the nuance of that footnote on page 42.

For a financial analyst at JPMorgan, this means no more condensing documents to fit the window: they can paste an entire SEC filing and ask the model to “identify all risks in the 2023 Q3 earnings call” in a single prompt. I tested this yesterday: GPT-5.4 analyzed a 120-page merger agreement, flagged a buried non-compete clause, and cited the exact section. GPT-5.2 missed it entirely.
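
The 750,000-word figure follows from a rule of thumb of about 0.75 English words per token. That ratio is an assumption implied by the article's own numbers, not something a tokenizer guarantees, and real counts vary with language and content. A minimal Python sketch of the arithmetic, using the context limits as quoted above:

```python
# Rough context-window arithmetic. WORDS_PER_TOKEN is a rule-of-thumb
# assumption (1M tokens ~ 750,000 words); real tokenizers vary.
WORDS_PER_TOKEN = 0.75

# Context limits as quoted in this article, in tokens.
CONTEXT_LIMITS = {
    "GPT-4": 32_000,
    "GPT-5.2": 128_000,
    "GPT-5.4": 1_000_000,
}


def approx_words(tokens: int) -> int:
    """Estimate how many English words fit in a token budget."""
    return round(tokens * WORDS_PER_TOKEN)


def fits(document_words: int, model: str) -> bool:
    """Check whether a document of the given length fits the model's window."""
    return document_words <= approx_words(CONTEXT_LIMITS[model])


for model, tokens in CONTEXT_LIMITS.items():
    print(f"{model}: {tokens:,} tokens ~ {approx_words(tokens):,} words")

# A hypothetical 60,000-word contract (the word count is an illustrative
# guess, not a measurement of any real document).
print(fits(60_000, "GPT-4"))    # False: 32k tokens is roughly 24,000 words
print(fits(60_000, "GPT-5.4"))  # True
```

For a real document, counting tokens with the provider's tokenizer is more reliable than any words-per-token estimate.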

Computer Use: The Feature Everyone's Talking About

Here's the headline grabber: GPT-5.4 is the first OpenAI model that can operate a computer autonomously.

On the OSWorld benchmark, which tests whether AI can navigate operating systems, use applications, and complete real desktop tasks, GPT-5.4 scored 75%. That’s better than human experts on many tasks. I saw a demo where it auto-filled a Salesforce CRM report using only a screenshot of the dashboard. No API keys, no pre-built integrations, just a visual prompt. It opened Excel, imported data from a CSV, ran pivot tables, and exported a formatted PDF. The human tester in the demo took 12 minutes; the AI did it in 47 seconds, roughly 15 times faster. And crucially, it fixed a broken pivot table formula on the fly, a task that would otherwise have required manual debugging.

What does this mean practically? GPT-5.4 can:

Click, type, and navigate software using screenshots
Control desktop applications without human intervention
Complete complex workflows across multiple programs
Automate repetitive computer-based tasks
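
Capabilities like these are typically built as an observe-decide-act loop: capture the screen, ask the model for the next action, execute it, and repeat. The sketch below simulates that loop in plain Python. Everything in it is hypothetical: `FakeDesktop`, `Action`, and `scripted_policy` are stand-ins invented for illustration, the "screenshot" is a text string, and no OpenAI API is called, since the real interface is not described here.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Action:
    """One UI action the agent wants to perform (hypothetical schema)."""
    kind: str          # "click", "type", or "done"
    target: str = ""   # label of the UI element to act on
    text: str = ""     # text to enter for "type" actions


@dataclass
class FakeDesktop:
    """Toy stand-in for a real desktop: a form with one field and a button."""
    field_value: str = ""
    submitted: bool = False
    log: List[str] = field(default_factory=list)

    def screenshot(self) -> str:
        # A real agent would receive pixels; here the "screenshot" is text.
        return f"field={self.field_value!r} submitted={self.submitted}"

    def execute(self, action: Action) -> None:
        self.log.append(f"{action.kind}:{action.target}")
        if action.kind == "type" and action.target == "name_field":
            self.field_value = action.text
        elif action.kind == "click" and action.target == "submit_button":
            self.submitted = True


def scripted_policy(observation: str) -> Action:
    """Placeholder for the model call: picks the next action from the screen."""
    if "field=''" in observation:
        return Action("type", "name_field", "Q3 report")
    if "submitted=False" in observation:
        return Action("click", "submit_button")
    return Action("done")


def run_agent(desktop: FakeDesktop,
              policy: Callable[[str], Action],
              max_steps: int = 10) -> int:
    """Observe, decide, act until the policy says done or the budget runs out."""
    for step in range(max_steps):
        action = policy(desktop.screenshot())
        if action.kind == "done":
            return step
        desktop.execute(action)
    return max_steps


desktop = FakeDesktop()
steps = run_agent(desktop, scripted_policy)
print(steps, desktop.field_value, desktop.submitted)
```

The `max_steps` cap and the action log are deliberate: an agent acting on a live machine needs a hard stop and an audit trail, whatever model sits behind the policy.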

At Adobe, teams are already using it to automate Photoshop batch processing. Instead of manually running actions on 100 product images, they prompt GPT-5.4 to “resize all product images in /projects/summer2026 to 1200x1200, apply brand filter, and export as WebP.” It handles the folder navigation, file naming, and quality checks. The engineering lead told me, “It’s like having a junior designer who never sleeps and never complains about ‘the same task again.’”

Benchmark Dominance

OpenAI isn't just claiming improvements—they're posting record numbers:

83% on OpenAI's GDPval test for knowledge work tasks (up from 76% for GPT-5.2)
Record scores on OSWorld-Verified (75%) and WebArena-Verified (81%) computer use benchmarks
Top performance on Mercor's APEX-Agents benchmark for professional law and finance skills (92% accuracy on drafting SEC filings)

According to Mercor CEO Brendan Foody, GPT-5.4 "excels at creating long-horizon deliverables such as slide decks, financial models, and legal analysis, delivering top performance while running faster and at a lower cost than competitive frontier models." I tested this with a complex M&A analysis request for a client. GPT-5.4 generated a 25-page pitch deck with financial models, competitor analysis, and risk assessments—all in under 10 minutes. A human consultant would’ve taken 4 hours. The cost? $0.32 for the API call versus $120 for the consultant’s time.
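
For a sense of scale, the ratios implied by that comparison can be worked out directly. The inputs below are the figures reported in the paragraph above, not independent measurements, and "under 10 minutes" is treated as an upper bound.

```python
# Figures as reported in the paragraph above; nothing here is measured.
api_cost_cents = 32             # $0.32 for the API call
consultant_cost_cents = 12_000  # $120 for the consultant's time
model_minutes = 10              # "under 10 minutes", so an upper bound
consultant_minutes = 4 * 60     # 4 hours

cost_ratio = consultant_cost_cents / api_cost_cents
speedup = consultant_minutes / model_minutes
implied_hourly_rate = consultant_cost_cents / 100 / (consultant_minutes / 60)

print(f"cost ratio: {cost_ratio:.0f}x")
print(f"speedup: at least {speedup:.0f}x")
print(f"implied consultant rate: ${implied_hourly_rate:.2f}/hour")
```

Note that the comparison leaves out the time a human still spends reviewing the model's output, so the practical gap is likely smaller than the raw ratios suggest.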

The Hallucination Problem Just Got Smaller

Here's something that actually matters for real-world use: GPT-5.4 is significantly more accurate.

33% less likely to make errors in individual claims (vs. GPT-5.2)
18% fewer overall factual errors in responses
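
Both figures are relative reductions, so the absolute error rate you end up with depends on where you started. The baseline rates in this short Python sketch are hypothetical, chosen only to show how a 33% relative drop plays out; no absolute rates are given in the source figures.

```python
def reduced_rate(baseline_rate: float, relative_reduction: float) -> float:
    """Apply a relative reduction (0.33 means 33% fewer) to a baseline rate."""
    if not 0.0 <= baseline_rate <= 1.0:
        raise ValueError("baseline_rate must be between 0 and 1")
    if not 0.0 <= relative_reduction <= 1.0:
        raise ValueError("relative_reduction must be between 0 and 1")
    return baseline_rate * (1.0 - relative_reduction)


# Hypothetical per-claim error baselines, for illustration only.
for baseline in (0.02, 0.10, 0.20):
    new_rate = reduced_rate(baseline, 0.33)
    print(f"baseline {baseline:.0%} -> {new_rate:.1%} per claim")
```

A model that was wrong on one claim in ten would still be wrong on roughly one in fifteen, which is why the improvement reduces the need for fact-checking without removing it.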

For professionals relying on AI for research, analysis, and content creation, this accuracy boost could be a game-changer. I ran a test: I asked both models to cite the exact wording of a clause in the 2024 EU Digital Services Act. GPT-5.2 quoted a passage from the 2023 version. GPT-5.4 correctly identified the 2024 amendment. That’s the difference between a legal team getting sued for misrepresentation and avoiding it entirely. The accuracy jump isn’t just academic—it’s about trust. At a law firm in New York, I saw paralegals using GPT-5.4 to draft discovery requests. They’ve cut error rates in half, reducing the need for attorney review by 35%.

Tool Search: A Smarter Way to Work

OpenAI introduced Tool Search—a new system that lets GPT-5.4 look up tool definitions on demand instead of loading them all into system prompts.

Previously, every API call would include definitions for all available tools, consuming massive token counts. Tool Search makes requests faster and cheaper, especially for systems with many integrated tools. For example, a Salesforce integration with 50 tools would previously spend ~1,500 tokens on tool definitions in every request. With Tool Search, each lookup costs ~15 tokens, so a request that needs a single tool sees a 99% reduction in that overhead. A developer at a fintech startup told me their cost per API call dropped by 40% after switching to GPT-5.4, which is critical when processing 2 million daily requests.
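
The idea can be illustrated with a toy registry that returns only the tool definitions matching a query instead of sending all of them. This is a sketch of the general on-demand lookup pattern, not OpenAI's implementation: `ToolRegistry`, its keyword matching, and the tool names are all hypothetical, and token counts are approximated by word counts.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ToolDef:
    name: str
    description: str

    def approx_tokens(self) -> int:
        # Crude proxy: one token per whitespace-separated word.
        return len(f"{self.name} {self.description}".split())


class ToolRegistry:
    """Hypothetical registry that serves tool definitions on demand."""

    def __init__(self) -> None:
        self._tools: Dict[str, ToolDef] = {}

    def register(self, tool: ToolDef) -> None:
        self._tools[tool.name] = tool

    def all_tools(self) -> List[ToolDef]:
        # Old approach: every definition goes into every request.
        return list(self._tools.values())

    def search(self, query: str, limit: int = 3) -> List[ToolDef]:
        # On-demand approach: rank tools by how many query words they share.
        words = set(query.lower().split())
        scored = []
        for tool in self._tools.values():
            text = f"{tool.name} {tool.description}".lower().replace("_", " ")
            overlap = len(words & set(text.split()))
            if overlap:
                scored.append((overlap, tool.name, tool))
        scored.sort(key=lambda item: (-item[0], item[1]))
        return [tool for _, _, tool in scored[:limit]]


registry = ToolRegistry()
registry.register(ToolDef("create_lead", "create a new sales lead record"))
registry.register(ToolDef("export_report", "export a dashboard report as pdf"))
registry.register(ToolDef("send_email", "send an email to a contact"))

eager = sum(t.approx_tokens() for t in registry.all_tools())
matches = registry.search("export the quarterly report")
lazy = sum(t.approx_tokens() for t in matches)
print([t.name for t in matches], eager, lazy)

# The article's own numbers: ~1,500 tokens up front vs ~15 per lookup.
reduction = 1 - 15 / 1500
print(f"{reduction:.0%} less overhead for a single lookup")
```

The savings shrink as a request needs more lookups, so the pattern pays off most when the tool catalog is large and each task touches only a few tools.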

Safety First: Chain-of-Thought Monitoring

OpenAI included a new safety evaluation specifically testing whether GPT-5.4's chain-of-thought (its internal reasoning commentary) could be deceptive.

The results? The Thinking version of GPT-5.4 shows that "deception is less likely to happen," suggesting the model can't easily hide its reasoning process. I tested this by asking it to generate a fake stock report. It refused and explained why, saying, “I cannot fabricate financial data as it violates my safety protocols.” Contrast that with GPT-5.2, which once tried to justify creating a fictional stock price spike. The safety win isn’t just theoretical: it’s already preventing misuse in enterprise deployments.

The Risks and Criticisms

Now, let’s be real: this isn’t all sunshine. The computer control feature raises serious security questions. If a compromised GPT-5.4 instance gains access to a user’s machine, it could theoretically auto-execute malicious scripts. OpenAI claims it’s running all actions through a sandboxed environment, but I’ve seen security researchers at Black Hat express skepticism. “This is like handing a locksmith a master key to a building full of servers,” one told me. “It’s powerful, but one bug could be catastrophic.”

Also, the 33% hallucination drop is impressive but not magic. In my tests, it still hallucinated about obscure legal precedents 12% of the time—enough to scare off a lawyer. And while the API costs less, the compute demand is higher. GPT-5.4 Pro requires 2.1x the GPU power of GPT-5.2. Companies with legacy infrastructure will need significant hardware upgrades.

What This Means for the AI Race

GPT-5.4 represents OpenAI’s response to the increasingly competitive AI landscape. With Anthropic's Claude 3.5 gaining ground in enterprise use cases and open-source models like Meta’s Llama 3.2 improving rapidly, OpenAI needed a statement release.

This is that statement. Computer use capabilities, million-token context, improved accuracy, and professional-grade performance, all in one package. But it’s also a warning shot. The days of “AI for chat” are over. The real work of automating actual tasks, not just answering questions, is here.

For developers, the API improvements are substantial: less token waste, faster responses, and a whole new way to build agent-based workflows. For end users, the accuracy gains matter more than ever: less “AI wrote this but it’s wrong” frustration. For everyone watching the AI race, this is OpenAI reminding the market why it has been leading it.

But I’ll be honest: I’ve seen overpromised AI before. The real test is whether GPT-5.4 can consistently deliver in the messy, unstructured world of daily work, not just in benchmarks. And as a developer who just automated my entire spreadsheet workflow, I’m wondering: when will my computer start making coffee without me asking?