GPT-5.4
Released March 5, 2026
GPT-5.4 launched today — March 5, 2026. It is OpenAI’s most capable model, the first with native computer-use capabilities, and the new default across ChatGPT, the API, and Codex. It replaces GPT-5.2 Thinking and GPT-5.3-Codex in a single unified model.
What is new
GPT-5.4 is not an incremental update. It introduces four capabilities that no previous GPT model had:
Native computer use. GPT-5.4 can operate software through screenshots and keyboard/mouse commands — browsing websites, filling forms, navigating desktop environments. On OSWorld-Verified (desktop navigation), it scores 75.0%, surpassing human performance at 72.4%. This is not a plugin or tool — it is built into the model.
Tool search. Previous models loaded all tool definitions into context upfront. GPT-5.4 receives a lightweight index and loads only the definitions it needs at runtime. On Scale’s MCP Atlas benchmark with 36 MCP servers enabled, this reduced token usage by 47% with no accuracy loss. This matters for anyone building agents with large tool ecosystems.
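The deferred-loading idea can be sketched in plain Python. This is an illustrative model of the token accounting only; the index format, helper names, and per-tool token counts below are invented for the example, not OpenAI's actual wire format:

```python
# Illustrative sketch of deferred tool loading (names and counts invented).
# Instead of sending every full tool definition upfront, the model sees a
# lightweight index and pulls full schemas only for the tools it selects.

TOOLS = {
    "search_issues": {"description": "Search issue tracker", "schema_tokens": 450},
    "read_file":     {"description": "Read a repository file", "schema_tokens": 300},
    "run_tests":     {"description": "Run the test suite", "schema_tokens": 520},
}

def index_tokens(tools):
    # A one-line index entry per tool: assume ~10 tokens each.
    return 10 * len(tools)

def upfront_tokens(tools):
    # Old approach: every full schema sits in context from the start.
    return sum(t["schema_tokens"] for t in tools.values())

def deferred_tokens(tools, used):
    # New approach: the index plus only the schemas actually loaded.
    return index_tokens(tools) + sum(tools[name]["schema_tokens"] for name in used)

print(upfront_tokens(TOOLS))                  # 1270
print(deferred_tokens(TOOLS, ["read_file"]))  # 330
```

With only three tools the savings are modest; the 47% figure on MCP Atlas comes from runs with 36 servers, where the unused schemas dominate.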
1M token context. The context window extends to 1.05M tokens (up from 272K on GPT-5.2). Prompts exceeding 272K tokens are priced at 2x input / 1.5x output. This enables analysis of entire codebases, long document collections, or extended agent trajectories in a single request.
Native compaction. GPT-5.4 is the first model trained to support compaction — summarizing earlier context to extend effective agent trajectories without losing key information.
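The idea behind compaction can be sketched as a simple context-management loop. How GPT-5.4 actually compacts internally is not public; the function below is a hypothetical illustration of the general technique of replacing older turns with a summary:

```python
# Illustrative sketch of context compaction: older turns are replaced by a
# summary so the trajectory keeps growing without unbounded context.
# (Names are invented; GPT-5.4's internal mechanism is not public.)

def compact(messages, keep_last=4,
            summarize=lambda ms: f"[summary of {len(ms)} earlier messages]"):
    """Replace all but the last `keep_last` messages with one summary entry."""
    if len(messages) <= keep_last:
        return list(messages)
    older, recent = messages[:-keep_last], messages[-keep_last:]
    return [{"role": "system", "content": summarize(older)}] + recent

history = [{"role": "user", "content": f"step {i}"} for i in range(10)]
compacted = compact(history, keep_last=4)
print(len(compacted))           # 5
print(compacted[0]["content"])  # [summary of 6 earlier messages]
```

In a real agent loop, `summarize` would itself be a model call; the point of native support is that the model is trained to preserve key information across that boundary rather than relying on an ad hoc external summarizer.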
Benchmarks
OpenAI published extensive benchmarks comparing GPT-5.4 against GPT-5.3-Codex and GPT-5.2:
Professional work
| Benchmark | GPT-5.4 | GPT-5.3-Codex | GPT-5.2 |
|---|---|---|---|
| GDPval (matches/exceeds professionals) | 83.0% | 70.9% | 70.9% |
| Investment Banking Modeling | 87.3% | 79.3% | 68.4% |
| OfficeQA | 68.1% | 65.1% | 63.1% |
GDPval tests agent performance across 44 occupations — sales presentations, accounting spreadsheets, urgent care schedules, manufacturing diagrams. GPT-5.4 matches or exceeds industry professionals in 83% of comparisons.
Coding
| Benchmark | GPT-5.4 | GPT-5.3-Codex | GPT-5.2 |
|---|---|---|---|
| SWE-Bench Pro (Public) | 57.7% | 56.8% | 55.6% |
| Terminal-Bench 2.0 | 75.1% | 77.3% | 62.2% |
GPT-5.4 roughly matches GPT-5.3-Codex on coding — ahead on SWE-Bench Pro, slightly behind on Terminal-Bench — at lower latency. Codex /fast mode delivers 1.5x faster token velocity.
Computer use and vision
| Benchmark | GPT-5.4 | GPT-5.2 | Human |
|---|---|---|---|
| OSWorld-Verified | 75.0% | 47.3% | 72.4% |
| WebArena-Verified | 67.3% | 65.4% | — |
| Online-Mind2Web | 92.8% | — | — |
| MMMU-Pro (no tools) | 81.2% | 79.5% | — |
The jump from 47.3% to 75.0% on OSWorld is the headline number — GPT-5.4 now outperforms humans on desktop navigation tasks.
Tool use
| Benchmark | GPT-5.4 | GPT-5.2 |
|---|---|---|
| BrowseComp | 82.7% | 65.8% |
| MCP Atlas | 67.2% | 60.6% |
| Toolathlon | 54.6% | 45.7% |
Abstract reasoning
| Benchmark | GPT-5.4 | GPT-5.2 |
|---|---|---|
| ARC-AGI-1 (Verified) | 93.7% | 86.2% |
| ARC-AGI-2 (Verified) | 73.3% | 52.9% |
| Frontier Science Research | 33.0% | 25.2% |
| FrontierMath Tier 1–3 | 47.6% | 40.7% |
| Humanity’s Last Exam (with tools) | 52.1% | 45.5% |
ARC-AGI-2 going from 52.9% to 73.3% is a 20-point jump — the largest improvement in abstract reasoning between any two consecutive GPT releases.
Factuality
GPT-5.4 is OpenAI’s most factual model:
- Individual claims are 33% less likely to be false vs GPT-5.2
- Full responses are 18% less likely to contain any errors
Pricing
| Model | Input | Cached Input | Output |
|---|---|---|---|
| GPT-5.4 | $2.50/M | $0.25/M | $15.00/M |
| GPT-5.4 Pro | $30.00/M | — | $180.00/M |
| GPT-5.2 | $1.75/M | $0.175/M | $14.00/M |
GPT-5.4 is priced ~43% higher on input than GPT-5.2, but OpenAI claims greater token efficiency offsets this — the model uses fewer tokens to solve equivalent problems.
Batch and Flex pricing available at half rate. Priority processing at 2x rate. Context >272K tokens priced at 2x input / 1.5x output.
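The published rates can be turned into a small cost estimator. One assumption not spelled out on the pricing page: this sketch applies the 2x input / 1.5x output long-context rate to the whole request once the prompt exceeds 272K tokens, rather than only to the tokens above the threshold:

```python
# Estimate a GPT-5.4 request cost from the published per-million rates.
# Assumption (not confirmed by the pricing page): the long-context rate
# applies to the entire request when the prompt exceeds 272K tokens.

INPUT_PER_M, OUTPUT_PER_M = 2.50, 15.00
LONG_CONTEXT_THRESHOLD = 272_000

def request_cost(input_tokens, output_tokens, batch=False):
    in_rate, out_rate = INPUT_PER_M, OUTPUT_PER_M
    if input_tokens > LONG_CONTEXT_THRESHOLD:
        in_rate, out_rate = in_rate * 2, out_rate * 1.5
    if batch:  # Batch/Flex at half rate
        in_rate, out_rate = in_rate / 2, out_rate / 2
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

print(round(request_cost(100_000, 5_000), 4))  # 0.325  — standard tier
print(round(request_cost(500_000, 5_000), 4))  # 2.6125 — long-context tier
```

Note the long-context example: the same 5K-token completion costs 50% more per token simply because the prompt crossed the 272K threshold, which is worth factoring into whole-codebase analyses.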
API access
```python
from openai import OpenAI

client = OpenAI()

response = client.responses.create(
    model="gpt-5.4",
    input="Analyze this codebase for security vulnerabilities.",
    reasoning={"effort": "medium"},
    text={"verbosity": "medium"},
)
print(response.output_text)
```
- Reasoning effort levels: none (default), low, medium, high, xhigh
- Verbosity levels: low, medium (default), high
Computer use requires the updated computer tool in the Responses API. Tool search uses tool_search capability with deferred loading.
Variants
| Variant | Best for | Context |
|---|---|---|
| gpt-5.4 | General-purpose: reasoning, coding, agentic tasks | 1.05M |
| gpt-5.4-pro | Maximum performance on complex problems | 1.05M |
| gpt-5-mini | Cost-optimized reasoning and chat | — |
| gpt-5-nano | High-throughput classification and instruction following | — |
Steerability
New in GPT-5.4 Thinking (ChatGPT): the model shows an upfront preamble of its plan before executing, so users can adjust course mid-response. This reduces back-and-forth — you see where it is headed and can redirect before it finishes.
Safety
OpenAI rates GPT-5.4 as High cyber capability under their Preparedness Framework — the first general-purpose model with this classification. Deployed with:
- Expanded cyber safety stack with monitoring systems
- Trusted access controls
- Asynchronous blocking for higher-risk requests on Zero Data Retention surfaces
- Chain-of-Thought controllability testing (model cannot easily hide its reasoning)
Timeline
GPT-5.2 Thinking remains available in ChatGPT for 3 months under Legacy Models. Retirement date: June 5, 2026.
Availability
- ChatGPT: Rolling out now to Plus, Team, Pro users as GPT-5.4 Thinking
- Enterprise/Edu: Available via admin early access settings
- API: Available now as gpt-5.4 and gpt-5.4-pro
- Codex: Available with experimental 1M context support
- OpenRouter: Available as openai/gpt-5.4
Open-source alternatives
GPT-5.4 is proprietary with no public weights. For self-hosted alternatives:
- Qwen3.5-397B-A17B (Apache 2.0) — 403B MoE, 17B active, open-weight frontier reasoning
- DeepSeek V3.2 (MIT) — 685B MoE, strong reasoning + chat at 1/40th the price
- GLM 5 (MIT) — 754B params, Z.ai’s open-weight flagship
- LiquidAI LFM2-24B-A2B — novel architecture, efficient inference on consumer hardware
References
- HuggingFace: huggingface.co/openai/GPT-5.4
- Blog: openai.com/index/introducing-gpt-5-4/
- Docs: developers.openai.com/api/docs/guides/latest-model
- Grokipedia: grokipedia.com/page/GPT-5