GPT-5.3 Chat
Released March 3, 2026
About
GPT-5.3 Chat is the most-used model on OpenRouter as of March 2026, sitting at rank #1 by API request volume. Released March 3, 2026, it is the latest refinement in OpenAI’s GPT-5 series — optimized specifically for everyday conversation rather than deep reasoning or coding tasks.
What it does differently
GPT-5.3 Chat is not a capability upgrade over GPT-5.2. It is a behavioral one. OpenAI tuned it to reduce three things that made prior models annoying in practice:
- Unnecessary refusals — previous models would decline benign requests out of excessive caution
- Hedging and caveats — responses cluttered with “it’s important to note” and “however, please consult a professional”
- Hallucination — previous models stated ungrounded claims as fact; responses are now better grounded, with claims contextualized
The result is a model that feels more direct and less robotic in everyday use, which explains why it jumped to #1 on OpenRouter within days of release.
The GPT-5 family
GPT-5.3 Chat is part of a lineage that started with GPT-5’s launch on August 7, 2025. The family tree:
| Variant | Released | Focus |
|---|---|---|
| GPT-5 | Aug 2025 | Base model with unified reasoning router |
| GPT-5.1-Codex | Late 2025 | Agentic coding, 400K context, configurable reasoning |
| GPT-5.2 | Dec 2025 | Frontier reasoning, professional workflows |
| GPT-5.2 Instant | Dec 2025 | Fast everyday tasks, warmer tone |
| GPT-5.3 Chat | Mar 3, 2026 | Conversational quality, reduced refusals |
| GPT-5.3 Instant | Mar 3, 2026 | Same improvements, faster variant |
| GPT-5.4 | Mar 5, 2026 | Computer-use, enhanced agentic capabilities |
Architecture
OpenAI does not disclose GPT-5’s architecture in detail. What is known from the system card and official documentation:
- Unified system — a fast base model handles routine queries, while a deeper reasoning component (GPT-5 Thinking) engages for complex multi-step problems
- Real-time router — automatically selects processing depth based on query complexity, user intent, and conversation context
- Reasoning effort levels — none, low, medium, or high — controllable via the `reasoning.effort` API parameter or natural-language prompts like “think hard about this”
- Multimodal — processes text and images, with the enhanced chart/diagram interpretation introduced in GPT-5
The thinking mode reduces factual errors by approximately 80% compared to non-thinking mode, according to OpenAI’s evaluations.
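The effort levels above are set per request. A minimal sketch of what such a request body might look like, assuming the `reasoning` object nests an `effort` field as the parameter name `reasoning.effort` suggests (the exact field layout is an assumption here, not confirmed by the source):

```python
import json

def build_request(prompt: str, effort: str = "medium") -> str:
    """Build a chat-completions request body with an explicit
    reasoning-effort level. Field layout is assumed from the
    `reasoning.effort` parameter name; check the API docs."""
    allowed = {"none", "low", "medium", "high"}
    if effort not in allowed:
        raise ValueError(f"effort must be one of {sorted(allowed)}")
    payload = {
        "model": "gpt-5.3-chat",
        "messages": [{"role": "user", "content": prompt}],
        # Nested object mirroring the dotted parameter name.
        "reasoning": {"effort": effort},
    }
    return json.dumps(payload)
```

Note that the Chat variant defaults to non-thinking mode, so an effort setting may only be meaningful on the configurable-reasoning variants.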
Pricing
GPT-5.3 Chat sits in the mid-range of API pricing:
| Metric | Value |
|---|---|
| Input | $1.75 / M tokens |
| Output | $14.00 / M tokens |
| Context window | 128K tokens |
Compared to competitors at similar capability:
- Anthropic Claude Sonnet 4.6: $3.00 / $15.00 per M tokens (200K context)
- Google Gemini 3.1 Pro: $2.00 / $8.00 per M tokens (1M context)
- Qwen3.5-397B-A17B: $0.39 / $1.56 per M tokens (262K context, open-weight)
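The per-token rates above are easier to compare when turned into per-request costs. A small helper using the listed prices (the dictionary keys are illustrative shorthand, not official API model IDs, except `gpt-5.3-chat`):

```python
# (input, output) prices in USD per million tokens, from the tables above.
PRICES = {
    "gpt-5.3-chat":      (1.75, 14.00),
    "claude-sonnet-4.6": (3.00, 15.00),
    "gemini-3.1-pro":    (2.00, 8.00),
    "qwen3.5-397b-a17b": (0.39, 1.56),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one request at the listed rates."""
    inp, out = PRICES[model]
    return (input_tokens * inp + output_tokens * out) / 1_000_000
```

For a typical chat turn of 2,000 input and 500 output tokens, GPT-5.3 Chat costs about $0.0105, versus roughly $0.0016 for Qwen3.5 at the open-weight rates.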
API access
Available through the OpenAI API and aggregators:
```shell
# OpenAI direct
curl https://api.openai.com/v1/chat/completions \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-5.3-chat", "messages": [{"role": "user", "content": "Hello"}]}'

# OpenRouter
curl https://openrouter.ai/api/v1/chat/completions \
  -H "Authorization: Bearer $OPENROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "openai/gpt-5.3-chat", "messages": [{"role": "user", "content": "Hello"}]}'
```
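The same two calls can be prepared from Python with only the standard library. A sketch that mirrors the curl examples, reading the key from the same environment variables (pass the result to `urllib.request.urlopen()` to actually send it):

```python
import json
import os
import urllib.request

def chat_request(prompt: str, via_openrouter: bool = False) -> urllib.request.Request:
    """Prepare a chat-completions request matching the curl examples."""
    if via_openrouter:
        url = "https://openrouter.ai/api/v1/chat/completions"
        model = "openai/gpt-5.3-chat"
        key = os.environ.get("OPENROUTER_API_KEY", "")
    else:
        url = "https://api.openai.com/v1/chat/completions"
        model = "gpt-5.3-chat"
        key = os.environ.get("OPENAI_API_KEY", "")
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Authorization": f"Bearer {key}",
            "Content-Type": "application/json",
        },
    )
```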
Why it is #1
GPT-5.3 Chat reached the top of OpenRouter’s usage rankings not because it is the most capable model — GPT-5.4 and Claude Opus 4.6 outperform it on reasoning benchmarks — but because it is the most pleasant to talk to. For the majority of API calls (support bots, chat interfaces, content generation), behavioral quality matters more than peak reasoning ability.
The model hit rank #1 within two days of release. OpenRouter usage data shows it handling millions of daily requests, significantly ahead of Gemini 3.1 Flash Lite Preview (#2) and ByteDance Seed-2.0-Mini (#3) by volume.
Limitations
- Proprietary — no weights available, no self-hosting, no fine-tuning
- No reasoning mode control in Chat variant — the Chat variant defaults to non-thinking mode; use GPT-5.3 or GPT-5.4 for configurable reasoning effort
- 128K context — shorter than Gemini (1M) and Claude (200K–1M)
- Price — $14/M output tokens is expensive compared to open alternatives like Qwen3.5 ($1.56/M) or DeepSeek V3.2 ($0.88/M)
Open-source alternatives
For users who need similar conversational quality without vendor lock-in:
- Qwen3.5-35B-A3B (Apache 2.0) — rank #5, MoE with only 3B active params, runs on consumer hardware
- LiquidAI LFM2-24B-A2B (restricted) — rank #9, novel architecture, 32K context
- DeepSeek V3.2 (MIT) — rank #77, 685B MoE, strong reasoning + chat
- Mistral Ministral 3 14B (Apache 2.0) — rank #70, compact and fast
References
- HuggingFace: huggingface.co/openai/GPT-5.3-Chat
- Blog: openai.com/index/introducing-gpt-5/
- Docs: platform.openai.com/docs/models
- Grokipedia: grokipedia.com/page/GPT-5