GPT-5.3 Chat
Released March 3, 2026
About
GPT-5.3 Chat is the most-used model on OpenRouter as of March 2026, sitting at rank #1 by API request volume. Released March 3, 2026, it is the latest refinement in OpenAI’s GPT-5 series — optimized specifically for everyday conversation rather than deep reasoning or coding tasks.
What it does differently
GPT-5.3 Chat is not a capability upgrade over GPT-5.2. It is a behavioral one. OpenAI tuned it to reduce three things that made prior models annoying in practice:
- Unnecessary refusals — previous models would decline benign requests out of excessive caution
- Hedging and caveats — responses cluttered with “it’s important to note” and “however, please consult a professional”
- Hallucination — previous models stated ungrounded claims as fact; responses are now better grounded, with claims contextualized
The result is a model that feels more direct and less robotic in everyday use, which explains why it jumped to #1 on OpenRouter within days of release.
The GPT-5 family
GPT-5.3 Chat is part of a lineage that started with GPT-5’s launch on August 7, 2025. The family tree:
| Variant | Released | Focus |
|---|---|---|
| GPT-5 | Aug 2025 | Base model with unified reasoning router |
| GPT-5.1-Codex | Late 2025 | Agentic coding, 400K context, configurable reasoning |
| GPT-5.2 | Dec 2025 | Frontier reasoning, professional workflows |
| GPT-5.2 Instant | Dec 2025 | Fast everyday tasks, warmer tone |
| GPT-5.3 Chat | Mar 3, 2026 | Conversational quality, reduced refusals |
| GPT-5.3 Instant | Mar 3, 2026 | Same improvements, faster variant |
| GPT-5.4 | Mar 5, 2026 | Computer-use, enhanced agentic capabilities |
Architecture
OpenAI does not disclose GPT-5’s architecture in detail. What is known from the system card and official documentation:
- Unified system — a fast base model handles routine queries, while a deeper reasoning component (GPT-5 Thinking) engages for complex multi-step problems
- Real-time router — automatically selects processing depth based on query complexity, user intent, and conversation context
- Reasoning effort levels — none, low, medium, or high — controllable via the `reasoning.effort` API parameter or natural-language prompts like “think hard about this”
- Multimodal — processes text and images, with the enhanced chart/diagram interpretation introduced in GPT-5
The thinking mode reduces factual errors by approximately 80% compared to non-thinking mode, according to OpenAI’s evaluations.
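The effort levels above are set per request. A minimal sketch of what such a request body might look like, assuming the `reasoning` object nests an `effort` field as the parameter name `reasoning.effort` suggests (the exact field layout is an assumption here, not confirmed by the source):

```python
import json

def build_request(prompt: str, effort: str = "medium") -> str:
    """Build a chat-completions request body with an explicit
    reasoning-effort level. Field layout is assumed from the
    `reasoning.effort` parameter name; check the API docs."""
    allowed = {"none", "low", "medium", "high"}
    if effort not in allowed:
        raise ValueError(f"effort must be one of {sorted(allowed)}")
    payload = {
        "model": "gpt-5.3-chat",
        "messages": [{"role": "user", "content": prompt}],
        # Nested object mirroring the dotted parameter name.
        "reasoning": {"effort": effort},
    }
    return json.dumps(payload)
```

Note that the Chat variant defaults to non-thinking mode, so an effort setting may only be meaningful on the configurable-reasoning variants.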
Pricing
GPT-5.3 Chat sits in the mid-range of API pricing:
| Metric | Value |
|---|---|
| Input | $1.75 / M tokens |
| Output | $14.00 / M tokens |
| Context window | 128K tokens |
Compared to competitors at similar capability:
- Anthropic Claude Sonnet 4.6: $3.00 / $15.00 per M tokens (200K context)
- Google Gemini 3.1 Pro: $2.00 / $8.00 per M tokens (1M context)
- Qwen3.5-397B-A17B: $0.39 / $1.56 per M tokens (262K context, open-weight)
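The per-token rates above are easier to compare when turned into per-request costs. A small helper using the listed prices (the dictionary keys are illustrative shorthand, not official API model IDs, except `gpt-5.3-chat`):

```python
# (input, output) prices in USD per million tokens, from the tables above.
PRICES = {
    "gpt-5.3-chat":      (1.75, 14.00),
    "claude-sonnet-4.6": (3.00, 15.00),
    "gemini-3.1-pro":    (2.00, 8.00),
    "qwen3.5-397b-a17b": (0.39, 1.56),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one request at the listed rates."""
    inp, out = PRICES[model]
    return (input_tokens * inp + output_tokens * out) / 1_000_000
```

For a typical chat turn of 2,000 input and 500 output tokens, GPT-5.3 Chat costs about $0.0105, versus roughly $0.0016 for Qwen3.5 at the open-weight rates.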
API access
Available through the OpenAI API and aggregators:
```shell
# OpenAI direct
curl https://api.openai.com/v1/chat/completions \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-5.3-chat", "messages": [{"role": "user", "content": "Hello"}]}'

# OpenRouter
curl https://openrouter.ai/api/v1/chat/completions \
  -H "Authorization: Bearer $OPENROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "openai/gpt-5.3-chat", "messages": [{"role": "user", "content": "Hello"}]}'
```
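The same two calls can be prepared from Python with only the standard library. A sketch that mirrors the curl examples, reading the key from the same environment variables (pass the result to `urllib.request.urlopen()` to actually send it):

```python
import json
import os
import urllib.request

def chat_request(prompt: str, via_openrouter: bool = False) -> urllib.request.Request:
    """Prepare a chat-completions request matching the curl examples."""
    if via_openrouter:
        url = "https://openrouter.ai/api/v1/chat/completions"
        model = "openai/gpt-5.3-chat"
        key = os.environ.get("OPENROUTER_API_KEY", "")
    else:
        url = "https://api.openai.com/v1/chat/completions"
        model = "gpt-5.3-chat"
        key = os.environ.get("OPENAI_API_KEY", "")
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Authorization": f"Bearer {key}",
            "Content-Type": "application/json",
        },
    )
```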
Why it is #1
GPT-5.3 Chat reached the top of OpenRouter’s usage rankings not because it is the most capable model — GPT-5.4 and Claude Opus 4.6 outperform it on reasoning benchmarks — but because it is the most pleasant to talk to. For the majority of API calls (support bots, chat interfaces, content generation), behavioral quality matters more than peak reasoning ability.
The model hit rank #1 within two days of release. OpenRouter usage data shows it handling millions of daily requests, significantly ahead of Gemini 3.1 Flash Lite Preview (#2) and ByteDance Seed-2.0-Mini (#3) by volume.
Limitations
- Proprietary — no weights available, no self-hosting, no fine-tuning
- No reasoning mode control in Chat variant — the Chat variant defaults to non-thinking mode; use GPT-5.3 or GPT-5.4 for configurable reasoning effort
- 128K context — shorter than Gemini (1M) and Claude (200K–1M)
- Price — $14/M output tokens is expensive compared to open alternatives like Qwen3.5 ($1.56/M) or DeepSeek V3.2 ($0.88/M)
Open-source alternatives
For users who need similar conversational quality without vendor lock-in:
- Qwen3.5-35B-A3B (Apache 2.0) — rank #5, MoE with only 3B active params, runs on consumer hardware
- LiquidAI LFM2-24B-A2B (restricted) — rank #9, novel architecture, 32K context
- DeepSeek V3.2 (MIT) — rank #77, 685B MoE, strong reasoning + chat
- Mistral Ministral 3 14B (Apache 2.0) — rank #70, compact and fast
References
- HuggingFace: huggingface.co/openai/GPT-5.3-Chat
- Blog: openai.com/index/introducing-gpt-5/
- Docs: platform.openai.com/docs/models
- Grokipedia: grokipedia.com/page/GPT-5