OpenAI · Parameter count undisclosed · Proprietary · Text generation

GPT-5.3 Chat

Released March 3, 2026

Context Window 128K tokens
≈ 96 pages of text

Pricing

Input $1.75 per million tokens
Output $14.00 per million tokens

About

GPT-5.3 Chat is the most-used model on OpenRouter as of March 2026, sitting at rank #1 by API request volume. Released March 3, 2026, it is the latest refinement in OpenAI’s GPT-5 series — optimized specifically for everyday conversation rather than deep reasoning or coding tasks.

What it does differently

GPT-5.3 Chat is not a capability upgrade over GPT-5.2. It is a behavioral one. OpenAI tuned it to reduce three things that made prior models annoying in practice:

  • Unnecessary refusals — previous models would decline benign requests out of excessive caution
  • Hedging and caveats — responses cluttered with “it’s important to note” and “however, please consult a professional”
  • Hallucination — prior models stated unsupported claims as fact; responses are now more grounded, with better contextualization of claims

The result is a model that feels more direct and less robotic in everyday use, which explains why it jumped to #1 on OpenRouter within days of release.

The GPT-5 family

GPT-5.3 Chat is part of a lineage that started with GPT-5’s launch on August 7, 2025. The family tree:

| Variant | Released | Focus |
|---|---|---|
| GPT-5 | Aug 2025 | Base model with unified reasoning router |
| GPT-5.1-Codex | Late 2025 | Agentic coding, 400K context, configurable reasoning |
| GPT-5.2 | Dec 2025 | Frontier reasoning, professional workflows |
| GPT-5.2 Instant | Dec 2025 | Fast everyday tasks, warmer tone |
| GPT-5.3 Chat | Mar 3, 2026 | Conversational quality, reduced refusals |
| GPT-5.3 Instant | Mar 3, 2026 | Same improvements, faster variant |
| GPT-5.4 | Mar 5, 2026 | Computer-use, enhanced agentic capabilities |

Architecture

OpenAI does not disclose GPT-5’s architecture in detail. What is known from the system card and official documentation:

  • Unified system — a fast base model handles routine queries, while a deeper reasoning component (GPT-5 Thinking) engages for complex multi-step problems
  • Real-time router — automatically selects processing depth based on query complexity, user intent, and conversation context
  • Reasoning effort levels — none, low, medium, high — controllable via API parameter (reasoning.effort) or natural language prompts like “think hard about this”
  • Multimodal — processes text and images, with enhanced chart/diagram interpretation introduced in GPT-5
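As a concrete illustration of the reasoning-effort control described above, the sketch below builds a request payload that sets `reasoning.effort` to `high`. Note two assumptions: the JSON shape (`"reasoning": {"effort": ...}`) is inferred from the parameter name in the docs and may differ from the actual API schema, and the example targets `gpt-5.3` rather than the Chat variant, since the Chat variant defaults to non-thinking mode.

```shell
# Sketch: payload enabling deep reasoning via the reasoning.effort
# parameter. The exact field layout is an assumption -- check
# platform.openai.com/docs/models before relying on it.
PAYLOAD=$(cat <<'EOF'
{
  "model": "gpt-5.3",
  "reasoning": {"effort": "high"},
  "messages": [{"role": "user", "content": "Prove that sqrt(2) is irrational."}]
}
EOF
)

# To send it (commented out so the sketch runs without credentials):
# curl https://api.openai.com/v1/chat/completions \
#   -H "Authorization: Bearer $OPENAI_API_KEY" \
#   -H "Content-Type: application/json" \
#   -d "$PAYLOAD"
echo "$PAYLOAD"
```

Alternatively, per the docs, a natural-language prompt like "think hard about this" can trigger the router's deeper processing without any API parameter.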

The thinking mode reduces factual errors by approximately 80% compared to non-thinking mode, according to OpenAI’s evaluations.

Pricing

GPT-5.3 Chat sits in the mid-range of API pricing:

| Metric | Value |
|---|---|
| Input | $1.75 / M tokens |
| Output | $14.00 / M tokens |
| Context window | 128K tokens |

Compared to competitors at similar capability:

  • Anthropic Claude Sonnet 4.6: $3.00 / $15.00 per M tokens (200K context)
  • Google Gemini 3.1 Pro: $2.00 / $8.00 per M tokens (1M context)
  • Qwen3.5-397B-A17B: $0.39 / $1.56 per M tokens (262K context, open-weight)
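To make the rates concrete, here is a small back-of-envelope cost helper using GPT-5.3 Chat's listed prices ($1.75/M input, $14.00/M output). The `cost` function name and the example workload are illustrative, not from the source.

```shell
# Estimate request cost in USD at GPT-5.3 Chat's listed rates.
# usage: cost <input_tokens> <output_tokens>
cost() {
  awk -v i="$1" -v o="$2" 'BEGIN { printf "%.2f\n", i/1e6*1.75 + o/1e6*14.00 }'
}

# A workload of 1M input + 200K output tokens:
cost 1000000 200000   # 1.75 + 2.80 = 4.55
```

The same workload on Qwen3.5-397B-A17B ($0.39/$1.56) would cost about $0.70, which is the gap the Limitations section refers to.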

API access

Available through the OpenAI API and aggregators:

# OpenAI direct
curl https://api.openai.com/v1/chat/completions \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-5.3-chat", "messages": [{"role": "user", "content": "Hello"}]}'

# OpenRouter
curl https://openrouter.ai/api/v1/chat/completions \
  -H "Authorization: Bearer $OPENROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "openai/gpt-5.3-chat", "messages": [{"role": "user", "content": "Hello"}]}'
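The curl calls above return a JSON body; a small helper like the following pulls out the assistant's text. It assumes the standard Chat Completions response shape (`choices[0].message.content`) and uses only the Python standard library, so there is nothing extra to install.

```shell
# Extract the assistant reply from a Chat Completions JSON response.
# Assumes the choices[0].message.content layout; pipe curl output into it.
extract_reply() {
  python3 -c 'import json,sys; print(json.load(sys.stdin)["choices"][0]["message"]["content"])'
}

# Demo with a canned response (stands in for the curl output):
echo '{"choices":[{"message":{"role":"assistant","content":"Hello!"}}]}' | extract_reply
# prints Hello!
```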

Why it is #1

GPT-5.3 Chat reached the top of OpenRouter’s usage rankings not because it is the most capable model — GPT-5.4 and Claude Opus 4.6 outperform it on reasoning benchmarks — but because it is the most pleasant to talk to. For the majority of API calls (support bots, chat interfaces, content generation), behavioral quality matters more than peak reasoning ability.

The model hit rank #1 within two days of release. OpenRouter usage data shows it handling millions of daily requests, significantly ahead of Gemini 3.1 Flash Lite Preview (#2) and ByteDance Seed-2.0-Mini (#3) by volume.

Limitations

  • Proprietary — no weights available, no self-hosting, no fine-tuning
  • No reasoning mode control in Chat variant — the Chat variant defaults to non-thinking mode; use GPT-5.3 or GPT-5.4 for configurable reasoning effort
  • 128K context — shorter than Gemini (1M) and Claude (200K–1M)
  • Price — $14/M output tokens is expensive compared to open alternatives like Qwen3.5 ($1.56/M) or DeepSeek V3.2 ($0.88/M)

Open-source alternatives

For users who need similar conversational quality without vendor lock-in:

  • Qwen3.5-35B-A3B (Apache 2.0) — rank #5, MoE with only 3B active params, runs on consumer hardware
  • LiquidAI LFM2-24B-A2B (restricted) — rank #9, novel architecture, 32K context
  • DeepSeek V3.2 (MIT) — rank #77, 685B MoE, strong reasoning + chat
  • Mistral Ministral 3 14B (Apache 2.0) — rank #70, compact and fast

References

  • 🤗 HuggingFace huggingface.co/openai/GPT-5.3-Chat
  • 📝 Blog openai.com/index/introducing-gpt-5/
  • 📖 Docs platform.openai.com/docs/models
  • 🔍 Grokipedia grokipedia.com/page/GPT-5