Google DeepMind proprietary

Gemini 3.1 Flash Lite Preview

Released February 1, 2026

Context Window 1.0M tokens
≈ 787 pages of text

Pricing

Input $0.25 per million tokens
Output $1.50 per million tokens

About

Google DeepMind’s high-efficiency model for volume-heavy workloads. Flash Lite sits at #2 on the OpenRouter usage leaderboard — behind only GPT-5.3 Chat — processing millions of daily API requests. It delivers reasoning-class performance at a fraction of the cost of larger models.

What It Does

Flash Lite targets the gap between cheap-and-dumb and expensive-and-smart. It accepts text, image, video, file, and audio inputs, outputs text, and supports flexible thinking levels (minimal, low, medium, high) so you can tune the cost-intelligence tradeoff per request.
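One way to use per-request thinking levels is to route each task to the cheapest level that handles it. The sketch below is a hypothetical routing policy, not part of any Google SDK: the task categories, the `choose_thinking_level` helper, and the mapping itself are assumptions for illustration. Only the four level names come from the description above.

```python
# The four level names come from the text; everything else here is hypothetical.
THINKING_LEVELS = ("minimal", "low", "medium", "high")

# Assumed mapping from task type to thinking level (illustrative only).
DEFAULT_POLICY = {
    "classification": "minimal",
    "extraction": "low",
    "translation": "low",
    "code_completion": "medium",
    "multi_step_reasoning": "high",
}


def choose_thinking_level(task_type, policy=DEFAULT_POLICY):
    """Return the thinking level for a task; unknown task types fall back to 'medium'."""
    level = policy.get(task_type, "medium")
    if level not in THINKING_LEVELS:
        raise ValueError(f"unknown thinking level: {level!r}")
    return level


if __name__ == "__main__":
    for task in ("classification", "multi_step_reasoning", "summarization"):
        print(task, "->", choose_thinking_level(task))
```

In practice the mapping would be tuned against measured quality and cost for each workload rather than fixed up front.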

Google positions it as a direct upgrade from Gemini 2.5 Flash Lite with improvements across audio/ASR, RAG snippet ranking, translation, data extraction, and code completion. It is priced at half the cost of Gemini 3 Flash.

Key Capabilities

  • Flexible reasoning levels — Select minimal through high thinking budgets per request
  • Multimodal input — Text, images, video, audio, and files in a single prompt
  • 1M token context — Full million-token input window with 64K output
  • Tool use — Function calling, structured output, search grounding, code execution
  • 363 tokens/sec output speed — Faster than GPT-5 mini (71 t/s), Claude 4.5 Haiku (108 t/s), and Grok 4.1 Fast (145 t/s)
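To put the output speeds in perspective, the sketch below converts the tokens-per-second figures quoted above into streaming time for a response of a given length. It assumes a constant generation rate and ignores time to first token; the 2,000-token response length is a hypothetical example.

```python
# Output speeds in tokens per second, as quoted in the list above.
OUTPUT_SPEED_TPS = {
    "Gemini 3.1 Flash Lite": 363,
    "Grok 4.1 Fast": 145,
    "Claude 4.5 Haiku": 108,
    "GPT-5 mini": 71,
}


def generation_seconds(output_tokens, tokens_per_second):
    """Seconds needed to emit output_tokens at a constant rate (latency ignored)."""
    return output_tokens / tokens_per_second


if __name__ == "__main__":
    tokens = 2_000  # hypothetical response length
    for model, tps in OUTPUT_SPEED_TPS.items():
        print(f"{model}: {generation_seconds(tokens, tps):.1f} s")
```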

Benchmarks

Flash Lite leads or stays close to the comparison models on most of the benchmarks below. Scores at the high thinking level:

| Benchmark | Flash Lite 3.1 | Gemini 2.5 Flash | GPT-5 mini | Claude 4.5 Haiku | Grok 4.1 Fast |
|---|---|---|---|---|---|
| Humanity’s Last Exam | 16.0% | 11.0% | 16.7% | 9.7% | 17.6% |
| GPQA Diamond (science) | 86.9% | 82.8% | 82.3% | 73.0% | 84.3% |
| MMMU-Pro (multimodal) | 76.8% | 66.7% | 74.1% | 58.0% | 63.0% |
| CharXiv Reasoning (charts) | 73.2% | 63.7% | 75.5%* | 61.7% | 31.6% |
| Video-MMMU | 84.8% | 79.2% | 82.5% | 74.6% | |
| SimpleQA Verified (knowledge) | 43.3% | 28.1% | 9.5% | 5.5% | 19.5% |
| MMMLU (multilingual) | 88.9% | 86.6% | 84.9% | 83.0% | 86.8% |
| LiveCodeBench (code) | 72.0% | 62.6% | 80.4% | 53.2% | 76.5% |
| MRCR v2 128K (long context) | 60.1% | 54.3% | 52.5% | 35.3% | 54.6% |

*GPT-5 mini uses Python for CharXiv.

The standout numbers: 86.9% on GPQA Diamond is the highest score in this comparison, and 43.3% on SimpleQA Verified is more than four times GPT-5 mini's 9.5% on factual accuracy.
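These comparisons can be checked directly against the published scores. The sketch below copies three rows from the benchmark table and computes the leader per benchmark and the SimpleQA ratio. The helper names `leader` and `score_ratio` are made up for this example.

```python
# Column order and scores copied from the benchmark table above.
MODELS = ["Flash Lite 3.1", "Gemini 2.5 Flash", "GPT-5 mini", "Claude 4.5 Haiku", "Grok 4.1 Fast"]
SCORES = {
    "GPQA Diamond": [86.9, 82.8, 82.3, 73.0, 84.3],
    "SimpleQA Verified": [43.3, 28.1, 9.5, 5.5, 19.5],
    "LiveCodeBench": [72.0, 62.6, 80.4, 53.2, 76.5],
}


def leader(benchmark):
    """Name of the model with the highest score on the given benchmark."""
    scores = SCORES[benchmark]
    best_index = max(range(len(scores)), key=scores.__getitem__)
    return MODELS[best_index]


def score_ratio(benchmark, model_a, model_b):
    """Score of model_a divided by score of model_b on the given benchmark."""
    scores = SCORES[benchmark]
    return scores[MODELS.index(model_a)] / scores[MODELS.index(model_b)]


if __name__ == "__main__":
    for name in SCORES:
        print(name, "->", leader(name))
    print(round(score_ratio("SimpleQA Verified", "Flash Lite 3.1", "GPT-5 mini"), 2))
```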

Pricing

One of the cheapest reasoning-capable models available:

| Tier | Input (text/image/video) | Input (audio) | Output | Cached Input |
|---|---|---|---|---|
| Standard | $0.25/M | $0.50/M | $1.50/M | $0.025/M |
| Batch | $0.125/M | $0.25/M | $0.75/M | |

Via OpenRouter: $0.25/M input, $1.50/M output.
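A rough per-request cost estimate follows from the rates listed above. The sketch below is an illustration rather than a billing reference: the `request_cost` helper is hypothetical, and it assumes cached tokens are billed at the cached-input rate instead of the regular input rate. No cached rate is listed for the batch tier, so the helper refuses that combination.

```python
# USD per million tokens, copied from the pricing figures above.
PRICES = {
    "standard": {"input": 0.25, "audio_input": 0.50, "output": 1.50, "cached_input": 0.025},
    "batch": {"input": 0.125, "audio_input": 0.25, "output": 0.75},
}


def request_cost(tier, input_tokens=0, output_tokens=0, audio_tokens=0, cached_tokens=0):
    """Estimated cost in USD for one request on the given pricing tier."""
    rates = PRICES[tier]
    if cached_tokens and "cached_input" not in rates:
        raise ValueError(f"no cached-input rate listed for tier {tier!r}")
    total = (
        input_tokens * rates["input"]
        + audio_tokens * rates["audio_input"]
        + output_tokens * rates["output"]
        + cached_tokens * rates.get("cached_input", 0.0)
    )
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical request: 100K input tokens, 2K output tokens.
    print(f"standard: ${request_cost('standard', 100_000, 2_000):.4f}")
    print(f"batch:    ${request_cost('batch', 100_000, 2_000):.4f}")
```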

For comparison:

  • GPT-5 mini: $0.25/M in, $2.00/M out
  • Claude 4.5 Haiku: $1.00/M in, $5.00/M out
  • Grok 4.1 Fast: $0.20/M in, $0.50/M out

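Flash Lite matches or beats each of these on most of the benchmarks above. On price it undercuts Claude 4.5 Haiku, ties GPT-5 mini on input while costing less on output, and is more expensive than Grok 4.1 Fast.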
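Where each model lands on price depends on the input/output mix of the workload. The sketch below ranks the four models by total cost for a hypothetical monthly workload of 100M input tokens and 10M output tokens, using only the list prices quoted above. The function names are illustrative.

```python
# (input, output) prices in USD per million tokens, from the figures above.
PRICE_PER_MILLION = {
    "Gemini 3.1 Flash Lite": (0.25, 1.50),
    "GPT-5 mini": (0.25, 2.00),
    "Claude 4.5 Haiku": (1.00, 5.00),
    "Grok 4.1 Fast": (0.20, 0.50),
}


def workload_cost(model, input_millions, output_millions):
    """Total USD cost for a workload measured in millions of tokens."""
    price_in, price_out = PRICE_PER_MILLION[model]
    return input_millions * price_in + output_millions * price_out


def rank_by_cost(input_millions, output_millions):
    """Model names sorted from cheapest to most expensive for the workload."""
    return sorted(
        PRICE_PER_MILLION,
        key=lambda model: workload_cost(model, input_millions, output_millions),
    )


if __name__ == "__main__":
    for model in rank_by_cost(100, 10):
        print(f"{model}: ${workload_cost(model, 100, 10):.2f}")
```

For this mix, Grok 4.1 Fast comes out cheapest, followed by Flash Lite, GPT-5 mini, and Claude 4.5 Haiku.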

Use Cases

The model excels at high-throughput classification, labeling, and structured extraction. Production users report:

  • E-commerce — Populating product categories across hundreds of SKUs in seconds
  • Data labeling — 100% consistency in fashion item tagging (Whering)
  • Content routing — 94% intent routing accuracy with sub-10-second completions (HubX)
  • Multimodal labeling — Large-scale image/video annotation at dramatically reduced cost (Cartwheel)
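For classification and labeling workloads like these, it is common to validate each structured reply before storing it. The sketch below assumes the model was asked to reply with JSON of the form `{"category": "..."}`. The taxonomy, the reply shape, and the `parse_label` and `accuracy` helpers are all hypothetical and not taken from any of the users named above.

```python
import json

# Hypothetical label taxonomy for a fashion-tagging task.
ALLOWED_CATEGORIES = {"tops", "bottoms", "outerwear", "footwear", "accessories"}


def parse_label(raw_response):
    """Parse a JSON reply like {"category": "tops"} and check it against the taxonomy."""
    try:
        payload = json.loads(raw_response)
    except json.JSONDecodeError as exc:
        raise ValueError(f"response is not valid JSON: {exc}") from exc
    if not isinstance(payload, dict):
        raise ValueError("response must be a JSON object")
    category = payload.get("category")
    if not isinstance(category, str) or category not in ALLOWED_CATEGORIES:
        raise ValueError(f"unexpected category: {category!r}")
    return category


def accuracy(predicted, expected):
    """Fraction of predictions that match the expected labels."""
    if len(predicted) != len(expected) or not expected:
        raise ValueError("inputs must be non-empty and the same length")
    return sum(p == e for p, e in zip(predicted, expected)) / len(expected)


if __name__ == "__main__":
    print(parse_label('{"category": "footwear"}'))
    print(accuracy(["tops", "footwear"], ["tops", "bottoms"]))
```

Rejected replies can be retried or sent to a review queue, which keeps malformed labels out of downstream data.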

Model Details

| Property | Value |
|---|---|
| Status | Preview |
| Context window | 1M tokens input, 64K output |
| Modalities | Text, image, video, file, audio → text |
| Tokenizer | Gemini |
| Knowledge cutoff | January 2025 |
| Reasoning | Configurable thinking levels |
| Availability | Google AI Studio, Gemini API, Vertex AI, OpenRouter |
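When working near the context limits, it helps to check token budgets before sending a request. The sketch below treats "1M" and "64K" as 1,000,000 and 64,000 tokens, which is an assumption because the exact limits are not stated here. It also assumes token counts are supplied by the caller, and the helper names are illustrative.

```python
MAX_INPUT_TOKENS = 1_000_000  # "1M" from the table; exact value is an assumption
MAX_OUTPUT_TOKENS = 64_000    # "64K" from the table; exact value is an assumption


def fits_context(prompt_tokens, requested_output_tokens):
    """True if both the prompt and the requested output fit the stated limits."""
    return prompt_tokens <= MAX_INPUT_TOKENS and requested_output_tokens <= MAX_OUTPUT_TOKENS


def split_into_batches(doc_token_counts, budget=MAX_INPUT_TOKENS):
    """Greedily pack documents, in order, into batches that each fit the input budget.

    Returns a list of batches, each a list of document indices.
    """
    batches, current, used = [], [], 0
    for index, count in enumerate(doc_token_counts):
        if count > budget:
            raise ValueError(f"document {index} alone exceeds the budget")
        if used + count > budget:
            batches.append(current)
            current, used = [], 0
        current.append(index)
        used += count
    if current:
        batches.append(current)
    return batches


if __name__ == "__main__":
    print(fits_context(950_000, 8_000))
    print(split_into_batches([600_000, 300_000, 200_000, 900_000]))
```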

Open-Source Alternatives

Flash Lite is proprietary. For self-hosted alternatives in this performance tier:

  • Qwen3.5-35B-A3B — MoE model using only 3B active params, Apache 2.0
  • LFM2-24B-A2B (Liquid) — 2B active parameters, runs on consumer hardware
  • Mistral Small 3 14B — Apache 2.0, fits on a single GPU
