Google DeepMind proprietary

Gemini 3.1 Flash Lite Preview

Released February 1, 2026

Context Window 1.0M tokens

≈ 787 pages of text

Pricing

Input $0.25 per million tokens

Output $1.50 per million tokens

About

Gemini 3.1 Flash Lite Preview

Google DeepMind’s high-efficiency model for volume-heavy workloads. Flash Lite sits at #2 on the OpenRouter usage leaderboard — behind only GPT-5.3 Chat — processing millions of daily API requests. It delivers reasoning-class performance at a fraction of the cost of larger models.

What It Does

Flash Lite targets the gap between cheap-and-dumb and expensive-and-smart. It accepts text, image, video, file, and audio inputs, outputs text, and supports flexible thinking levels (minimal, low, medium, high) so you can tune the cost-intelligence tradeoff per request.

Google positions it as a direct upgrade from Gemini 2.5 Flash Lite, with improvements across audio/ASR, RAG snippet ranking, translation, data extraction, and code completion. It is priced at half the cost of Gemini 3 Flash.
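The per-request thinking levels described above lend themselves to a simple routing rule: cheap levels for mechanical tasks, higher levels for harder ones. The following is a minimal sketch of that idea only. The four level names come from the text; the task names, the default mapping, and the `plan_request` helper are hypothetical and are not part of any Gemini SDK.

```python
from dataclasses import dataclass
from typing import Optional

# Level names as listed in the text above.
THINKING_LEVELS = ("minimal", "low", "medium", "high")

# Hypothetical mapping from task type to thinking level (an assumption,
# not a Google recommendation). Tune it against your own evaluations.
DEFAULT_LEVELS = {
    "classification": "minimal",
    "extraction": "low",
    "translation": "low",
    "code_completion": "medium",
    "multi_step_reasoning": "high",
}


@dataclass(frozen=True)
class RequestPlan:
    task: str
    thinking_level: str


def plan_request(task: str, override: Optional[str] = None) -> RequestPlan:
    """Pick a thinking level for one request, falling back to 'medium'."""
    level = override or DEFAULT_LEVELS.get(task, "medium")
    if level not in THINKING_LEVELS:
        raise ValueError(f"unknown thinking level: {level!r}")
    return RequestPlan(task=task, thinking_level=level)


if __name__ == "__main__":
    for task in ("classification", "multi_step_reasoning", "summarization"):
        print(plan_request(task))
```

The resulting level would then be passed to whichever client you use; check that client's documentation for the actual parameter name.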

Key Capabilities

Flexible reasoning levels — Select minimal through high thinking budgets per request
Multimodal input — Text, images, video, audio, and files in a single prompt
1M token context — Full million-token input window with 64K output
Tool use — Function calling, structured output, search grounding, code execution
363 tokens/sec output speed — Faster than GPT-5 mini (71 t/s), Claude 4.5 Haiku (108 t/s), and Grok 4.1 Fast (145 t/s)
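To put the output-speed figures in the list above into practical terms, the sketch below converts tokens per second into generation time for a response of a given length. It uses only the speeds quoted in the text and assumes a constant decode rate, ignoring time to first token and network latency, so treat the results as rough lower bounds.

```python
# Output speeds in tokens per second, as quoted in the list above.
OUTPUT_SPEED_TPS = {
    "Gemini 3.1 Flash Lite": 363,
    "GPT-5 mini": 71,
    "Claude 4.5 Haiku": 108,
    "Grok 4.1 Fast": 145,
}


def generation_seconds(output_tokens: int, tokens_per_second: float) -> float:
    """Seconds to stream `output_tokens` at a constant decode rate."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return output_tokens / tokens_per_second


def fastest_model(speeds: dict) -> str:
    """Name of the model with the highest quoted output speed."""
    return max(speeds, key=speeds.get)


if __name__ == "__main__":
    response_tokens = 2_000  # hypothetical response length
    for model, tps in OUTPUT_SPEED_TPS.items():
        secs = generation_seconds(response_tokens, tps)
        print(f"{model}: {secs:.1f} s for {response_tokens} tokens")
```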

Benchmarks

Flash Lite posts the top score on six of the nine benchmarks below. At the high thinking level:

Benchmark	Flash Lite 3.1	Gemini 2.5 Flash	GPT-5 mini	Claude 4.5 Haiku	Grok 4.1 Fast
Humanity’s Last Exam	16.0%	11.0%	16.7%	9.7%	17.6%
GPQA Diamond (science)	86.9%	82.8%	82.3%	73.0%	84.3%
MMMU-Pro (multimodal)	76.8%	66.7%	74.1%	58.0%	63.0%
CharXiv Reasoning (charts)	73.2%	63.7%	75.5%*	61.7%	31.6%
Video-MMMU	84.8%	79.2%	82.5%	—	74.6%
SimpleQA Verified (knowledge)	43.3%	28.1%	9.5%	5.5%	19.5%
MMMLU (multilingual)	88.9%	86.6%	84.9%	83.0%	86.8%
LiveCodeBench (code)	72.0%	62.6%	80.4%	53.2%	76.5%
MRCR v2 128K (long context)	60.1%	54.3%	52.5%	35.3%	54.6%

*GPT-5 mini uses Python for CharXiv.

The standout numbers: 86.9% on GPQA Diamond beats every model in this tier, and 43.3% on SimpleQA Verified is more than 4.5x GPT-5 mini’s 9.5% on factual accuracy.
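The claims drawn from the benchmark table can be checked mechanically. The sketch below transcribes the table's scores, finds the leader on each benchmark, and computes the SimpleQA ratio against GPT-5 mini. The helper names are illustrative, and the missing Claude 4.5 Haiku score on Video-MMMU is represented as `None`.

```python
MODELS = ["Flash Lite 3.1", "Gemini 2.5 Flash", "GPT-5 mini",
          "Claude 4.5 Haiku", "Grok 4.1 Fast"]

# Scores in percent, transcribed from the table above, in MODELS order.
SCORES = {
    "Humanity's Last Exam": [16.0, 11.0, 16.7, 9.7, 17.6],
    "GPQA Diamond": [86.9, 82.8, 82.3, 73.0, 84.3],
    "MMMU-Pro": [76.8, 66.7, 74.1, 58.0, 63.0],
    "CharXiv Reasoning": [73.2, 63.7, 75.5, 61.7, 31.6],
    "Video-MMMU": [84.8, 79.2, 82.5, None, 74.6],
    "SimpleQA Verified": [43.3, 28.1, 9.5, 5.5, 19.5],
    "MMMLU": [88.9, 86.6, 84.9, 83.0, 86.8],
    "LiveCodeBench": [72.0, 62.6, 80.4, 53.2, 76.5],
    "MRCR v2 128K": [60.1, 54.3, 52.5, 35.3, 54.6],
}


def leader(benchmark: str) -> str:
    """Model with the highest reported score, skipping missing entries."""
    scored = [(s, m) for s, m in zip(SCORES[benchmark], MODELS) if s is not None]
    return max(scored)[1]


def benchmarks_led_by(model: str) -> list:
    """Benchmarks on which `model` has the top score."""
    return [b for b in SCORES if leader(b) == model]


def score_ratio(benchmark: str, model_a: str, model_b: str) -> float:
    """Ratio of model_a's score to model_b's score on one benchmark."""
    row = SCORES[benchmark]
    return row[MODELS.index(model_a)] / row[MODELS.index(model_b)]


if __name__ == "__main__":
    print(benchmarks_led_by("Flash Lite 3.1"))
    print(round(score_ratio("SimpleQA Verified", "Flash Lite 3.1", "GPT-5 mini"), 2))
```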

Pricing

One of the cheapest reasoning-capable models available:

Tier	Input (text/image/video)	Input (audio)	Output	Cached Input
Standard	$0.25/M	$0.50/M	$1.50/M	$0.025/M
Batch	$0.125/M	$0.25/M	$0.75/M	—

Via OpenRouter: $0.25/M input, $1.50/M output.
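The per-million rates in the table above translate into a per-request cost with simple arithmetic. The sketch below is one way to estimate it. The rates are taken from the table; the function name and the assumption that cached tokens are billed at the cached rate instead of the regular input rate are illustrative, and the table lists no cached rate for the Batch tier, so the sketch rejects that combination.

```python
# USD per million tokens, from the pricing table above.
PRICES_PER_MILLION = {
    "standard": {"input": 0.25, "audio_input": 0.50,
                 "output": 1.50, "cached_input": 0.025},
    "batch": {"input": 0.125, "audio_input": 0.25, "output": 0.75},
}


def estimate_cost(tier: str, input_tokens: int = 0, audio_tokens: int = 0,
                  output_tokens: int = 0, cached_tokens: int = 0) -> float:
    """Estimated USD cost for one request or one aggregated workload."""
    rates = PRICES_PER_MILLION[tier]
    if cached_tokens and "cached_input" not in rates:
        raise ValueError(f"no cached-input rate listed for tier {tier!r}")
    total = (
        input_tokens * rates["input"]
        + audio_tokens * rates["audio_input"]
        + output_tokens * rates["output"]
        + cached_tokens * rates.get("cached_input", 0.0)
    )
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: 1M text tokens in, 1M tokens out.
    for tier in PRICES_PER_MILLION:
        cost = estimate_cost(tier, input_tokens=1_000_000, output_tokens=1_000_000)
        print(f"{tier}: ${cost:.3f}")
```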

For comparison:

GPT-5 mini: $0.25/M in, $2.00/M out
Claude 4.5 Haiku: $1.00/M in, $5.00/M out
Grok 4.1 Fast: $0.20/M in, $0.50/M out

Flash Lite beats each of these on most of the benchmarks above while staying near the bottom of the price range; only Grok 4.1 Fast is cheaper on both input and output.
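Because input and output are priced separately, the cheapest model depends on the input-to-output mix of a workload. The sketch below ranks the four models by cost for a given token mix, using only the list prices quoted above. The workload size in the usage example is hypothetical.

```python
# (input, output) USD per million tokens, from the comparison above.
LIST_PRICES = {
    "Gemini 3.1 Flash Lite": (0.25, 1.50),
    "GPT-5 mini": (0.25, 2.00),
    "Claude 4.5 Haiku": (1.00, 5.00),
    "Grok 4.1 Fast": (0.20, 0.50),
}


def workload_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """USD cost of a workload at the model's list prices."""
    price_in, price_out = LIST_PRICES[model]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000


def rank_by_cost(input_tokens: int, output_tokens: int) -> list:
    """(model, cost) pairs sorted from cheapest to most expensive."""
    costs = [(m, workload_cost(m, input_tokens, output_tokens)) for m in LIST_PRICES]
    return sorted(costs, key=lambda pair: pair[1])


if __name__ == "__main__":
    # Hypothetical daily workload: 10M input tokens, 2M output tokens.
    for model, cost in rank_by_cost(10_000_000, 2_000_000):
        print(f"{model}: ${cost:.2f}")
```

Price is only one axis; the benchmark table and output speeds above are what justify paying more than the cheapest option.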

Use Cases

The model excels at high-throughput classification, labeling, and structured extraction. Production users report:

E-commerce — Populating product categories across hundreds of SKUs in seconds
Data labeling — 100% consistency in fashion item tagging (Whering)
Content routing — 94% intent routing accuracy with sub-10-second completions (HubX)
Multimodal labeling — Large-scale image/video annotation at dramatically reduced cost (Cartwheel)

Model Details

Property	Value
Status	Preview
Context window	1M tokens input, 64K output
Modalities	Text, image, video, file, audio → text
Tokenizer	Gemini
Knowledge cutoff	January 2025
Reasoning	Configurable thinking levels
Availability	Google AI Studio, Gemini API, Vertex AI, OpenRouter

Open-Source Alternatives

Flash Lite is proprietary. For self-hosted alternatives in this performance tier:

Qwen3.5-35B-A3B — MoE model using only 3B active params, Apache 2.0
LFM2-24B-A2B (Liquid) — 2B active parameters, runs on consumer hardware
Mistral Small 3 14B — Apache 2.0, fits on a single GPU
