Gemini 3.1 Flash Lite Preview
Released February 1, 2026
Google DeepMind’s high-efficiency model for volume-heavy workloads. Flash Lite sits at #2 on the OpenRouter usage leaderboard — behind only GPT-5.3 Chat — processing millions of daily API requests. It delivers reasoning-class performance at a fraction of the cost of larger models.
What It Does
Flash Lite targets the gap between cheap-and-dumb and expensive-and-smart. It accepts text, image, video, file, and audio inputs, outputs text, and supports flexible thinking levels (minimal, low, medium, high) so you can tune the cost-intelligence tradeoff per request.
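One way to use per-request thinking levels is a small routing policy that maps each kind of task to a level. The sketch below is illustrative only: the task names, the level assigned to each task, and the `thinking_level` key in the request dict are assumptions made for this example, not a documented API schema. Only the four level names come from the description above.

```python
from enum import Enum


class ThinkingLevel(str, Enum):
    """The four thinking levels named in the model description."""
    MINIMAL = "minimal"
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


# Hypothetical policy: task names and their assigned levels are illustrative only.
TASK_POLICY = {
    "classification": ThinkingLevel.MINIMAL,
    "extraction": ThinkingLevel.LOW,
    "translation": ThinkingLevel.LOW,
    "code_completion": ThinkingLevel.MEDIUM,
    "multi_step_reasoning": ThinkingLevel.HIGH,
}


def pick_thinking_level(task: str, default: ThinkingLevel = ThinkingLevel.LOW) -> ThinkingLevel:
    """Return the level for a task, falling back to a default for unknown tasks."""
    return TASK_POLICY.get(task, default)


def build_request(task: str, prompt: str) -> dict:
    """Assemble a provider-agnostic request dict; key names are placeholders."""
    return {"prompt": prompt, "thinking_level": pick_thinking_level(task).value}


if __name__ == "__main__":
    print(build_request("classification", "Is this review positive or negative?"))
```

Keeping the policy in one table makes it easy to raise or lower the level for a task type after measuring quality and cost on real traffic.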
Google positions it as a direct upgrade from Gemini 2.5 Flash Lite, with improvements across audio/ASR, RAG snippet ranking, translation, data extraction, and code completion. It is priced at half the cost of Gemini 3 Flash.
Key Capabilities
- Flexible reasoning levels — Select minimal through high thinking budgets per request
- Multimodal input — Text, images, video, audio, and files in a single prompt
- 1M token context — Full million-token input window with 64K output
- Tool use — Function calling, structured output, search grounding, code execution
- 363 tokens/sec output speed — Faster than GPT-5 mini (71 t/s), Claude 4.5 Haiku (108 t/s), and Grok 4.1 Fast (145 t/s)
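The output-speed figures translate into generation time as output tokens divided by tokens per second. The following is a rough sketch under a simplifying assumption: it covers decoding time only and ignores time-to-first-token, network latency, and any thinking tokens, none of which are quantified above. The function name is chosen for this example.

```python
# Output speeds in tokens per second, as quoted in the list above.
OUTPUT_SPEED_TPS = {
    "Gemini 3.1 Flash Lite": 363,
    "Grok 4.1 Fast": 145,
    "Claude 4.5 Haiku": 108,
    "GPT-5 mini": 71,
}


def generation_seconds(output_tokens: int, tokens_per_sec: float) -> float:
    """Estimate pure decoding time; ignores time-to-first-token and network latency."""
    if tokens_per_sec <= 0:
        raise ValueError("tokens_per_sec must be positive")
    return output_tokens / tokens_per_sec


if __name__ == "__main__":
    for model, tps in OUTPUT_SPEED_TPS.items():
        seconds = generation_seconds(2000, tps)
        print(f"{model}: {seconds:.1f} s for 2,000 output tokens")
```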
Benchmarks
Flash Lite punches well above its weight class. At the high thinking level:
| Benchmark | Flash Lite 3.1 | Gemini 2.5 Flash | GPT-5 mini | Claude 4.5 Haiku | Grok 4.1 Fast |
|---|---|---|---|---|---|
| Humanity’s Last Exam | 16.0% | 11.0% | 16.7% | 9.7% | **17.6%** |
| GPQA Diamond (science) | **86.9%** | 82.8% | 82.3% | 73.0% | 84.3% |
| MMMU-Pro (multimodal) | **76.8%** | 66.7% | 74.1% | 58.0% | 63.0% |
| CharXiv Reasoning (charts) | 73.2% | 63.7% | **75.5%*** | 61.7% | 31.6% |
| Video-MMMU | **84.8%** | 79.2% | 82.5% | — | 74.6% |
| SimpleQA Verified (knowledge) | **43.3%** | 28.1% | 9.5% | 5.5% | 19.5% |
| MMMLU (multilingual) | **88.9%** | 86.6% | 84.9% | 83.0% | 86.8% |
| LiveCodeBench (code) | 72.0% | 62.6% | **80.4%** | 53.2% | 76.5% |
| MRCR v2 128K (long context) | **60.1%** | 54.3% | 52.5% | 35.3% | 54.6% |
*GPT-5 mini uses Python for CharXiv. Bold = best in class.
The standout numbers: 86.9% on GPQA Diamond is the highest score in this comparison, and 43.3% on SimpleQA Verified is about 4.6x GPT-5 mini's 9.5% on factual accuracy.
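The per-benchmark leaders and the SimpleQA ratio can be recomputed directly from the table. The sketch below copies the scores as listed; the helper names `leader` and `count_wins` are chosen for this example, and a missing result is represented as `None`.

```python
# Column order matches the benchmark table above.
MODELS = ["Flash Lite 3.1", "Gemini 2.5 Flash", "GPT-5 mini", "Claude 4.5 Haiku", "Grok 4.1 Fast"]

# Scores in percent, copied from the table; None marks a missing result.
SCORES = {
    "Humanity's Last Exam": [16.0, 11.0, 16.7, 9.7, 17.6],
    "GPQA Diamond": [86.9, 82.8, 82.3, 73.0, 84.3],
    "MMMU-Pro": [76.8, 66.7, 74.1, 58.0, 63.0],
    "CharXiv Reasoning": [73.2, 63.7, 75.5, 61.7, 31.6],
    "Video-MMMU": [84.8, 79.2, 82.5, None, 74.6],
    "SimpleQA Verified": [43.3, 28.1, 9.5, 5.5, 19.5],
    "MMMLU": [88.9, 86.6, 84.9, 83.0, 86.8],
    "LiveCodeBench": [72.0, 62.6, 80.4, 53.2, 76.5],
    "MRCR v2 128K": [60.1, 54.3, 52.5, 35.3, 54.6],
}


def leader(row):
    """Return the name of the top-scoring model in one benchmark row."""
    candidates = [(score, name) for name, score in zip(MODELS, row) if score is not None]
    return max(candidates)[1]


def count_wins(model):
    """Count the benchmarks on which the given model has the top score."""
    return sum(1 for row in SCORES.values() if leader(row) == model)


if __name__ == "__main__":
    for benchmark, row in SCORES.items():
        print(f"{benchmark}: {leader(row)}")
    print("Flash Lite 3.1 leads", count_wins("Flash Lite 3.1"), "of", len(SCORES))
    flash_lite, gpt5_mini = SCORES["SimpleQA Verified"][0], SCORES["SimpleQA Verified"][2]
    print(f"SimpleQA ratio vs GPT-5 mini: {flash_lite / gpt5_mini:.1f}x")
```

On these figures, Flash Lite leads six of the nine benchmarks; GPT-5 mini leads CharXiv Reasoning and LiveCodeBench, and Grok 4.1 Fast leads Humanity's Last Exam.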
Pricing
One of the cheapest reasoning-capable models available:
| Tier | Input (text/image/video) | Input (audio) | Output | Cached Input |
|---|---|---|---|---|
| Standard | $0.25/M | $0.50/M | $1.50/M | $0.025/M |
| Batch | $0.125/M | $0.25/M | $0.75/M | — |
Via OpenRouter: $0.25/M input, $1.50/M output.
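To see what the tiers mean per request, the table rates can be applied to token counts. This is a minimal sketch with several assumptions: the function and key names are made up for this example, `input_tokens` counts only uncached text/image/video tokens, cached tokens are billed at the cached rate in place of the standard input rate, and thinking tokens are not modeled.

```python
# USD per million tokens, from the pricing table above. None = no rate listed.
PRICES = {
    "standard": {"input": 0.25, "audio_input": 0.50, "output": 1.50, "cached_input": 0.025},
    "batch": {"input": 0.125, "audio_input": 0.25, "output": 0.75, "cached_input": None},
}


def request_cost(tier, input_tokens, output_tokens, cached_tokens=0, audio_tokens=0):
    """Return the estimated cost in USD of one request on the given tier."""
    rates = PRICES[tier]
    if cached_tokens and rates["cached_input"] is None:
        raise ValueError(f"no cached-input rate is listed for the {tier!r} tier")
    micro_usd = (
        input_tokens * rates["input"]
        + audio_tokens * rates["audio_input"]
        + output_tokens * rates["output"]
    )
    if cached_tokens:
        micro_usd += cached_tokens * rates["cached_input"]
    return micro_usd / 1_000_000


if __name__ == "__main__":
    for tier in PRICES:
        cost = request_cost(tier, input_tokens=10_000, output_tokens=500)
        print(f"{tier}: ${cost:.6f} per request")
```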
For comparison:
- GPT-5 mini: $0.25/M in, $2.00/M out
- Claude 4.5 Haiku: $1.00/M in, $5.00/M out
- Grok 4.1 Fast: $0.20/M in, $0.50/M out
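Flash Lite matches or beats each of these on most of the benchmarks above while staying near the bottom of the price range: it ties GPT-5 mini on input price and undercuts it on output, and only Grok 4.1 Fast is cheaper on both.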
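The list prices above can be compared on a concrete workload. The sketch below assumes a hypothetical job of one million requests with 1,000 input and 200 output tokens each, text input only, standard (non-batch) rates, and no caching. The function names are chosen for this example.

```python
# List prices in USD per million tokens (input, output), copied from the figures above.
LIST_PRICES = {
    "Gemini 3.1 Flash Lite": (0.25, 1.50),
    "GPT-5 mini": (0.25, 2.00),
    "Claude 4.5 Haiku": (1.00, 5.00),
    "Grok 4.1 Fast": (0.20, 0.50),
}


def workload_cost(model, requests, input_tokens, output_tokens):
    """Return the total USD cost of a workload of identical requests."""
    in_price, out_price = LIST_PRICES[model]
    per_request = (input_tokens * in_price + output_tokens * out_price) / 1_000_000
    return requests * per_request


def rank_by_cost(requests, input_tokens, output_tokens):
    """Return (model, cost) pairs sorted from cheapest to most expensive."""
    costs = {m: workload_cost(m, requests, input_tokens, output_tokens) for m in LIST_PRICES}
    return sorted(costs.items(), key=lambda pair: pair[1])


if __name__ == "__main__":
    # Hypothetical workload shape; substitute your own token counts.
    for model, cost in rank_by_cost(1_000_000, 1_000, 200):
        print(f"{model}: ${cost:,.2f}")
```

The ranking depends on the input/output mix, so it is worth rerunning with token counts from real traffic.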
Use Cases
The model excels at high-throughput classification, labeling, and structured extraction. Production users report:
- E-commerce — Populating product categories across hundreds of SKUs in seconds
- Data labeling — 100% consistency in fashion item tagging (Whering)
- Content routing — 94% intent routing accuracy with sub-10-second completions (HubX)
- Multimodal labeling — Large-scale image/video annotation at dramatically reduced cost (Cartwheel)
Model Details
| Property | Value |
|---|---|
| Status | Preview |
| Context window | 1M tokens input, 64K output |
| Modalities | Text, image, video, file, audio → text |
| Tokenizer | Gemini |
| Knowledge cutoff | January 2025 |
| Reasoning | Configurable thinking levels |
| Availability | Google AI Studio, Gemini API, Vertex AI, OpenRouter |
Open-Source Alternatives
Flash Lite is proprietary. For self-hosted alternatives in this performance tier:
- Qwen3.5-35B-A3B — MoE model using only 3B active params, Apache 2.0
- LFM2-24B-A2B (Liquid) — 2B active parameters, runs on consumer hardware
- Mistral Small 3 14B — Apache 2.0, fits on a single GPU