Gemini 2.0 Flash Thinking
Multimodal
Zero-eval
by Google
About
Gemini 2.0 Flash Thinking is a multimodal language model developed by Google. It achieves strong performance with an average score of 74.3% across 3 benchmarks. Notable strengths include MMMU (75.4%), GPQA (74.2%), and AIME 2024 (73.3%). As a multimodal model, it can process and understand text, images, and other input formats. Released in 2025, it represents Google's latest advancement in AI technology.
Timeline
Announced: Jan 21, 2025
Released: Jan 21, 2025
Knowledge Cutoff: Aug 1, 2024
Specifications
Capabilities
Multimodal
License & Family
License
Proprietary
Benchmark Performance Overview
Performance metrics and category breakdown
Overall Performance
3 benchmarks
Average Score
74.3%
Best Score
75.4%
High Performers (80%+): 0
Top Categories
vision
75.4%
general
73.8%
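For reference, and assuming the headline average is the unweighted mean of the three benchmark scores listed further down, the figure works out as:

\[
\tfrac{1}{3}\,(75.4 + 74.2 + 73.3) = \tfrac{222.9}{3} = 74.3\%
\]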
Benchmark Performance
Top benchmark scores with normalized values (0-100%)
Ranking Across Benchmarks
Position relative to other models on each benchmark
MMMU
Rank #9 of 52
#6 o1: 77.6%
#7 Grok-3: 78.0%
#8 Gemini 2.5 Pro: 79.6%
#9 Gemini 2.0 Flash Thinking: 75.4%
#10 GPT-4.5: 75.2%
#11 Claude 3.7 Sonnet: 75.0%
#12 GPT-4.1: 74.8%
GPQA
Rank #22 of 115
#19 Kimi K2 Instruct: 75.1%
#20 Claude Sonnet 4: 75.4%
#21 Llama 3.1 Nemotron Ultra 253B v1: 76.0%
#22 Gemini 2.0 Flash Thinking: 74.2%
#23 DeepSeek R1 Zero: 73.3%
#24 o1-preview: 73.3%
#25 GPT OSS 120B: 71.5%
AIME 2024
Rank #29 of 41
#26 Magistral Medium: 73.6%
#27 o1: 74.3%
#28 Phi 4 Reasoning: 75.3%
#29 Gemini 2.0 Flash Thinking: 73.3%
#30 Magistral Small 2506: 70.7%
#31 Kimi K2 Instruct: 69.6%
#32 DeepSeek-V3 0324: 59.4%
All Benchmark Results for Gemini 2.0 Flash Thinking
Complete list of benchmark scores with detailed information
Benchmark | Category | Modality | Normalized | Score | Source
MMMU | vision | multimodal | 0.75 | 75.4% | Self-reported
GPQA | general | text | 0.74 | 74.2% | Self-reported
AIME 2024 | general | text | 0.73 | 73.3% | Self-reported
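The overview figures above (average score, best score, the 80%+ count, and the per-category averages) can be reproduced from these three rows. Below is a minimal sketch of that aggregation, assuming an unweighted mean and one-decimal rounding; the variable names and the aggregation logic are illustrative, not taken from any official Google or benchmark-site tooling.

```python
from collections import defaultdict

# Per-benchmark scores as reported in the table above (percent).
benchmarks = [
    ("MMMU", "vision", 75.4),
    ("GPQA", "general", 74.2),
    ("AIME 2024", "general", 73.3),
]

scores = [score for _, _, score in benchmarks]

average_score = sum(scores) / len(scores)              # (75.4 + 74.2 + 73.3) / 3 = 74.3
best_score = max(scores)                               # 75.4 (MMMU)
high_performers = sum(1 for s in scores if s >= 80.0)  # 0 benchmarks at 80%+

# Per-category averages, matching the "Top Categories" breakdown.
by_category = defaultdict(list)
for _, category, score in benchmarks:
    by_category[category].append(score)

category_averages = {
    category: sum(values) / len(values) for category, values in by_category.items()
}

print(f"Average Score: {average_score:.1f}%")
print(f"Best Score: {best_score:.1f}%")
print(f"High Performers (80%+): {high_performers}")
for category, avg in sorted(category_averages.items(), key=lambda kv: -kv[1]):
    print(f"{category}: {avg:.1f}%")
```

Note that the "general" category average, (74.2 + 73.3) / 2 = 73.75, rounds to the 73.8% shown under Top Categories, while "vision" is simply the single MMMU score of 75.4%.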
Resources