
Gemma 3 12B
Multimodal
Zero-eval
#1 VQAv2 (val)
#2 MMMU (val)
#2 WMT24++
+2 more
by Google
About
Gemma 3 12B is a multimodal language model developed by Google. It achieves strong overall performance, with an average score of 63.8% across 26 benchmarks, and does particularly well on GSM8k (94.4%), IFEval (88.9%), and DocVQA (87.1%). It supports a 262K-token context window for handling large documents and is currently available through one API provider. As a multimodal model, it can process text and images together. It is licensed for commercial use, making it suitable for enterprise applications. Released in March 2025, it is part of Google's Gemma 3 family of open models.
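For reference, a minimal sketch of running the model locally, assuming the Hugging Face transformers integration (v4.50+, which added Gemma 3 support) and access to the gated google/gemma-3-12b-it checkpoint; the image URL below is a placeholder:

```python
# Minimal sketch: multimodal inference with Gemma 3 12B via Hugging Face
# transformers. Assumes transformers >= 4.50 and that you have accepted
# the Gemma license for "google/gemma-3-12b-it" on the Hub.
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="google/gemma-3-12b-it", device_map="auto")

messages = [
    {"role": "user", "content": [
        {"type": "image", "url": "https://example.com/chart.png"},  # placeholder image
        {"type": "text", "text": "Describe this chart in one sentence."},
    ]},
]

# The pipeline applies the chat template, runs generation, and returns the
# conversation with the assistant turn appended.
out = pipe(text=messages, max_new_tokens=64)
print(out[0]["generated_text"][-1]["content"])
```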
Pricing Range
Input (per 1M tokens): $0.05
Output (per 1M tokens): $0.10
Providers: 1
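At these rates, per-request cost is simple arithmetic; a quick sketch (the token counts are illustrative):

```python
# Cost estimate at the listed rates: $0.05 per 1M input tokens and
# $0.10 per 1M output tokens (one provider, so min and max coincide).
def request_cost(input_tokens: int, output_tokens: int) -> float:
    return input_tokens / 1e6 * 0.05 + output_tokens / 1e6 * 0.10

# Example: a 200K-token document (well inside the 262K context window)
# plus a 1K-token summary costs about one cent.
print(f"${request_cost(200_000, 1_000):.4f}")  # $0.0101
```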
Timeline
Announced: Mar 12, 2025
Released: Mar 12, 2025
Specifications
Training Tokens: 12.0T
Capabilities
Multimodal
License & Family
License: Gemma
Benchmark Performance Overview
Performance metrics and category breakdown
Overall Performance
26 benchmarks
Average Score: 63.8%
Best Score: 94.4%
High Performers (80%+): 8
Performance Metrics
Max Context Window: 262.1K
Avg Throughput: 33.0 tok/s
Avg Latency: 0ms
Top Categories
factuality: 75.8%
math: 73.9%
code: 70.5%
vision: 70.2%
general: 53.2%
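The summary figures above follow directly from the per-benchmark rows; a sketch using only the ten rows listed at the bottom of this page (the site aggregates all 26 benchmarks, so the category averages computed here come out higher than the dashboard's):

```python
# Recomputing dashboard stats from per-benchmark rows. Sketch: uses only
# the ten rows shown on this page; the site averages all 26 benchmarks.
from statistics import mean

rows = {
    "GSM8k": ("math", 94.4), "IFEval": ("code", 88.9), "DocVQA": ("vision", 87.1),
    "BIG-Bench Hard": ("general", 85.7), "HumanEval": ("code", 85.4),
    "AI2D": ("general", 84.2), "MATH": ("math", 83.8),
    "Natural2Code": ("code", 80.7), "FACTS Grounding": ("factuality", 75.8),
    "ChartQA": ("general", 75.7),
}

scores = [s for _, s in rows.values()]
print(max(scores))                   # best score: 94.4
print(sum(s >= 80 for s in scores))  # high performers (80%+): 8

# Per-category means over these ten rows only, e.g. math = (94.4 + 83.8) / 2.
by_cat: dict[str, list[float]] = {}
for cat, s in rows.values():
    by_cat.setdefault(cat, []).append(s)
print({c: round(mean(v), 1) for c, v in by_cat.items()})
```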
Benchmark Performance
Top benchmark scores with normalized values (0-100%)
Ranking Across Benchmarks
Position relative to other models on each benchmark
GSM8k
Rank #15 of 46
#12 Nova Pro: 94.8%
#13 Qwen2.5 14B Instruct: 94.8%
#14 Nova Lite: 94.5%
#15 Gemma 3 12B: 94.4%
#16 Qwen3 235B A22B: 94.4%
#17 Mistral Large 2: 93.0%
#18 Claude 3 Sonnet: 92.3%
IFEval
Rank #10 of 37
#7 Kimi K2 Instruct: 89.8%
#8 Nova Lite: 89.7%
#9 Llama 3.1 Nemotron Ultra 253B v1: 89.5%
#10 Gemma 3 12B: 88.9%
#11 Qwen3-235B-A22B-Instruct-2507: 88.7%
#12 Llama 3.1 405B Instruct: 88.6%
#13 GPT-4.5: 88.2%
DocVQA
Rank #22 of 26
#19 Llama 3.2 90B Instruct: 90.1%
#20 DeepSeek VL2 Tiny: 88.9%
#21 Llama 3.2 11B Instruct: 88.4%
#22 Gemma 3 12B: 87.1%
#23 Gemma 3 27B: 86.6%
#24 Grok-1.5V: 85.6%
#25 Grok-1.5: 85.6%
BIG-Bench Hard
Rank #6 of 21
#3 Gemini 1.5 Pro: 89.2%
#4 Gemma 3 27B: 87.6%
#5 Claude 3 Opus: 86.8%
#6 Gemma 3 12B: 85.7%
#7 Gemini 1.5 Flash: 85.5%
#8 Claude 3 Sonnet: 82.9%
#9 Phi-3.5-MoE-instruct: 79.1%
HumanEval
Rank #31 of 62
#28 Qwen2 72B Instruct: 86.0%
#29 Grok-2 mini: 85.7%
#30 Nova Lite: 85.4%
#31 Gemma 3 12B: 85.4%
#32 Claude 3 Opus: 84.9%
#33 Qwen2.5 7B Instruct: 84.8%
#34 Mistral Small 3 24B Instruct: 84.8%
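A rank here is just a model's position in the score-sorted list for that benchmark; a sketch using the HumanEval excerpt above (how the site breaks ties between equal scores is an assumption):

```python
# Rank = 1-based position after sorting scores in descending order.
# Ties (e.g. Nova Lite and Gemma 3 12B at 85.4) break by insertion order
# here, which is an assumption about the site's tie-breaking rule.
def rank(model: str, scores: dict[str, float]) -> int:
    ordered = sorted(scores, key=scores.get, reverse=True)
    return ordered.index(model) + 1

humaneval_excerpt = {
    "Qwen2 72B Instruct": 86.0,
    "Grok-2 mini": 85.7,
    "Nova Lite": 85.4,
    "Gemma 3 12B": 85.4,
    "Claude 3 Opus": 84.9,
}
print(rank("Gemma 3 12B", humaneval_excerpt))  # 4 in this excerpt (#31 of 62 overall)
```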
All Benchmark Results for Gemma 3 12B
Complete list of benchmark scores with detailed information
Benchmark | Category | Modality | Raw Score | Normalized | Source
GSM8k | math | text | 0.94 | 94.4% | Self-reported
IFEval | code | text | 0.89 | 88.9% | Self-reported
DocVQA | vision | multimodal | 0.87 | 87.1% | Self-reported
BIG-Bench Hard | general | text | 0.86 | 85.7% | Self-reported
HumanEval | code | text | 0.85 | 85.4% | Self-reported
AI2D | general | text | 0.84 | 84.2% | Self-reported
MATH | math | text | 0.84 | 83.8% | Self-reported
Natural2Code | code | text | 0.81 | 80.7% | Self-reported
FACTS Grounding | factuality | text | 0.76 | 75.8% | Self-reported
ChartQA | general | multimodal | 0.76 | 75.7% | Self-reported
Showing 1 to 10 of 26 benchmarks
Resources