
Gemma 3 12B

Multimodal
Zero-eval
#1 VQAv2 (val)
#2 MMMU (val)
#2 WMT24++

by Google

About

Gemma 3 12B is a multimodal language model developed by Google. It averages 63.8% across 26 benchmarks, with particularly strong results on GSM8k (94.4%), IFEval (88.9%), and DocVQA (87.1%). It supports a 262K-token context window for handling large documents and is currently available through one API provider. As a multimodal model, it can process text, images, and other input formats. Its license permits commercial use, making it suitable for enterprise applications. Released in March 2025, it is one of Google's most recent model releases.

Pricing Range
Input (per 1M tokens): $0.05
Output (per 1M tokens): $0.10
Providers: 1
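At these rates, per-request cost is simple arithmetic. The sketch below uses the prices listed on this page ($0.05 input, $0.10 output per 1M tokens); the token counts in the example are hypothetical.

```python
# Estimate request cost from the per-1M-token prices listed above.
INPUT_PRICE_PER_M = 0.05   # USD per 1M input tokens
OUTPUT_PRICE_PER_M = 0.10  # USD per 1M output tokens

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of a single request."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000

# Hypothetical example: a 200K-token document plus a 1K-token answer.
cost = request_cost(200_000, 1_000)  # ≈ $0.0101
```

Even a request that fills most of the context window stays around a cent at these prices, which is the practical upshot of the low per-token rates.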
Timeline
Announced: Mar 12, 2025
Released: Mar 12, 2025
Specifications
Training Tokens: 12.0T
Capabilities
Multimodal
License & Family
License: Gemma
Benchmark Performance Overview
Performance metrics and category breakdown

Overall Performance

26 benchmarks
Average Score
63.8%
Best Score
94.4%
High Performers (80%+)
8

Performance Metrics

Max Context Window
262.1K
Avg Throughput
33.0 tok/s
Avg Latency
0ms

Top Categories

factuality
75.8%
math
73.9%
code
70.5%
vision
70.2%
general
53.2%
Benchmark Performance
Top benchmark scores with normalized values (0-100%)
Ranking Across Benchmarks
Position relative to other models on each benchmark

GSM8k

Rank #15 of 46
#12 Nova Pro
94.8%
#13 Qwen2.5 14B Instruct
94.8%
#14 Nova Lite
94.5%
#15 Gemma 3 12B
94.4%
#16 Qwen3 235B A22B
94.4%
#17 Mistral Large 2
93.0%
#18 Claude 3 Sonnet
92.3%

IFEval

Rank #10 of 37
#7 Kimi K2 Instruct
89.8%
#8 Nova Lite
89.7%
#9 Llama 3.1 Nemotron Ultra 253B v1
89.5%
#10 Gemma 3 12B
88.9%
#11 Qwen3-235B-A22B-Instruct-2507
88.7%
#12 Llama 3.1 405B Instruct
88.6%
#13 GPT-4.5
88.2%

DocVQA

Rank #22 of 26
#19 Llama 3.2 90B Instruct
90.1%
#20 DeepSeek VL2 Tiny
88.9%
#21 Llama 3.2 11B Instruct
88.4%
#22 Gemma 3 12B
87.1%
#23 Gemma 3 27B
86.6%
#24 Grok-1.5V
85.6%
#25 Grok-1.5
85.6%

BIG-Bench Hard

Rank #6 of 21
#3 Gemini 1.5 Pro
89.2%
#4 Gemma 3 27B
87.6%
#5 Claude 3 Opus
86.8%
#6 Gemma 3 12B
85.7%
#7 Gemini 1.5 Flash
85.5%
#8 Claude 3 Sonnet
82.9%
#9 Phi-3.5-MoE-instruct
79.1%

HumanEval

Rank #31 of 62
#28 Qwen2 72B Instruct
86.0%
#29 Grok-2 mini
85.7%
#30 Nova Lite
85.4%
#31 Gemma 3 12B
85.4%
#32 Claude 3 Opus
84.9%
#33 Qwen2.5 7B Instruct
84.8%
#34 Mistral Small 3 24B Instruct
84.8%
All Benchmark Results for Gemma 3 12B
Complete list of benchmark scores with detailed information
Benchmark | Category | Modality | Raw Score | Normalized | Source
GSM8k | math | text | 0.94 | 94.4% | Self-reported
IFEval | code | text | 0.89 | 88.9% | Self-reported
DocVQA | vision | multimodal | 0.87 | 87.1% | Self-reported
BIG-Bench Hard | general | text | 0.86 | 85.7% | Self-reported
HumanEval | code | text | 0.85 | 85.4% | Self-reported
AI2D | general | text | 0.84 | 84.2% | Self-reported
MATH | math | text | 0.84 | 83.8% | Self-reported
Natural2Code | code | text | 0.81 | 80.7% | Self-reported
FACTS Grounding | factuality | text | 0.76 | 75.8% | Self-reported
ChartQA | general | multimodal | 0.76 | 75.7% | Self-reported
Showing 1 to 10 of 26 benchmarks
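As a sanity check on the normalized values, the ten percentages shown above can be averaged directly. Note that the headline 63.8% average covers all 26 benchmarks, most of which are not listed on this page, so the mean over these top ten is necessarily higher.

```python
# The ten (benchmark, normalized %) pairs are copied from the table above;
# the site's exact rounding rules are an assumption.
rows = [
    ("GSM8k", 94.4), ("IFEval", 88.9), ("DocVQA", 87.1),
    ("BIG-Bench Hard", 85.7), ("HumanEval", 85.4), ("AI2D", 84.2),
    ("MATH", 83.8), ("Natural2Code", 80.7),
    ("FACTS Grounding", 75.8), ("ChartQA", 75.7),
]

# Mean over the ten benchmarks shown on this page.
mean_top10 = sum(score for _, score in rows) / len(rows)  # ≈ 84.2%
```

The gap between this figure and the 63.8% overall average indicates that the 16 unlisted benchmarks score substantially lower on average.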