
Gemma 3 4B
Multimodal
Zero-eval
#3 VQAv2 (val)
#3 MMMU (val)
by Google
About
Gemma 3 4B is a multimodal language model developed by Google. Across the 26 benchmarks tracked here it averages 53.0%, with its strongest results on IFEval (90.2%), GSM8k (89.2%), and DocVQA (75.8%). It supports a 262K-token context window for long documents and is currently available through one API provider. As a multimodal model, it accepts both text and images as input. Its Gemma license permits commercial use, making it suitable for enterprise applications. It was announced and released in March 2025.
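Since the model is served through an API provider, a typical way to call it is via an OpenAI-compatible chat-completions endpoint. The sketch below assumes such an endpoint; the base URL, environment variable, and model identifier are placeholders, not details taken from this page.

```python
# Minimal sketch of querying Gemma 3 4B via an OpenAI-compatible API.
# The base_url and model name are hypothetical placeholders; check your
# provider's documentation for the real values.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.example-provider.com/v1",  # placeholder endpoint
    api_key=os.environ["PROVIDER_API_KEY"],          # placeholder env var
)

response = client.chat.completions.create(
    model="gemma-3-4b-it",  # assumed identifier; providers vary
    messages=[
        {"role": "user", "content": "Summarize this report in three bullets."}
    ],
    max_tokens=512,
)
print(response.choices[0].message.content)
```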
Pricing Range
Input (per 1M tokens): $0.02
Output (per 1M tokens): $0.04
Providers: 1
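At these rates, per-request cost is simple arithmetic: token count divided by one million, times the per-million rate. A small helper using only the prices listed above:

```python
# Cost estimate at the listed Gemma 3 4B rates: $0.02 per 1M input
# tokens and $0.04 per 1M output tokens.
INPUT_PER_M = 0.02
OUTPUT_PER_M = 0.04

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request at the listed rates."""
    return (input_tokens / 1_000_000) * INPUT_PER_M + \
           (output_tokens / 1_000_000) * OUTPUT_PER_M

# Example: a 200K-token document plus a 1K-token summary costs ~$0.004.
print(f"${request_cost(200_000, 1_000):.6f}")  # $0.004040
```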
Timeline
Announced: Mar 12, 2025
Released: Mar 12, 2025
Knowledge Cutoff: Aug 1, 2024
Specifications
Training Tokens: 4.0T
Capabilities
Multimodal
License & Family
License: Gemma
Benchmark Performance Overview
Performance metrics and category breakdown
Overall Performance
26 benchmarks
Average Score: 53.0%
Best Score: 90.2%
High Performers (80%+): 2
Performance Metrics
Max Context Window: 262.1K tokens
Avg Throughput: 33.0 tok/s
Avg Latency: 0ms
Top Categories
factuality: 70.1%
math: 64.5%
code: 61.5%
vision: 59.0%
general: 40.7%
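If the category scores above are plain means of the normalized benchmark scores in each category (an assumption; the page does not state its aggregation method), they can be recomputed as below. Note that only 10 of the 26 benchmarks are listed further down, so this partial data will not reproduce the exact figures.

```python
# Recompute category averages from (benchmark, category, score) rows,
# assuming each category score is the simple mean of its benchmarks.
# Rows are a subset taken from the table later on this page.
from collections import defaultdict
from statistics import mean

rows = [
    ("IFEval", "code", 90.2),
    ("GSM8k", "math", 89.2),
    ("DocVQA", "vision", 75.8),
    ("MATH", "math", 75.6),
    ("FACTS Grounding", "factuality", 70.1),
]

by_category = defaultdict(list)
for name, category, score in rows:
    by_category[category].append(score)

for category, scores in sorted(by_category.items()):
    print(f"{category}: {mean(scores):.1f}%")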
Benchmark Performance
Top benchmark scores with normalized values (0-100%)
Ranking Across Benchmarks
Position relative to other models on each benchmark
IFEval
Rank #6 of 37
#3 Gemma 3 27B: 90.4%
#4 Llama 3.3 70B Instruct: 92.1%
#5 Nova Pro: 92.1%
#6 Gemma 3 4B: 90.2%
#7 Kimi K2 Instruct: 89.8%
#8 Nova Lite: 89.7%
#9 Llama 3.1 Nemotron Ultra 253B v1: 89.5%
GSM8k
Rank #27 of 46
#24 Grok-1.5: 90.0%
#25 Gemini 1.5 Pro: 90.8%
#26 Qwen2.5-Coder 32B Instruct: 91.1%
#27 Gemma 3 4B: 89.2%
#28 Claude 3 Haiku: 88.9%
#29 Qwen2.5-Omni-7B: 88.7%
#30 Phi-3.5-MoE-instruct: 88.7%
DocVQA
Rank #26 of 26
#23 Grok-1.5: 85.6%
#24 Grok-1.5V: 85.6%
#25 Gemma 3 27B: 86.6%
#26 Gemma 3 4B: 75.8%
MATH
Rank #21 of 63
#18 Grok-2: 76.1%
#19 GPT-4o: 76.6%
#20 Nova Pro: 76.6%
#21 Gemma 3 4B: 75.6%
#22 Qwen2.5 7B Instruct: 75.5%
#23 DeepSeek-V2.5: 74.7%
#24 Llama 3.1 405B Instruct: 73.8%
AI2D
Rank #16 of 17
#13 Phi-3.5-vision-instruct: 78.1%
#14 DeepSeek VL2 Small: 80.0%
#15 DeepSeek VL2: 81.4%
#16 Gemma 3 4B: 74.8%
#17 DeepSeek VL2 Tiny: 71.6%
All Benchmark Results for Gemma 3 4B
Complete list of benchmark scores with detailed information
Benchmark | Category | Modality | Raw Score | Normalized | Source
IFEval | code | text | 0.90 | 90.2% | Self-reported
GSM8k | math | text | 0.89 | 89.2% | Self-reported
DocVQA | vision | multimodal | 0.76 | 75.8% | Self-reported
MATH | math | text | 0.76 | 75.6% | Self-reported
AI2D | general | text | 0.75 | 74.8% | Self-reported
BIG-Bench Hard | general | text | 0.72 | 72.2% | Self-reported
HumanEval | code | text | 0.71 | 71.3% | Self-reported
Natural2Code | code | text | 0.70 | 70.3% | Self-reported
FACTS Grounding | factuality | text | 0.70 | 70.1% | Self-reported
ChartQA | general | multimodal | 0.69 | 68.8% | Self-reported
Showing 10 of 26 benchmarks.
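For anyone scraping this page, the pipe-delimited rows above parse cleanly into records. A small sketch; the field names are illustrative, not an official schema:

```python
# Parse the pipe-delimited benchmark rows into dictionaries.
# Field names below are my own labels, not from the page.
from typing import Dict, List

HEADER = ["benchmark", "category", "modality", "raw", "normalized", "source"]

def parse_rows(lines: List[str]) -> List[Dict[str, str]]:
    records = []
    for line in lines:
        fields = [f.strip() for f in line.split("|")]
        if len(fields) == len(HEADER):  # skip header/malformed lines
            records.append(dict(zip(HEADER, fields)))
    return records

rows = parse_rows([
    "IFEval | code | text | 0.90 | 90.2% | Self-reported",
    "GSM8k | math | text | 0.89 | 89.2% | Self-reported",
])
print(rows[0]["normalized"])  # "90.2%"
```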