Gemma 3 4B

by Google

Multimodal
Zero-eval
#3 VQAv2 (val)
#3 MMMU (val)

About

Gemma 3 4B is a multimodal language model developed by Google. It posts competitive results across the 26 benchmarks tracked here; its highest normalized scores come on IFEval (90.2%), GSM8k (89.2%), and DocVQA (75.8%). It supports a 262K-token context window for handling large documents and is currently available through a single API provider. As a multimodal model, it accepts both text and image inputs. Its license permits commercial use, making it suitable for enterprise applications. Released in March 2025, it is part of Google's Gemma 3 family of open models.
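To make the multimodal claim concrete, here is a minimal sketch of text-plus-image inference using the Hugging Face transformers image-text-to-text pipeline, which supports the Gemma 3 checkpoints. This assumes a recent transformers release with Gemma 3 support; the image URL and prompt are placeholders, not part of this page.

```python
# Minimal sketch: text + image inference with Gemma 3 4B via Hugging Face
# transformers (assumes a recent release with Gemma 3 support and a GPU).
import torch
from transformers import pipeline

pipe = pipeline(
    "image-text-to-text",
    model="google/gemma-3-4b-it",   # instruction-tuned 4B checkpoint
    device_map="auto",
    torch_dtype=torch.bfloat16,
)

messages = [{
    "role": "user",
    "content": [
        {"type": "image", "url": "https://example.com/invoice.png"},  # placeholder image
        {"type": "text", "text": "Summarize this document in one sentence."},
    ],
}]

out = pipe(text=messages, max_new_tokens=128)
print(out[0]["generated_text"][-1]["content"])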

Pricing Range
Input (per 1M tokens): $0.02
Output (per 1M tokens): $0.04
Providers: 1
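At these rates, per-request cost is simple arithmetic over token counts. A short sketch with hypothetical counts (a large document plus a short answer):

```python
def request_cost_usd(input_tokens: int, output_tokens: int,
                     input_per_m: float = 0.02, output_per_m: float = 0.04) -> float:
    """Estimate one request's cost at the listed per-1M-token rates."""
    return input_tokens / 1e6 * input_per_m + output_tokens / 1e6 * output_per_m

# e.g. a 200K-token document plus a 1K-token answer:
print(f"${request_cost_usd(200_000, 1_000):.6f}")  # $0.004040
```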
Timeline
Announced: Mar 12, 2025
Released: Mar 12, 2025
Knowledge Cutoff: Aug 1, 2024
Specifications
Training Tokens: 4.0T
Capabilities: Multimodal
License & Family
License: Gemma
Benchmark Performance Overview
Performance metrics and category breakdown

Overall Performance

Benchmarks: 26
Average Score: 53.0%
Best Score: 90.2%
High Performers (80%+): 2

Performance Metrics

Max Context Window: 262.1K tokens
Avg Throughput: 33.0 tok/s
Avg Latency: 0 ms

Top Categories

factuality: 70.1%
math: 64.5%
code: 61.5%
vision: 59.0%
general: 40.7%
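The overview numbers above are plain aggregates of the per-benchmark normalized scores. The sketch below reproduces the calculation over the 10 benchmarks listed on this page; the page itself aggregates all 26, so its average and category means differ (e.g. 53.0% overall versus roughly 75.8% for these 10 alone).

```python
from collections import defaultdict

# The 10 benchmarks shown on this page (of 26 total), as (name, category, pct).
scores = [
    ("IFEval", "code", 90.2), ("GSM8k", "math", 89.2),
    ("DocVQA", "vision", 75.8), ("MATH", "math", 75.6),
    ("AI2D", "general", 74.8), ("BIG-Bench Hard", "general", 72.2),
    ("HumanEval", "code", 71.3), ("Natural2Code", "code", 70.3),
    ("FACTS Grounding", "factuality", 70.1), ("ChartQA", "general", 68.8),
]

pcts = [p for _, _, p in scores]
print(f"average: {sum(pcts) / len(pcts):.1f}%")             # over these 10 only
print(f"best: {max(pcts):.1f}%")                            # 90.2%, matches the page
print(f"high performers (80%+): {sum(p >= 80 for p in pcts)}")  # 2, matches the page

# Per-category means, highest first.
by_cat = defaultdict(list)
for _, cat, p in scores:
    by_cat[cat].append(p)
for cat, ps in sorted(by_cat.items(), key=lambda kv: -sum(kv[1]) / len(kv[1])):
    print(f"{cat}: {sum(ps) / len(ps):.1f}%")
```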
Benchmark Performance
Top benchmark scores with normalized values (0-100%)
Ranking Across Benchmarks
Position relative to other models on each benchmark

IFEval

Rank #6 of 37
#3 Llama 3.3 70B Instruct: 92.1%
#4 Nova Pro: 92.1%
#5 Gemma 3 27B: 90.4%
#6 Gemma 3 4B: 90.2%
#7 Kimi K2 Instruct: 89.8%
#8 Nova Lite: 89.7%
#9 Llama 3.1 Nemotron Ultra 253B v1: 89.5%

GSM8k

Rank #27 of 46
#24 Qwen2.5-Coder 32B Instruct: 91.1%
#25 Gemini 1.5 Pro: 90.8%
#26 Grok-1.5: 90.0%
#27 Gemma 3 4B: 89.2%
#28 Claude 3 Haiku: 88.9%
#29 Qwen2.5-Omni-7B: 88.7%
#30 Phi-3.5-MoE-instruct: 88.7%

DocVQA

Rank #26 of 26
#23 Gemma 3 27B: 86.6%
#24 Grok-1.5: 85.6%
#25 Grok-1.5V: 85.6%
#26 Gemma 3 4B: 75.8%

MATH

Rank #21 of 63
#18 GPT-4o: 76.6%
#19 Nova Pro: 76.6%
#20 Grok-2: 76.1%
#21 Gemma 3 4B: 75.6%
#22 Qwen2.5 7B Instruct: 75.5%
#23 DeepSeek-V2.5: 74.7%
#24 Llama 3.1 405B Instruct: 73.8%

AI2D

Rank #16 of 17
#13 DeepSeek VL2: 81.4%
#14 DeepSeek VL2 Small: 80.0%
#15 Phi-3.5-vision-instruct: 78.1%
#16 Gemma 3 4B: 74.8%
#17 DeepSeek VL2 Tiny: 71.6%
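A rank like "#6 of 37" is just the model's 1-indexed position after sorting every evaluated model by score, descending. A minimal sketch, using a hypothetical five-model slice of the IFEval table above:

```python
def leaderboard_rank(model: str, results: dict[str, float]) -> tuple[int, int]:
    """Return (rank of `model`, field size), ranking by score descending, 1-indexed."""
    ordered = sorted(results, key=results.get, reverse=True)
    return ordered.index(model) + 1, len(ordered)

# Hypothetical slice of the IFEval table above (ties keep listing order).
ifeval = {"Llama 3.3 70B Instruct": 92.1, "Nova Pro": 92.1,
          "Gemma 3 27B": 90.4, "Gemma 3 4B": 90.2, "Kimi K2 Instruct": 89.8}
print(leaderboard_rank("Gemma 3 4B", ifeval))  # (4, 5) within this slice
```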
All Benchmark Results for Gemma 3 4B
Complete list of benchmark scores with detailed information

Benchmark       | Category   | Modality   | Score | Normalized | Source
IFEval          | code       | text       | 0.90  | 90.2%      | Self-reported
GSM8k           | math       | text       | 0.89  | 89.2%      | Self-reported
DocVQA          | vision     | multimodal | 0.76  | 75.8%      | Self-reported
MATH            | math       | text       | 0.76  | 75.6%      | Self-reported
AI2D            | general    | text       | 0.75  | 74.8%      | Self-reported
BIG-Bench Hard  | general    | text       | 0.72  | 72.2%      | Self-reported
HumanEval       | code       | text       | 0.71  | 71.3%      | Self-reported
Natural2Code    | code       | text       | 0.70  | 70.3%      | Self-reported
FACTS Grounding | factuality | text       | 0.70  | 70.1%      | Self-reported
ChartQA         | general    | multimodal | 0.69  | 68.8%      | Self-reported

Showing 1 to 10 of 26 benchmarks