Gemma 3 1B

by Google

About

Gemma 3 1B is a 1-billion-parameter open-weight language model developed by Google and released in March 2025 as part of the Gemma 3 family. Across the 18 benchmarks reported here it averages 29.9%, with its strongest results on IFEval (80.2%), GSM8k (62.8%), and Natural2Code (56.0%). The Gemma license permits commercial use, making it suitable for enterprise applications.
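
For context on how such a model is typically run, here is a minimal, hypothetical usage sketch for loading Gemma 3 1B with Hugging Face Transformers. It assumes the instruction-tuned checkpoint id google/gemma-3-1b-it, a Transformers version recent enough to include Gemma 3 support, and that the Gemma license has been accepted on Hugging Face; none of these details come from this page.

```python
# Hypothetical usage sketch (not from this page): load Gemma 3 1B and generate text.
# Assumes: transformers with Gemma 3 support installed, and access granted to the
# gated "google/gemma-3-1b-it" checkpoint on Hugging Face.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-3-1b-it"  # instruction-tuned 1B checkpoint (assumed id)
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Build a chat-formatted prompt and generate a short completion.
messages = [{"role": "user", "content": "Summarize what the IFEval benchmark measures."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=64)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```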

Timeline
Announced: Mar 12, 2025
Released: Mar 12, 2025

Specifications
Training Tokens: 2.0T

License & Family
License: Gemma
Benchmark Performance Overview
Performance metrics and category breakdown

Overall Performance (18 benchmarks)
Average Score: 29.9%
Best Score: 80.2%
High Performers (80%+): 1
(See the sketch after the category list below for how these summary figures relate to the per-benchmark scores.)

Top Categories
code: 43.0%
math: 42.2%
factuality: 36.4%
general: 17.8%
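
As a rough illustration of how these summary figures relate to the individual results, the sketch below recomputes an average, best score, high-performer count, and per-category means from the ten benchmark scores listed later on this page. Since only 10 of the 18 benchmarks are shown here, the recomputed numbers will not exactly match the 29.9% average or the category figures above.

```python
# Hypothetical sketch: recompute summary statistics from the per-benchmark scores
# listed on this page. Only 10 of the 18 benchmarks are shown here, so the results
# approximate (but do not reproduce) the site's overall and per-category figures.
from statistics import mean

# (benchmark, category, normalized score in %)
results = [
    ("IFEval", "code", 80.2),
    ("GSM8k", "math", 62.8),
    ("Natural2Code", "code", 56.0),
    ("MATH", "math", 48.0),
    ("HumanEval", "code", 41.5),
    ("BIG-Bench Hard", "general", 39.1),
    ("FACTS Grounding", "factuality", 36.4),
    ("WMT24++", "general", 35.9),
    ("MBPP", "code", 35.2),
    ("Global-MMLU-Lite", "general", 34.2),
]

scores = [s for _, _, s in results]
print(f"Average score: {mean(scores):.1f}%")
print(f"Best score: {max(scores):.1f}%")
print(f"High performers (80%+): {sum(s >= 80 for s in scores)}")

# Per-category means over the benchmarks listed here; the page's category
# figures are computed over all 18 benchmarks, so they differ.
for category in ("code", "math", "factuality", "general"):
    cat_scores = [s for _, c, s in results if c == category]
    print(f"{category}: {mean(cat_scores):.1f}%")
```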
Benchmark Performance
Top benchmark scores with normalized values (0-100%)
Ranking Across Benchmarks
Position relative to other models on each benchmark

IFEval (Rank #28 of 37)
#25 Llama 3.1 8B Instruct: 80.4%
#26 GPT-4o: 81.0%
#27 Mistral Small 3 24B Instruct: 82.9%
#28 Gemma 3 1B: 80.2%
#29 Llama 3.1 Nemotron Nano 8B V1: 79.3%
#30 Llama 3.2 3B Instruct: 77.4%
#31 Granite 3.3 8B Instruct: 74.8%

GSM8k (Rank #45 of 46)
#42 Gemma 2 9B: 68.6%
#43 IBM Granite 4.0 Tiny Preview: 70.1%
#44 Command R+: 70.7%
#45 Gemma 3 1B: 62.8%
#46 Granite 3.3 8B Base: 59.0%

Natural2Code (Rank #8 of 8)
#5 Gemma 3 4B: 70.3%
#6 Gemini 1.5 Flash 8B: 75.5%
#7 Gemini 1.5 Flash: 79.8%
#8 Gemma 3 1B: 56.0%

MATH (Rank #54 of 63)
#51 Llama 3.2 3B Instruct: 48.0%
#52 Pixtral-12B: 48.1%
#53 Phi-3.5-mini-instruct: 48.5%
#54 Gemma 3 1B: 48.0%
#55 Qwen2.5-Coder 7B Instruct: 46.6%
#56 Mistral Small 3 24B Base: 46.0%
#57 GPT-3.5 Turbo: 43.1%

HumanEval (Rank #60 of 62)
#57 Gemma 2 27B: 51.8%
#58 Phi-3.5-mini-instruct: 62.8%
#59 Gemma 3n E2B Instructed: 66.5%
#60 Gemma 3 1B: 41.5%
#61 Gemma 2 9B: 40.2%
#62 Ministral 8B Instruct: 34.8%
All Benchmark Results for Gemma 3 1B
Complete list of benchmark scores with detailed information

Benchmark | Category | Modality | Raw Score | Normalized | Source
IFEval | code | text | 0.80 | 80.2% | Self-reported
GSM8k | math | text | 0.63 | 62.8% | Self-reported
Natural2Code | code | text | 0.56 | 56.0% | Self-reported
MATH | math | text | 0.48 | 48.0% | Self-reported
HumanEval | code | text | 0.41 | 41.5% | Self-reported
BIG-Bench Hard | general | text | 0.39 | 39.1% | Self-reported
FACTS Grounding | factuality | text | 0.36 | 36.4% | Self-reported
WMT24++ | general | text | 0.36 | 35.9% | Self-reported
MBPP | code | text | 35.20 | 35.2% | Self-reported
Global-MMLU-Lite | general | text | 0.34 | 34.2% | Self-reported

Showing 1 to 10 of 18 benchmarks