
Gemma 3 1B
by Google
About
Gemma 3 1B is a language model developed by Google. Across the 18 benchmarks tracked here it averages 29.9%, with its strongest scores on IFEval (80.2%), GSM8k (62.8%), and Natural2Code (56.0%). Its license permits commercial use, making it suitable for enterprise applications. Released in March 2025, it is the smallest model in the Gemma 3 family.
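Because of its small size, Gemma 3 1B can be run locally on modest hardware. The following is a minimal sketch (not an official quickstart) using the Hugging Face transformers library; the model id "google/gemma-3-1b-it" is an assumption, so verify the exact id and accept the Gemma terms on the Hub before running it. It also assumes a transformers release recent enough to include Gemma 3 support.

    # Minimal sketch: query the instruction-tuned Gemma 3 1B checkpoint.
    # Assumes the checkpoint is published as "google/gemma-3-1b-it" (verify on the Hub)
    # and that the Gemma license has been accepted for your account.
    from transformers import pipeline

    generator = pipeline("text-generation", model="google/gemma-3-1b-it")

    messages = [
        {"role": "user", "content": "In one sentence, what does the GSM8k benchmark test?"}
    ]
    result = generator(messages, max_new_tokens=64)

    # For chat-style input the pipeline returns the full conversation;
    # the last message is the model's reply.
    print(result[0]["generated_text"][-1]["content"])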
Timeline
Announced: Mar 12, 2025
Released: Mar 12, 2025
Specifications
Training Tokens: 2.0T
License & Family
License: Gemma
Family: Gemma
Benchmark Performance Overview
Performance metrics and category breakdown
Overall Performance: 18 benchmarks
Average Score: 29.9%
Best Score: 80.2%
High Performers (80%+): 1
Top Categories
code: 43.0%
math: 42.2%
factuality: 36.4%
general: 17.8%
Benchmark Performance
Top benchmark scores with normalized values (0-100%)
Ranking Across Benchmarks
Position relative to other models on each benchmark
IFEval
Rank #28 of 37
#25 Llama 3.1 8B Instruct: 80.4%
#26 GPT-4o: 81.0%
#27 Mistral Small 3 24B Instruct: 82.9%
#28 Gemma 3 1B: 80.2%
#29 Llama 3.1 Nemotron Nano 8B V1: 79.3%
#30 Llama 3.2 3B Instruct: 77.4%
#31 Granite 3.3 8B Instruct: 74.8%
GSM8k
Rank #45 of 46
#42 Gemma 2 9B: 68.6%
#43 IBM Granite 4.0 Tiny Preview: 70.1%
#44 Command R+: 70.7%
#45 Gemma 3 1B: 62.8%
#46 Granite 3.3 8B Base: 59.0%
Natural2Code
Rank #8 of 8
#5 Gemma 3 4B: 70.3%
#6 Gemini 1.5 Flash 8B: 75.5%
#7 Gemini 1.5 Flash: 79.8%
#8 Gemma 3 1B: 56.0%
MATH
Rank #54 of 63
#51 Llama 3.2 3B Instruct: 48.0%
#52 Pixtral-12B: 48.1%
#53 Phi-3.5-mini-instruct: 48.5%
#54 Gemma 3 1B: 48.0%
#55 Qwen2.5-Coder 7B Instruct: 46.6%
#56 Mistral Small 3 24B Base: 46.0%
#57 GPT-3.5 Turbo: 43.1%
HumanEval
Rank #60 of 62
#57 Gemma 2 27B: 51.8%
#58 Phi-3.5-mini-instruct: 62.8%
#59 Gemma 3n E2B Instructed: 66.5%
#60 Gemma 3 1B: 41.5%
#61 Gemma 2 9B: 40.2%
#62 Ministral 8B Instruct: 34.8%
All Benchmark Results for Gemma 3 1B
Complete list of benchmark scores with detailed information
Benchmark | Category | Modality | Raw Score | Normalized | Source
IFEval | code | text | 0.80 | 80.2% | Self-reported
GSM8k | math | text | 0.63 | 62.8% | Self-reported
Natural2Code | code | text | 0.56 | 56.0% | Self-reported
MATH | math | text | 0.48 | 48.0% | Self-reported
HumanEval | code | text | 0.41 | 41.5% | Self-reported
BIG-Bench Hard | general | text | 0.39 | 39.1% | Self-reported
FACTS Grounding | factuality | text | 0.36 | 36.4% | Self-reported
WMT24++ | general | text | 0.36 | 35.9% | Self-reported
MBPP | code | text | 0.35 | 35.2% | Self-reported
Global-MMLU-Lite | general | text | 0.34 | 34.2% | Self-reported
Showing 10 of 18 benchmarks.
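To reproduce summary figures like the category breakdown above, the sketch below averages the normalized scores from the table. It is a partial calculation: only the 10 benchmarks shown here are included, so the computed overall and per-category means will not match the full-table values (29.9% average, code 43.0%, etc.), which cover all 18 benchmarks.

    # Sketch: recompute overall and per-category means from the rows shown above.
    # Only 10 of the 18 benchmarks appear in this excerpt, so the results are
    # partial and will differ from the site's full-table summary numbers.
    from collections import defaultdict

    rows = [
        # (benchmark, category, normalized score in %)
        ("IFEval",           "code",       80.2),
        ("GSM8k",            "math",       62.8),
        ("Natural2Code",     "code",       56.0),
        ("MATH",             "math",       48.0),
        ("HumanEval",        "code",       41.5),
        ("BIG-Bench Hard",   "general",    39.1),
        ("FACTS Grounding",  "factuality", 36.4),
        ("WMT24++",          "general",    35.9),
        ("MBPP",             "code",       35.2),
        ("Global-MMLU-Lite", "general",    34.2),
    ]

    by_category = defaultdict(list)
    for name, category, score in rows:
        by_category[category].append(score)

    overall = sum(score for _, _, score in rows) / len(rows)
    print(f"overall mean (10 of 18 benchmarks): {overall:.1f}%")
    for category, scores in sorted(by_category.items()):
        print(f"{category}: {sum(scores) / len(scores):.1f}%")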