
Gemma 3 1B
by Google
About
Gemma 3 1B is a language model developed by Google. Across the 18 benchmarks tracked here it averages 29.9%, with its strongest scores on IFEval (80.2%), GSM8k (62.8%), and Natural2Code (56.0%). Its license permits commercial use, making it suitable for enterprise applications. Released in March 2025, it is the smallest model in the Gemma 3 family.
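Because of its small size, Gemma 3 1B can be run locally on modest hardware. The following is a minimal sketch (not an official quickstart) using the Hugging Face transformers library; the model id "google/gemma-3-1b-it" is an assumption, so verify the exact id and accept the Gemma terms on the Hub before running it. It also assumes a transformers release recent enough to include Gemma 3 support.

    # Minimal sketch: query the instruction-tuned Gemma 3 1B checkpoint.
    # Assumes the checkpoint is published as "google/gemma-3-1b-it" (verify on the Hub)
    # and that the Gemma license has been accepted for your account.
    from transformers import pipeline

    generator = pipeline("text-generation", model="google/gemma-3-1b-it")

    messages = [
        {"role": "user", "content": "In one sentence, what does the GSM8k benchmark test?"}
    ]
    result = generator(messages, max_new_tokens=64)

    # For chat-style input the pipeline returns the full conversation;
    # the last message is the model's reply.
    print(result[0]["generated_text"][-1]["content"])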
Timeline
Announced: Mar 12, 2025
Released: Mar 12, 2025
Specifications
Training Tokens: 2.0T
License & Family
License: Gemma
Family: Gemma
Benchmark Performance Overview
Performance metrics and category breakdown
Overall Performance: 18 benchmarks
Average Score: 29.9%
Best Score: 80.2%
High Performers (80%+): 1
Top Categories
code: 43.0%
math: 42.2%
factuality: 36.4%
general: 17.8%
Benchmark Performance
Top benchmark scores with normalized values (0-100%)
Ranking Across Benchmarks
Position relative to other models on each benchmark
IFEval
Rank #28 of 37
#25 Llama 3.1 8B Instruct: 80.4%
#26 GPT-4o: 81.0%
#27 Mistral Small 3 24B Instruct: 82.9%
#28 Gemma 3 1B: 80.2%
#29 Llama 3.1 Nemotron Nano 8B V1: 79.3%
#30 Llama 3.2 3B Instruct: 77.4%
#31 Granite 3.3 8B Instruct: 74.8%
GSM8k
Rank #45 of 46
#42 Gemma 2 9B: 68.6%
#43 IBM Granite 4.0 Tiny Preview: 70.1%
#44 Command R+: 70.7%
#45 Gemma 3 1B: 62.8%
#46 Granite 3.3 8B Base: 59.0%
Natural2Code
Rank #8 of 8
#5 Gemma 3 4B: 70.3%
#6 Gemini 1.5 Flash 8B: 75.5%
#7 Gemini 1.5 Flash: 79.8%
#8 Gemma 3 1B: 56.0%
MATH
Rank #54 of 63
#51 Llama 3.2 3B Instruct: 48.0%
#52 Pixtral-12B: 48.1%
#53 Phi-3.5-mini-instruct: 48.5%
#54 Gemma 3 1B: 48.0%
#55 Qwen2.5-Coder 7B Instruct: 46.6%
#56 Mistral Small 3 24B Base: 46.0%
#57 GPT-3.5 Turbo: 43.1%
HumanEval
Rank #60 of 62
#57 Gemma 2 27B: 51.8%
#58 Phi-3.5-mini-instruct: 62.8%
#59 Gemma 3n E2B Instructed: 66.5%
#60 Gemma 3 1B: 41.5%
#61 Gemma 2 9B: 40.2%
#62 Ministral 8B Instruct: 34.8%
All Benchmark Results for Gemma 3 1B
Complete list of benchmark scores with detailed information
Benchmark | Category | Modality | Raw Score | Normalized | Source
IFEval | code | text | 0.80 | 80.2% | Self-reported
GSM8k | math | text | 0.63 | 62.8% | Self-reported
Natural2Code | code | text | 0.56 | 56.0% | Self-reported
MATH | math | text | 0.48 | 48.0% | Self-reported
HumanEval | code | text | 0.41 | 41.5% | Self-reported
BIG-Bench Hard | general | text | 0.39 | 39.1% | Self-reported
FACTS Grounding | factuality | text | 0.36 | 36.4% | Self-reported
WMT24++ | general | text | 0.36 | 35.9% | Self-reported
MBPP | code | text | 0.35 | 35.2% | Self-reported
Global-MMLU-Lite | general | text | 0.34 | 34.2% | Self-reported
Showing 10 of 18 benchmarks.
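To reproduce summary figures like the category breakdown above, the sketch below averages the normalized scores from the table. It is a partial calculation: only the 10 benchmarks shown here are included, so the computed overall and per-category means will not match the full-table values (29.9% average, code 43.0%, etc.), which cover all 18 benchmarks.

    # Sketch: recompute overall and per-category means from the rows shown above.
    # Only 10 of the 18 benchmarks appear in this excerpt, so the results are
    # partial and will differ from the site's full-table summary numbers.
    from collections import defaultdict

    rows = [
        # (benchmark, category, normalized score in %)
        ("IFEval",           "code",       80.2),
        ("GSM8k",            "math",       62.8),
        ("Natural2Code",     "code",       56.0),
        ("MATH",             "math",       48.0),
        ("HumanEval",        "code",       41.5),
        ("BIG-Bench Hard",   "general",    39.1),
        ("FACTS Grounding",  "factuality", 36.4),
        ("WMT24++",          "general",    35.9),
        ("MBPP",             "code",       35.2),
        ("Global-MMLU-Lite", "general",    34.2),
    ]

    by_category = defaultdict(list)
    for name, category, score in rows:
        by_category[category].append(score)

    overall = sum(score for _, _, score in rows) / len(rows)
    print(f"overall mean (10 of 18 benchmarks): {overall:.1f}%")
    for category, scores in sorted(by_category.items()):
        print(f"{category}: {sum(scores) / len(scores):.1f}%")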