Google

Gemma 2 27B

Zero-eval: #1 ARC-E, #1 BoolQ, #1 Natural Questions (+4 more)


About

Gemma 2 27B is a language model developed by Google. Across 16 benchmarks it averages 69.1%, with its highest scores on ARC-E (88.6%), HellaSwag (86.4%), and BoolQ (84.8%). Reasoning is its strongest category, averaging 82.5%. It is released under the Gemma license, which permits commercial use subject to Google's usage terms. It was announced and released on June 27, 2024.

Timeline
Announced: Jun 27, 2024
Released: Jun 27, 2024
Specifications
Training tokens: 13.0T
License & Family
License: Gemma
Benchmark Performance Overview
Performance metrics and category breakdown

Overall Performance
Benchmarks: 16
Average score: 69.1%
Best score: 88.6% (ARC-E)
High performers (80%+): 6

Top Categories
Reasoning: 82.5%
General: 70.0%
Math: 58.1%
Code: 56.5%
Ranking Across Benchmarks
Position relative to other models on each benchmark

ARC-E (rank #1 of 6)
#1 Gemma 2 27B: 88.6%
#2 Gemma 2 9B: 88.0%
#3 Gemma 3n E4B: 81.6%
#4 Gemma 3n E4B Instructed LiteRT Preview: 81.6%

HellaSwag (rank #8 of 24)
#5 Gemini 1.5 Flash: 86.5%
#6 Qwen2 72B Instruct: 87.6%
#7 Command R+: 88.6%
#8 Gemma 2 27B: 86.4%
#9 Claude 3 Haiku: 85.9%
#10 Llama 3.1 Nemotron 70B Instruct: 85.6%
#11 Qwen2.5 32B Instruct: 85.2%

BoolQ (rank #1 of 9)
#1 Gemma 2 27B: 84.8%
#2 Phi-3.5-MoE-instruct: 84.6%
#3 Gemma 2 9B: 84.2%
#4 Gemma 3n E4B: 81.6%

TriviaQA (rank #2 of 13)
#1 Kimi K2 Base: 85.1%
#2 Gemma 2 27B: 83.7%
#3 Mistral Small 3.1 24B Base: 80.5%
#4 Mistral Small 3.1 24B Instruct: 80.5%
#5 Mistral Small 3 24B Base: 80.3%

Winogrande (rank #5 of 19)
#2 Llama 3.1 Nemotron 70B Instruct: 84.5%
#3 Qwen2 72B Instruct: 85.1%
#4 Command R+: 85.4%
#5 Gemma 2 27B: 83.7%
#6 Qwen2.5 32B Instruct: 82.0%
#7 Phi-3.5-MoE-instruct: 81.3%
#8 Qwen2.5-Coder 32B Instruct: 80.8%

All Benchmark Results for Gemma 2 27B
Complete list of benchmark scores with detailed information
ARC-E (reasoning, text): 88.6%, self-reported
HellaSwag (reasoning, text): 86.4%, self-reported
BoolQ (general, text): 84.8%, self-reported
TriviaQA (general, text): 83.7%, self-reported
Winogrande (reasoning, text): 83.7%, self-reported
PIQA (general, text): 83.2%, self-reported
MMLU (general, text): 75.2%, self-reported
BIG-Bench (general, text): 74.9%, self-reported
GSM8k (math, text): 74.0%, self-reported
ARC-C (reasoning, text): 71.4%, self-reported
Showing 1 to 10 of 16 benchmarks
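
The category figures above can be partly checked against the listed scores. The sketch below assumes, without confirmation from the source, that a category score is the unweighted mean of the benchmarks tagged with that category, and that the list is sorted from highest to lowest, so the six unlisted benchmarks all score at most 71.4%. Under those assumptions it reproduces the reasoning average (82.5%) and the count of benchmarks at 80% or above (6). The names `SCORES`, `category_mean`, and `count_at_least` are hypothetical helpers, not part of any site API.

```python
from statistics import mean

# The 10 listed self-reported scores (percent), keyed by benchmark,
# with the category tag shown on this page.
SCORES = {
    "ARC-E": ("reasoning", 88.6),
    "HellaSwag": ("reasoning", 86.4),
    "BoolQ": ("general", 84.8),
    "TriviaQA": ("general", 83.7),
    "Winogrande": ("reasoning", 83.7),
    "PIQA": ("general", 83.2),
    "MMLU": ("general", 75.2),
    "BIG-Bench": ("general", 74.9),
    "GSM8k": ("math", 74.0),
    "ARC-C": ("reasoning", 71.4),
}


def category_mean(scores, category):
    """Unweighted mean of listed scores tagged `category` (assumed site method)."""
    return mean(s for cat, s in scores.values() if cat == category)


def count_at_least(scores, threshold):
    """Number of listed benchmarks scoring at or above `threshold` percent."""
    return sum(1 for _, s in scores.values() if s >= threshold)


print(f"reasoning average: {category_mean(SCORES, 'reasoning'):.1f}%")  # prints: reasoning average: 82.5%
print(f"benchmarks at 80%+: {count_at_least(SCORES, 80.0)}")  # prints: benchmarks at 80%+: 6
```

Applying `category_mean` to the five listed general benchmarks gives about 80.4%, well above the 70.0% shown; under the same assumption, that would mean some of the six unlisted benchmarks are also tagged general. The overall, math, and code averages likewise cannot be checked from this page alone.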