
Gemma 2 27B
Top rankings: #1 on ARC-E, #1 on BoolQ, #1 on Natural Questions (+4 more)
by Google
About
Gemma 2 27B is a language model developed by Google. It averages 69.1% across 16 benchmarks. Its highest scores are on ARC-E (88.6%), HellaSwag (86.4%), and BoolQ (84.8%). Its strongest category is reasoning, where it averages 82.5%. It is released under the Gemma license, which permits commercial use subject to the license's usage terms, so it can be used in enterprise applications. It was announced and released on June 27, 2024, as part of Google's Gemma 2 model family.
Timeline
Announced: Jun 27, 2024
Released: Jun 27, 2024
Specifications
Training tokens: 13.0T
License & Family
License: Gemma
Benchmark Performance Overview
Performance metrics and category breakdown
Overall Performance (16 benchmarks)
Average score: 69.1%
Best score: 88.6%
High performers (80%+): 6 benchmarks
Top categories:
reasoning: 82.5%
general: 70.0%
math: 58.1%
code: 56.5%
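As a consistency check, the reasoning figure can be reproduced from the four reasoning-tagged benchmarks in the results table (ARC-E, HellaSwag, Winogrande, ARC-C). This is a minimal sketch that assumes the category average is an unweighted arithmetic mean rounded to one decimal; the page does not state its averaging method, and the six benchmarks not listed in the table may contribute to the other categories.

```python
from statistics import mean

# Reasoning-tagged scores (%) from the "All Benchmark Results" table.
# Assumption: the category average is an unweighted mean, rounded to 1 decimal.
reasoning_scores = {
    "ARC-E": 88.6,
    "HellaSwag": 86.4,
    "Winogrande": 83.7,
    "ARC-C": 71.4,
}

def category_average(scores: dict[str, float]) -> float:
    """Return the unweighted mean of the scores, rounded to one decimal place."""
    return round(mean(scores.values()), 1)

print(category_average(reasoning_scores))  # 82.5
```

The result matches the 82.5% reported for the reasoning category.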
Ranking Across Benchmarks
Position relative to other models on each benchmark
ARC-E (Rank #1 of 6)
#1 Gemma 2 27B: 88.6%
#2 Gemma 2 9B: 88.0%
#3 Gemma 3n E4B: 81.6%
#4 Gemma 3n E4B Instructed LiteRT Preview: 81.6%
HellaSwag (Rank #8 of 24)
#5 Command R+: 88.6%
#6 Qwen2 72B Instruct: 87.6%
#7 Gemini 1.5 Flash: 86.5%
#8 Gemma 2 27B: 86.4%
#9 Claude 3 Haiku: 85.9%
#10 Llama 3.1 Nemotron 70B Instruct: 85.6%
#11 Qwen2.5 32B Instruct: 85.2%
BoolQ (Rank #1 of 9)
#1 Gemma 2 27B: 84.8%
#2 Phi-3.5-MoE-instruct: 84.6%
#3 Gemma 2 9B: 84.2%
#4 Gemma 3n E4B: 81.6%
TriviaQA (Rank #2 of 13)
#1 Kimi K2 Base: 85.1%
#2 Gemma 2 27B: 83.7%
#3 Mistral Small 3.1 24B Base: 80.5%
#4 Mistral Small 3.1 24B Instruct: 80.5%
#5 Mistral Small 3 24B Base: 80.3%
Winogrande (Rank #5 of 19)
#2 Command R+: 85.4%
#3 Qwen2 72B Instruct: 85.1%
#4 Llama 3.1 Nemotron 70B Instruct: 84.5%
#5 Gemma 2 27B: 83.7%
#6 Qwen2.5 32B Instruct: 82.0%
#7 Phi-3.5-MoE-instruct: 81.3%
#8 Qwen2.5-Coder 32B Instruct: 80.8%
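The rank labels order models by their score on each benchmark. The sketch below uses standard competition ranking, where a model's rank is one plus the number of models with a strictly higher score, applied to the ARC-E entries listed above. This is an illustrative assumption, not the site's documented method: under competition ranking tied models share a rank, whereas the page shows the two Gemma 3n E4B variants (both 81.6%) as #3 and #4, so it appears to break ties by listing order.

```python
# ARC-E scores (%) for the entries listed above.
arc_e = {
    "Gemma 2 27B": 88.6,
    "Gemma 2 9B": 88.0,
    "Gemma 3n E4B": 81.6,
    "Gemma 3n E4B Instructed LiteRT Preview": 81.6,
}

def competition_rank(scores: dict[str, float], model: str) -> int:
    """Rank = 1 + number of models with a strictly higher score (ties share a rank)."""
    target = scores[model]
    return 1 + sum(1 for s in scores.values() if s > target)

print(competition_rank(arc_e, "Gemma 2 27B"))   # 1
print(competition_rank(arc_e, "Gemma 3n E4B"))  # 3 (shared with the LiteRT preview)
```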
All Benchmark Results for Gemma 2 27B
Benchmark scores with details for the 10 highest-scoring of the 16 benchmarks. "Normalized" is the score on a 0-1 scale.
| Benchmark | Category | Modality | Normalized | Score | Source |
|---|---|---|---|---|---|
| ARC-E | reasoning | text | 0.89 | 88.6% | Self-reported |
| HellaSwag | reasoning | text | 0.86 | 86.4% | Self-reported |
| BoolQ | general | text | 0.85 | 84.8% | Self-reported |
| TriviaQA | general | text | 0.84 | 83.7% | Self-reported |
| Winogrande | reasoning | text | 0.84 | 83.7% | Self-reported |
| PIQA | general | text | 0.83 | 83.2% | Self-reported |
| MMLU | general | text | 0.75 | 75.2% | Self-reported |
| BIG-Bench | general | text | 0.75 | 74.9% | Self-reported |
| GSM8k | math | text | 0.74 | 74.0% | Self-reported |
| ARC-C | reasoning | text | 0.71 | 71.4% | Self-reported |
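The normalized column appears to be the percentage score divided by 100 and rounded to two decimals; that formula is an inference from the listed values, not something the page states. Because the rows are sorted by score and the tenth entry (71.4%) is already below 80%, the "High Performers (80%+)" count from the overview can also be recovered from these rows alone. A small sketch under those assumptions:

```python
# Top-10 rows from the results table: (benchmark, score in %).
results = [
    ("ARC-E", 88.6), ("HellaSwag", 86.4), ("BoolQ", 84.8),
    ("TriviaQA", 83.7), ("Winogrande", 83.7), ("PIQA", 83.2),
    ("MMLU", 75.2), ("BIG-Bench", 74.9), ("GSM8k", 74.0), ("ARC-C", 71.4),
]

def normalize(score_pct: float) -> float:
    """Assumed normalization: percentage / 100, rounded to two decimals."""
    return round(score_pct / 100, 2)

# Benchmarks at or above the 80% threshold used by "High Performers (80%+)".
high_performers = [name for name, score in results if score >= 80.0]

print([normalize(s) for _, s in results])
# [0.89, 0.86, 0.85, 0.84, 0.84, 0.83, 0.75, 0.75, 0.74, 0.71]
print(len(high_performers))  # 6
```

Both outputs agree with the table's normalized column and the overview's count of six high performers.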