Command R+

by Cohere

About

Command R+ is a language model developed by Cohere. It achieves strong overall performance, averaging 74.6% across 6 benchmarks, with its best results on HellaSwag (88.6%), Winogrande (85.4%), and MMLU (75.7%). Its strongest category is reasoning, where it averages 81.7%. The model supports a 256K-token context window for handling large documents and is available through 2 API providers. Released in 2024, it represents Cohere's latest advancement in AI technology.
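As an illustration of API access, here is a minimal sketch using Cohere's Python SDK. The v1 `cohere.Client` chat interface and the `command-r-plus` model identifier are assumptions on this page's part; check your provider's documentation for current names:

```python
import cohere

# Assumed: the cohere v1 SDK's Client and chat() interface.
co = cohere.Client("YOUR_API_KEY")  # placeholder key

response = co.chat(
    model="command-r-plus",  # model identifier assumed from this card
    message="Summarize the attached report in three bullet points.",
)
print(response.text)
```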

Pricing Range
Input (per 1M tokens): $0.25 - $3.00
Output (per 1M tokens): $1.00 - $15.00
Providers: 2
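To make the range concrete, a short sketch computing per-request cost at the cheapest and most expensive listed rates (the token counts are made-up example values):

```python
# Per-1M-token rates from the pricing range above (USD).
INPUT_RATES = (0.25, 3.00)
OUTPUT_RATES = (1.00, 15.00)

def request_cost(input_tokens: int, output_tokens: int,
                 input_rate: float, output_rate: float) -> float:
    """USD cost of one request at the given per-1M-token rates."""
    return input_tokens / 1e6 * input_rate + output_tokens / 1e6 * output_rate

# Example: a 10K-token prompt producing a 1K-token completion.
low = request_cost(10_000, 1_000, INPUT_RATES[0], OUTPUT_RATES[0])
high = request_cost(10_000, 1_000, INPUT_RATES[1], OUTPUT_RATES[1])
print(f"${low:.4f} - ${high:.4f} per request")  # $0.0035 - $0.0450
```

The same workload can differ in cost by more than 10x across providers, so the range matters when choosing where to run the model.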
Timeline
Announced: Aug 30, 2024
Released: Aug 30, 2024
Specifications
License & Family
License: CC BY-NC
Benchmark Performance Overview
Performance metrics and category breakdown

Overall Performance

Benchmarks: 6
Average Score: 74.6%
Best Score: 88.6%
High Performers (80%+): 2

Performance Metrics

Max Context Window: 256.0K tokens
Avg Throughput: 79.5 tok/s
Avg Latency: 1 ms

Top Categories

reasoning: 81.7%
general: 75.7%
math: 70.7%
factuality: 56.3%
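The overall and per-category figures above follow directly from the six self-reported scores in the table at the bottom of this page; a minimal sketch reproducing them:

```python
# Self-reported scores from the "All Benchmark Results" table below.
scores = {
    "HellaSwag":  ("reasoning",  88.6),
    "Winogrande": ("reasoning",  85.4),
    "MMLU":       ("general",    75.7),
    "ARC-C":      ("reasoning",  71.0),
    "GSM8k":      ("math",       70.7),
    "TruthfulQA": ("factuality", 56.3),
}

# Overall average across all six benchmarks -> 74.6%.
overall = sum(s for _, s in scores.values()) / len(scores)
print(f"overall: {overall:.1f}%")

# Per-category averages -> reasoning 81.7%, general 75.7%,
# math 70.7%, factuality 56.3%.
by_category: dict[str, list[float]] = {}
for category, score in scores.values():
    by_category.setdefault(category, []).append(score)
for category, vals in by_category.items():
    print(f"{category}: {sum(vals) / len(vals):.1f}%")
```

Note that three of the four categories contain a single benchmark each; only the reasoning average (HellaSwag, Winogrande, ARC-C) aggregates multiple scores.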
Benchmark Performance
Top benchmark scores with normalized values (0-100%)
Ranking Across Benchmarks
Position relative to other models on each benchmark

HellaSwag

Rank #5 of 24
#2 Claude 3 Sonnet: 89.0%
#3 Gemini 1.5 Pro: 93.3%
#4 GPT-4: 95.3%
#5 Command R+: 88.6%
#6 Qwen2 72B Instruct: 87.6%
#7 Gemini 1.5 Flash: 86.5%
#8 Gemma 2 27B: 86.4%

Winogrande

Rank #2 of 19
#1 GPT-4: 87.5%
#2 Command R+: 85.4%
#3 Qwen2 72B Instruct: 85.1%
#4 Llama 3.1 Nemotron 70B Instruct: 84.5%
#5 Gemma 2 27B: 83.7%

MMLU

Rank #53 of 78
#50 Nova Micro: 77.6%
#51 Qwen2.5 VL 32B Instruct: 78.4%
#52 Phi-3.5-MoE-instruct: 78.9%
#53 Command R+: 75.7%
#54 Gemma 2 27B: 75.2%
#55 Claude 3 Haiku: 75.2%
#56 Qwen2.5-Coder 32B Instruct: 75.1%

ARC-C

Rank #19 of 31
#16 Gemma 2 27B: 71.4%
#17 Ministral 8B Instruct: 71.9%
#18 Llama 3.2 3B Instruct: 78.6%
#19 Command R+: 71.0%
#20 Qwen2.5-Coder 32B Instruct: 70.5%
#21 Qwen2.5 32B Instruct: 70.4%
#22 Llama 3.1 Nemotron 70B Instruct: 69.2%

GSM8k

Rank #42 of 46
#39 Gemma 2 27B: 74.0%
#40 Jamba 1.5 Mini: 75.8%
#41 Llama 3.2 3B Instruct: 77.7%
#42 Command R+: 70.7%
#43 IBM Granite 4.0 Tiny Preview: 70.1%
#44 Gemma 2 9B: 68.6%
#45 Gemma 3 1B: 62.8%
All Benchmark Results for Command R+
Complete list of benchmark scores with detailed information
Benchmark | Category | Modality | Raw Score | Normalized | Source
HellaSwag | reasoning | text | 0.89 | 88.6% | Self-reported
Winogrande | reasoning | text | 0.85 | 85.4% | Self-reported
MMLU | general | text | 0.76 | 75.7% | Self-reported
ARC-C | reasoning | text | 0.71 | 71.0% | Self-reported
GSM8k | math | text | 0.71 | 70.7% | Self-reported
TruthfulQA | factuality | text | 0.56 | 56.3% | Self-reported