ARC-C
Tags: reasoning, text

About
ARC-C benchmark

Evaluation Stats
Total Models: 31
Organizations: 11
Verified Results: 0
Self-Reported: 31

Benchmark Details
Max Score: 1
Language: en
Performance Overview
Score distribution and top performers

Score Distribution: 31 models
Top Score: 96.9%
Average Score: 77.6%
High Performers (80%+): 15

Top Organizations
Rank  Organization  Models  Score
#1    Anthropic     3       92.9%
#2    Amazon        3       92.5%
#3    AI21 Labs     2       89.3%
#4    Meta          4       88.4%
#5    Microsoft     3       86.4%
Leaderboard
Top 20 models ranked by performance

Rank  Score   Raw     Source
#1    96.9%   0.969   Self-reported
#2    96.4%   0.964   Self-reported
#3    94.8%   0.948   Self-reported
#4    93.2%   0.932   Self-reported
#5    93.0%   0.93    Self-reported
#6    91.3%   0.9129  Self-reported
#7    91.0%   0.91    Self-reported
#10   90.2%   0.902   Self-reported
#11   89.2%   0.892   Self-reported
#12   85.7%   0.857   Self-reported
#13   84.6%   0.846   Self-reported
#14   83.7%   0.837   Self-reported
#15   83.4%   0.834   Self-reported
#16   78.6%   0.786   Self-reported
#17   71.9%   0.719   Self-reported
#18   71.4%   0.714   Self-reported
#19   71.0%   0.7099  Self-reported
#20   70.5%   0.705   Self-reported

Showing top 20 of 31 models
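The displayed percentages appear to be the raw benchmark scores (on a 0-1 scale) rendered as percentages rounded to one decimal place, e.g. 0.9129 becomes 91.3%. A minimal sketch of that formatting rule, inferred from the Raw/percentage pairs above (an assumption about the listing's convention, not the site's actual code):

```python
def format_score(raw: float) -> str:
    # Assumed convention: scale the 0-1 raw score to a percentage
    # and round to one decimal place, matching pairs like
    # 0.9129 -> 91.3% and 0.705 -> 70.5% in the leaderboard.
    return f"{raw * 100:.1f}%"

print(format_score(0.969))   # 96.9%
print(format_score(0.9129))  # 91.3%
```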