MultiPL-E HumanEval

code

text

About

MultiPL-E HumanEval is the MultiPL-E translation of the HumanEval code-generation benchmark into multiple programming languages.

Evaluation Stats

Total Models: 3
Organizations: 1
Verified Results: 0
Self-Reported: 3

Benchmark Details

Max Score: 1
Language: en

Performance Overview

Score distribution and top performers

Score Distribution (3 models)

Top Score: 75.2%
Average Score: 63.8%
High Performers (80%+): 0

Top Organizations

#1 Meta (3 models, 63.8%)

Leaderboard

Top 3 models ranked by performance

Rank  Model                    Score  Raw    Status
1     Llama 3.1 405B Instruct  75.2%  0.752  Self-reported
2     Llama 3.1 70B Instruct   65.5%  0.655  Self-reported
3     Llama 3.1 8B Instruct    50.8%  0.508  Self-reported
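
The summary figures in the overview (top score, average score, and the count of models at 80% or above) can be reproduced from the raw leaderboard values. The following is a minimal sketch, not the site's own code: the `summarize` function, the dictionary layout, and the reading of "80%+" as a raw score of at least 0.80 are all assumptions made for illustration.

```python
from statistics import mean

# Raw scores copied from the leaderboard (fractions of the max score of 1).
raw_scores = {
    "Llama 3.1 405B Instruct": 0.752,
    "Llama 3.1 70B Instruct": 0.655,
    "Llama 3.1 8B Instruct": 0.508,
}

# Assumed interpretation of the "High Performers (80%+)" cutoff.
HIGH_PERFORMER_THRESHOLD = 0.80


def summarize(scores, threshold=HIGH_PERFORMER_THRESHOLD):
    """Hypothetical helper: rank models and compute the overview statistics."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return {
        "ranking": [name for name, _ in ranked],
        "top_score_pct": round(ranked[0][1] * 100, 1),
        "average_score_pct": round(mean(scores.values()) * 100, 1),
        "high_performers": sum(1 for s in scores.values() if s >= threshold),
    }


summary = summarize(raw_scores)
for key, value in summary.items():
    print(f"{key}: {value}")
```

With the three self-reported scores above, the computed average rounds to the 63.8% shown in the overview, and no model reaches the 80% cutoff.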