Multi-IF

Tags: general, text

About

Multi-IF is a benchmark that evaluates how well language models follow instructions across multi-turn conversations.

Evaluation Stats

Total models: 8
Organizations: 2
Verified results: 0
Self-reported results: 8

Benchmark Details

Max score: 1 (raw scores range from 0 to 1 and are shown below as percentages)
Language: English (en)

Performance Overview

Score distribution and top performers


Top score: 79.5%
Average score: 69.5%
High performers (80%+): 0

Top Organizations

1. Alibaba: 2 models, average score 74.8%
2. OpenAI: 6 models, average score 67.7%

Leaderboard

All 8 models, ranked by score. Every result is self-reported.

1. 79.5% (raw 0.795)
2. Qwen3-235B-A22B-Instruct-2507: 77.5% (raw 0.775)
3. 72.2% (raw 0.722)
4. 70.8% (raw 0.708)
5. 70.8% (raw 0.708)
6. 67.0% (raw 0.670)
7. 60.9% (raw 0.609)
8. 57.2% (raw 0.572)
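The summary figures in the performance overview can be cross-checked against the eight raw leaderboard scores. The following is a minimal Python sketch that assumes the average is an unweighted arithmetic mean over all entries and that the "80%+" bucket includes scores of exactly 0.80. The page states neither assumption. The helper name `summarize` is hypothetical.

```python
# Raw Multi-IF leaderboard scores, in rank order (copied from the leaderboard).
raw_scores = [0.795, 0.775, 0.722, 0.708, 0.708, 0.670, 0.609, 0.572]


def summarize(scores, high_threshold=0.80):
    """Return (top score, unweighted mean, count of scores >= high_threshold).

    Assumption: the page's "Average Score" is a plain mean over all models,
    and "High Performers (80%+)" is inclusive of the threshold.
    """
    top = max(scores)
    mean = sum(scores) / len(scores)
    high = sum(1 for s in scores if s >= high_threshold)
    return top, mean, high


top, mean, high = summarize(raw_scores)
print(f"Top score: {top:.1%}")
print(f"Average score: {mean:.1%}")
print(f"High performers (80%+): {high}")
```

Under these assumptions, the output matches the page: a top score of 79.5%, an average of 69.5% (the raw mean is 0.694875), and no models at or above 80%.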