Multi-IF

general
text
About

Multi-IF benchmark

Evaluation Stats
Total Models: 8
Organizations: 2
Verified Results: 0
Self-Reported: 8
Benchmark Details
Max Score: 1
Language: en
Performance Overview
Score distribution and top performers

Score Distribution

Models: 8
Top Score: 79.5%
Average Score: 69.5%
High Performers (80%+): 0

Top Organizations

#1 Alibaba: 2 models, 74.8%
#2 OpenAI: 6 models, 67.7%
Leaderboard
Top 8 models ranked by performance
#1 79.5% (Raw: 0.795, Self-reported)
#2 77.5% (Raw: 0.775, Self-reported)
#3 72.2% (Raw: 0.722, Self-reported)
#4 70.8% (Raw: 0.708, Self-reported)
#5 70.8% (Raw: 0.708, Self-reported)
#6 67.0% (Raw: 0.67, Self-reported)
#7 60.9% (Raw: 0.609, Self-reported)
#8 57.2% (Raw: 0.572, Self-reported)
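
The summary figures in the Performance Overview (top score, average score, and the count of high performers) can be checked against the eight raw scores in the leaderboard. The sketch below is illustrative only. It assumes the average is a plain unweighted mean over all listed models and that the "80%+" cutoff is inclusive; the page states neither. The names `raw_scores`, `summarize`, and `HIGH_PERFORMER_THRESHOLD` are hypothetical and do not come from the site.

```python
from statistics import mean

# Raw scores copied from the leaderboard (fractions of the max score of 1).
raw_scores = [0.795, 0.775, 0.722, 0.708, 0.708, 0.67, 0.609, 0.572]

# Assumption: "High Performers (80%+)" means raw score >= 0.80.
HIGH_PERFORMER_THRESHOLD = 0.80


def summarize(scores, threshold=HIGH_PERFORMER_THRESHOLD):
    """Return model count, top and mean score (as percentages), and high-performer count."""
    return {
        "models": len(scores),
        "top_pct": round(max(scores) * 100, 1),
        # Assumption: unweighted arithmetic mean across all models.
        "average_pct": round(mean(scores) * 100, 1),
        "high_performers": sum(1 for s in scores if s >= threshold),
    }


summary = summarize(raw_scores)
print(summary)
```

Under these assumptions the result agrees with the figures shown in the overview: 8 models, a top score of 79.5%, an average of 69.5%, and no model at or above 80%.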