HLE
Tags: reasoning, text
About
HLE (Humanity's Last Exam) is a benchmark for evaluating complex reasoning capabilities. For GLM-4.5, only the text-based questions were evaluated.
Evaluation Stats
Total Models: 3
Organizations: 2
Verified Results: 0
Self-Reported: 3
Benchmark Details
Max Score: 1
Language: en
Performance Overview
Score distribution and top performers
Score Distribution: 3 models
Top Score: 14.4%
Average Score: 9.9%
High Performers (80%+): 0

Top Organizations
#1 Zhipu AI: 2 models, average score 12.5%
#2 Moonshot AI: 1 model, average score 4.7%
Leaderboard
3 models ranked by performance on HLE
| Rank | Organization | Release Date | License | Score |
|---|---|---|---|---|
| 1 | Zhipu AI | Jul 28, 2025 | MIT | 14.4% |
| 2 | Zhipu AI | Jul 28, 2025 | MIT | 10.6% |
| 3 | Moonshot AI | Sep 5, 2025 | MIT | 4.7% |