
HLE

Tags: reasoning, text

About

HLE (Humanity's Last Exam) is a benchmark for evaluating complex reasoning capabilities. For GLM-4.5, only the text-based questions were evaluated.

Evaluation Stats
Total Models: 3
Organizations: 2
Verified Results: 0
Self-Reported: 3

Benchmark Details
Max Score: 1
Language: en
Performance Overview
Score distribution and top performers

Score Distribution (3 models)
Top Score: 14.4%
Average Score: 9.9%
High Performers (80%+): 0
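The aggregates above follow directly from the three reported scores. A minimal sketch, assuming only the self-reported scores listed on this page:

```python
# Derive the overview aggregates from the three self-reported HLE scores.
scores = [14.4, 10.6, 4.7]  # percent

top_score = max(scores)                                # 14.4
average_score = sum(scores) / len(scores)              # 9.9
high_performers = sum(1 for s in scores if s >= 80.0)  # 0

print(f"Top Score: {top_score}%")
print(f"Average Score: {average_score:.1f}%")
print(f"High Performers (80%+): {high_performers}")
```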

Top Organizations
#1 Zhipu AI: 2 models, 12.5% average score
#2 Moonshot AI: 1 model, 4.7% average score
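The organization ranking averages the scores of each organization's models. A minimal sketch, assuming the score-to-organization mapping implied by the figures above (Zhipu AI: 14.4% and 10.6%, which average to the 12.5% shown; Moonshot AI: 4.7%):

```python
# Rank organizations by the average score of their models.
from collections import defaultdict

results = [("Zhipu AI", 14.4), ("Zhipu AI", 10.6), ("Moonshot AI", 4.7)]

org_scores = defaultdict(list)
for org, score in results:
    org_scores[org].append(score)

# Sort organizations by average score, descending.
ranking = sorted(
    ((org, sum(s) / len(s)) for org, s in org_scores.items()),
    key=lambda pair: pair[1],
    reverse=True,
)
for rank, (org, avg) in enumerate(ranking, start=1):
    print(f"#{rank} {org}: {len(org_scores[org])} model(s), {avg:.1f}%")
```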
Leaderboard
3 models ranked by performance on HLE
Rank  Date          License  Score
#1    Jul 28, 2025  MIT      14.4%
#2    Jul 28, 2025  MIT      10.6%
#3    Sep 5, 2025   MIT      4.7%