HLE
Tags: reasoning, text
About
HLE (Humanity's Last Exam) is a benchmark for evaluating complex reasoning capabilities. For GLM-4.5, only the text-based questions were evaluated.
Evaluation Stats
Total Models: 3
Organizations: 2
Verified Results: 0
Self-Reported: 3
Benchmark Details
Max Score: 1
Language: en
Performance Overview
Score distribution and top performers
Score Distribution: 3 models
Top Score: 14.4%
Average Score: 9.9%
High Performers (80%+): 0

Top Organizations
#1 Zhipu AI: 2 models, average score 12.5%
#2 Moonshot AI: 1 model, average score 4.7%
Leaderboard
3 models ranked by performance on HLE
| Rank | Organization | Release Date | License | Score |
|---|---|---|---|---|
| 1 | Zhipu AI | Jul 28, 2025 | MIT | 14.4% |
| 2 | Zhipu AI | Jul 28, 2025 | MIT | 10.6% |
| 3 | Moonshot AI | Sep 5, 2025 | MIT | 4.7% |