TAU-bench Retail
agents
text
About
TAU-bench Retail benchmark
Evaluation Stats
Total Models: 15
Organizations: 3
Verified Results: 0
Self-Reported: 15
Benchmark Details
Max Score: 1
Language: en
Performance Overview
Score distribution and top performers
Score Distribution (15 models)
Top Score: 81.4%
Average Score: 64.2%
High Performers (80%+): 3
Top Organizations
#1 Anthropic (6 models): 70.6%
#2 DeepSeek (1 model): 63.9%
#3 OpenAI (8 models): 59.4%
Leaderboard
Top 15 models ranked by performance
81.4% (Raw: 0.814), Self-reported
81.2% (Raw: 0.812), Self-reported
80.5% (Raw: 0.805), Self-reported
69.2% (Raw: 0.692), Self-reported
63.9% (Raw: 0.639), Self-reported
60.4% (Raw: 0.604), Self-reported
#13 55.8% (Raw: 0.558), Self-reported
51.0% (Raw: 0.51), Self-reported
#15 22.6% (Raw: 0.226), Self-reported