TAU-bench Retail

agents
text
About

TAU-bench Retail benchmark

Evaluation Stats
Total Models: 15
Organizations: 3
Verified Results: 0
Self-Reported: 15
Benchmark Details
Max Score: 1
Language: en
Performance Overview
Score distribution and top performers

Score Distribution

15 models
Top Score: 81.4%
Average Score: 64.2%
High Performers (80%+): 3

Top Organizations

#1 Anthropic (6 models): 70.6%
#2 DeepSeek (1 model): 63.9%
#3 OpenAI (8 models): 59.4%
Leaderboard
Top 15 models ranked by performance
Rank  Score   Raw    Status
1     81.4%   0.814  Self-reported
2     81.2%   0.812  Self-reported
3     80.5%   0.805  Self-reported
4     71.8%   0.718  Self-reported
5     70.8%   0.708  Self-reported
6     69.2%   0.692  Self-reported
7     68.4%   0.684  Self-reported
8     68.0%   0.68   Self-reported
9     63.9%   0.639  Self-reported
10    60.4%   0.604  Self-reported
11    60.3%   0.603  Self-reported
12    57.6%   0.576  Self-reported
13    55.8%   0.558  Self-reported
14    51.0%   0.51   Self-reported
15    22.6%   0.226  Self-reported
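
The summary figures under Score Distribution can be cross-checked against the raw leaderboard values. The sketch below is illustrative only: it assumes the Average Score is an unweighted arithmetic mean over all 15 entries and that "High Performers (80%+)" counts raw scores at or above 0.80. The page does not state either definition, and the names `raw_scores`, `summarize`, and `HIGH_PERFORMER_THRESHOLD` are hypothetical.

```python
# Raw scores copied from the leaderboard entries (all self-reported).
raw_scores = [
    0.814, 0.812, 0.805, 0.718, 0.708,
    0.692, 0.684, 0.68, 0.639, 0.604,
    0.603, 0.576, 0.558, 0.51, 0.226,
]

# Assumption: the "80%+" bucket means raw score >= 0.80.
HIGH_PERFORMER_THRESHOLD = 0.80


def summarize(scores):
    """Return model count, top score, mean score, and high-performer count."""
    return {
        "models": len(scores),
        "top": max(scores),
        # Assumption: a plain unweighted mean over every listed entry.
        "average": sum(scores) / len(scores),
        "high_performers": sum(
            1 for s in scores if s >= HIGH_PERFORMER_THRESHOLD
        ),
    }


summary = summarize(raw_scores)
print(f"Models: {summary['models']}")
print(f"Top Score: {summary['top']:.1%}")
print(f"Average Score: {summary['average']:.1%}")
print(f"High Performers (80%+): {summary['high_performers']}")
```

Under these assumptions the computed values agree with the figures shown on the page (15 models, 81.4% top score, 64.2% average, 3 high performers). The per-organization percentages cannot be checked the same way, because the leaderboard entries here do not identify which model or organization each score belongs to.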