ZeroEval

zeroeval.com
Platform Stats
Total Models: 10
Organizations: 4
Verified Benchmarks: 0
Multimodal Models: 8
Pricing Overview
Avg Input (per 1M): $1.10
Avg Output (per 1M): $5.76
Cheapest Model: $0.05
Premium Model: $3.00
Supported Features
Number of models supporting each feature
Web Search: 4
Function Calling: 10
Structured Output: 10
Code Execution: 5
Batch Inference: 10
Fine-tuning: 3
Input Modalities
Models supporting different input types
Text: 10 (100%)
Image: 8 (80%)
Audio: 0 (0%)
Video: 0 (0%)
Models Overview
Top performers and pricing distribution

Pricing Distribution

Input pricing per 1M tokens
$0-1: 6 models
$1-5: 4 models

Top Performing Models

By benchmark avg
#1 Claude 3.7 Sonnet: 74.1%
#2 GPT-5: 70.1%
#3 GPT OSS 120B: 63.1%
#4 Grok-4: 63.1%
#5 Gemini 2.5 Flash: 62.5%

Most Affordable Models

GPT-5 nano: $0.05/1M
GPT OSS 20B: $0.10/1M
GPT OSS 120B: $0.15/1M

Available Models

10 models available through ZeroEval

GPT-5 (OpenAI)
GPT-5 is OpenAI's flagship model for coding, reasoning, and agentic tasks across domains, with higher reasoning capabilities and medium speed.
Released: Aug 7, 2025
License: Proprietary
Scores: 74.9% | 88.0% | 93.4% | - | -
Claude 3.7 Sonnet (Anthropic)
The most intelligent Claude model and the first hybrid reasoning model on the market. Claude 3.7 Sonnet can produce near-instant responses or extended, step-by-step thinking that is made visible to the user. Shows particularly strong improvements in coding and front-end web development.
Released: Feb 24, 2025
License: Proprietary
Scores: 70.3% | - | - | - | -
Gemini 2.5 Flash (Google)
A thinking model designed for a balance between price and performance. It builds upon Gemini 2.0 Flash with upgraded reasoning, hybrid thinking control, multimodal capabilities (text, image, video, audio input), and a 1M token input context window.
Released: May 20, 2025
License: Proprietary
Scores: 60.4% | 61.9% | - | - | -
GPT-4o (OpenAI)
GPT-4o ('o' for 'omni') is a multimodal AI model that accepts text, audio, image, and video inputs, and generates text, audio, and image outputs. It matches GPT-4 Turbo performance on text and code, with improvements in non-English languages, vision, and audio understanding.
Released: Aug 6, 2024
License: Proprietary
Scores: 33.2% | 30.7% | - | - | -
GPT-4.1 mini (OpenAI)
GPT-4.1 mini provides a balance between intelligence, speed, and cost. It's a significant leap in small model performance, even beating GPT-4o in many benchmarks while reducing latency and cost.
Released: Apr 14, 2025
License: Proprietary
Scores: 23.6% | 34.7% | - | - | -
GPT-5 nano (OpenAI)
GPT-5 nano is the fastest, cheapest version of GPT-5. It is well suited to summarization and classification tasks, with average reasoning capabilities and very fast speed.
Released: Aug 7, 2025
License: Proprietary
Scores: - | - | - | - | -
GPT OSS 20B (OpenAI)
The gpt-oss-20b model delivers similar results to OpenAI o3‑mini on common benchmarks and can run on edge devices with just 16 GB of memory, making it ideal for on-device use cases, local inference, or rapid iteration without costly infrastructure. Its larger sibling, gpt-oss-120b, achieves near-parity with OpenAI o4-mini on core reasoning benchmarks while running efficiently on a single 80 GB GPU. Both models also perform strongly on tool use, few-shot function calling, CoT reasoning (as seen in results on the Tau-Bench agentic evaluation suite) and HealthBench (even outperforming proprietary models like OpenAI o1 and GPT‑4o).
Released: Aug 5, 2025
License: Apache 2.0
Scores: - | - | - | - | -
GPT OSS 120B (OpenAI)
The gpt-oss-120b model achieves near-parity with OpenAI o4-mini on core reasoning benchmarks, while running efficiently on a single 80 GB GPU. The gpt-oss-20b model delivers similar results to OpenAI o3‑mini on common benchmarks and can run on edge devices with just 16 GB of memory, making it ideal for on-device use cases, local inference, or rapid iteration without costly infrastructure. Both models also perform strongly on tool use, few-shot function calling, CoT reasoning (as seen in results on the Tau-Bench agentic evaluation suite) and HealthBench (even outperforming proprietary models like OpenAI o1 and GPT‑4o).
Released: Aug 5, 2025
License: Apache 2.0
Scores: - | - | - | - | -
GPT-5 mini (OpenAI)
A faster, more cost-efficient version of GPT-5, suited to well-defined tasks and precise prompts, with high reasoning capabilities at reduced cost.
Released: Aug 7, 2025
License: Proprietary
Scores: - | - | - | - | -
Grok-4 (xAI)
Grok 4, announced by xAI in summer 2025, represents a major leap in AI capabilities and is described by the company as 'the smartest AI in the world.' Built on version 6 of xAI's foundation model, it uses 100x more training compute than Grok 2 and 10x more reinforcement learning compute than Grok 3. According to xAI, the model achieves PhD-level performance across all academic disciplines simultaneously, with perfect scores on standardized tests like the SAT and near-perfect scores on graduate exams like the GRE. Unlike Grok 3, tool usage is built into the training process rather than relying on generalization. Trained using 200,000 GPUs, Grok 4 excels at complex reasoning, mathematical problem-solving, and coding tasks, though it has acknowledged weaknesses in multimodal capabilities that are being addressed in the next version.
Released: Jul 9, 2025
License: Proprietary
Scores: - | - | - | 79.0% | -