ZeroEval

zeroeval.com

Platform Stats

Total models: 10

Organizations: 4

Verified benchmarks: 0

Multimodal models: 8

Pricing Overview

Average input price (per 1M tokens): $1.10

Average output price (per 1M tokens): $5.76

Cheapest model (input, per 1M tokens): $0.05

Most expensive model (input, per 1M tokens): $3.00
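
Prices above are quoted per 1M tokens, with input and output billed separately. A minimal Python sketch of turning such rates into the cost of one request, assuming simple linear per-token billing (no caching discounts or minimums); `request_cost` is a hypothetical helper, and the token counts in the example are made up for illustration.

```python
# Estimate the cost of one request from per-1M-token prices.
# Assumes linear billing: cost scales directly with token count.

TOKENS_PER_UNIT = 1_000_000


def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_1m: float, output_price_per_1m: float) -> float:
    """Return the US-dollar cost of a single request."""
    input_cost = input_tokens / TOKENS_PER_UNIT * input_price_per_1m
    output_cost = output_tokens / TOKENS_PER_UNIT * output_price_per_1m
    return input_cost + output_cost


if __name__ == "__main__":
    # Hypothetical workload: 2,000 input tokens and 500 output tokens,
    # priced at the platform averages ($1.10 input, $5.76 output per 1M).
    cost = request_cost(2_000, 500, 1.10, 5.76)
    print(f"${cost:.5f}")
```

At the average rates, that hypothetical request costs about half a US cent ($0.0022 for input plus $0.00288 for output, or $0.00508).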

Supported Features

Number of models supporting each feature

Web search

Function calling

Structured output

Code execution

Batch inference

Fine-tuning

Input Modalities

Models supporting different input types

Text: 10 (100%)

Image: 8 (80%)

Audio: 0 (0%)

Video: 0 (0%)

Models Overview

Top performers and pricing distribution

Pricing Distribution

Input pricing per 1M tokens

$0-1: 6 models

$1-5: 4 models
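
The pricing bands above group models by input price per 1M tokens. A minimal Python sketch of that grouping, assuming each band includes its lower bound and excludes its upper bound (the page does not say which band a model priced at exactly $1.00 falls into); `bucket_label` and `price_distribution` are hypothetical helpers, not a ZeroEval API.

```python
from collections import Counter

# Input-price bands in US dollars per 1M tokens, mirroring "$0-1" and "$1-5".
# ASSUMPTION: each band is half-open, [low, high).
BUCKETS = [(0.0, 1.0, "$0-1"), (1.0, 5.0, "$1-5")]


def bucket_label(price: float) -> str:
    """Return the band label for one input price, or "other" if none match."""
    for low, high, label in BUCKETS:
        if low <= price < high:
            return label
    return "other"


def price_distribution(prices: dict[str, float]) -> Counter:
    """Count how many models fall into each input-price band."""
    return Counter(bucket_label(price) for price in prices.values())


if __name__ == "__main__":
    # Input prices taken from the "Most Affordable Models" list on this page.
    prices = {"GPT-5 nano": 0.05, "GPT OSS 20B": 0.10, "GPT OSS 120B": 0.15}
    print(price_distribution(prices))
```

All three of those listed prices land in the $0-1 band; a price of $3.00, the most expensive input rate on the page, would land in $1-5.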

Top Performing Models

Ranked by average benchmark score

#1 Claude 3.7 Sonnet: 74.1%

#2 GPT-5: 70.1%

#3 GPT OSS 120B: 63.1%

#4 Grok-4: 63.1%

#5 Gemini 2.5 Flash: 62.5%
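
The ranking above uses an average benchmark score, but the page does not say which benchmarks feed each average, so the sketch below is not expected to reproduce those percentages. It shows one plausible way to average a row of reported scores while skipping cells marked "-" (not reported); treating "-" as missing rather than as zero is an assumption, and `parse_score` and `mean_reported` are hypothetical helpers.

```python
from typing import Optional


def parse_score(cell: str) -> Optional[float]:
    """Convert a cell such as '74.9%' to 74.9; '-' or an empty cell becomes None."""
    cell = cell.strip()
    if cell in ("", "-"):
        return None
    return float(cell.rstrip("%"))


def mean_reported(cells: list[str]) -> Optional[float]:
    """Average only the reported scores; return None if no score is reported."""
    scores = [s for s in (parse_score(c) for c in cells) if s is not None]
    if not scores:
        return None
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Score cells from the GPT-5 row of the model table on this page.
    print(mean_reported(["74.9%", "88.0%", "93.4%", "-", "-"]))
```

Skipping missing cells keeps a model with few reported results from being penalized as if it had scored zero, at the cost of averaging different models over different benchmark sets.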

Most Affordable Models

GPT-5 nano: $0.05 per 1M input tokens

GPT OSS 20B: $0.10 per 1M input tokens

GPT OSS 120B: $0.15 per 1M input tokens
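
Listing the most affordable models is a top-k selection by input price. A short Python sketch using `heapq.nsmallest` from the standard library; the `cheapest` helper and the name-to-price dictionary layout are illustrative assumptions.

```python
import heapq


def cheapest(prices: dict[str, float], k: int = 3) -> list[tuple[str, float]]:
    """Return the k (model, input price per 1M tokens) pairs with the lowest price.

    Equal prices keep their insertion order, as with sorted(...)[:k].
    """
    return heapq.nsmallest(k, prices.items(), key=lambda item: item[1])


if __name__ == "__main__":
    # Input prices per 1M tokens, taken from the list above (deliberately unordered).
    prices = {"GPT OSS 120B": 0.15, "GPT-5 nano": 0.05, "GPT OSS 20B": 0.10}
    for name, price in cheapest(prices):
        print(f"{name}: ${price:.2f}/1M")
```

`heapq.nsmallest` avoids sorting the whole catalogue when only a few entries are needed, though for ten models a plain `sorted` would work just as well.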

Available Models

10 models available through ZeroEval

Model	Organization	Release Date	License	Benchmark scores (five columns, "-" = not reported)
GPT-5: OpenAI's flagship model for coding, reasoning, and agentic tasks across domains. OpenAI positions it as its best model for coding and agentic work, with higher reasoning capability at medium speed.	OpenAI	Aug 7, 2025	Proprietary	74.9%	88.0%	93.4%	-	-
Claude 3.7 Sonnet: At release, Anthropic's most intelligent Claude model and, by Anthropic's description, the first hybrid reasoning model on the market. It can produce near-instant responses or extended, step-by-step thinking that is made visible to the user, and it shows particularly strong improvements in coding and front-end web development.	Anthropic	Feb 24, 2025	Proprietary	70.3%	-	-	-	-
Gemini 2.5 Flash: A thinking model designed to balance price and performance. It builds on Gemini 2.0 Flash with upgraded reasoning, hybrid thinking control, multimodal capabilities (text, image, video, and audio input), and a 1M-token input context window.	Google	May 20, 2025	Proprietary	60.4%	61.9%	-	-	-
GPT-4o: GPT-4o ('o' for 'omni') is a multimodal model that accepts text, audio, image, and video inputs and generates text, audio, and image outputs. It matches GPT-4 Turbo performance on text and code, with improvements in non-English languages, vision, and audio understanding.	OpenAI	Aug 6, 2024	Proprietary	33.2%	30.7%	-	-	-
GPT-4.1 mini: Balances intelligence, speed, and cost. It is a significant leap in small-model performance, beating GPT-4o on many benchmarks while reducing latency and cost.	OpenAI	Apr 14, 2025	Proprietary	23.6%	34.7%	-	-	-
GPT-5 nano: OpenAI's fastest, cheapest version of GPT-5, well suited to summarization and classification tasks, with average reasoning capability and very fast speed.	OpenAI	Aug 7, 2025	Proprietary	-	-	-	-	-
GPT OSS 20B: According to OpenAI, gpt-oss-20b delivers results similar to OpenAI o3-mini on common benchmarks and can run on edge devices with just 16 GB of memory, making it well suited to on-device use cases, local inference, or rapid iteration without costly infrastructure. Its larger sibling, gpt-oss-120b, achieves near-parity with OpenAI o4-mini on core reasoning benchmarks while running efficiently on a single 80 GB GPU. Both models also perform strongly on tool use, few-shot function calling, chain-of-thought (CoT) reasoning (as seen in results on the Tau-Bench agentic evaluation suite), and HealthBench, where they even outperform proprietary models such as OpenAI o1 and GPT-4o.	OpenAI	Aug 5, 2025	Apache 2.0	-	-	-	-	-
GPT OSS 120B: According to OpenAI, gpt-oss-120b achieves near-parity with OpenAI o4-mini on core reasoning benchmarks while running efficiently on a single 80 GB GPU. The smaller gpt-oss-20b delivers results similar to OpenAI o3-mini on common benchmarks and can run on edge devices with just 16 GB of memory, making it well suited to on-device use cases, local inference, or rapid iteration without costly infrastructure. Both models also perform strongly on tool use, few-shot function calling, chain-of-thought (CoT) reasoning (as seen in results on the Tau-Bench agentic evaluation suite), and HealthBench, where they even outperform proprietary models such as OpenAI o1 and GPT-4o.	OpenAI	Aug 5, 2025	Apache 2.0	-	-	-	-	-
GPT-5 mini: A faster, more cost-efficient version of GPT-5, suited to well-defined tasks and precise prompts, with high reasoning capability at reduced cost.	OpenAI	Aug 7, 2025	Proprietary	-	-	-	-	-
Grok-4: Grok 4, announced by xAI in July 2025, is presented by xAI as a major leap in AI capabilities and described as "the smartest AI in the world." According to xAI, it is built on version 6 of xAI's foundation model and uses 100x more training compute than Grok 2 and 10x more reinforcement learning compute than Grok 3. xAI also claims PhD-level performance across all academic disciplines simultaneously, perfect scores on standardized tests such as the SAT, and near-perfect scores on graduate exams such as the GRE. Unlike Grok 3, Grok 4 has tool use built into its training process rather than relying on generalization. xAI says it was trained on 200,000 GPUs and excels at complex reasoning, mathematical problem-solving, and coding, while acknowledging weaknesses in multimodal capabilities that it plans to address in the next version.	xAI	Jul 9, 2025	Proprietary	-	-	-	79.0%	-
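
Each row of the model table above is tab-separated: the model name and description share the first cell, followed by organization, release date, license, and five benchmark-score columns. A minimal Python sketch for loading one data row (not the header), assuming exactly nine tab-separated cells per row and "-" meaning a score was not reported; `ModelRow` and `parse_row` are hypothetical names, not part of any ZeroEval API.

```python
from dataclasses import dataclass
from typing import Optional

# ASSUMPTION: name+description, organization, date, license, then 5 score cells.
EXPECTED_CELLS = 9


@dataclass
class ModelRow:
    name_and_description: str
    organization: str
    release_date: str
    license_type: str
    scores: list[Optional[float]]  # None means the cell was "-"


def parse_score(cell: str) -> Optional[float]:
    """Convert '79.0%' to 79.0 and '-' to None."""
    cell = cell.strip()
    return None if cell == "-" else float(cell.rstrip("%"))


def parse_row(line: str) -> ModelRow:
    """Split one tab-separated table row into typed fields."""
    cells = line.rstrip("\n").split("\t")
    if len(cells) != EXPECTED_CELLS:
        raise ValueError(f"expected {EXPECTED_CELLS} cells, got {len(cells)}")
    name_desc, org, date, license_type, *score_cells = cells
    return ModelRow(name_desc, org, date, license_type,
                    [parse_score(c) for c in score_cells])


if __name__ == "__main__":
    # Abbreviated Grok-4 row; the description text is shortened here.
    row = "Grok-4: ...\txAI\tJul 9, 2025\tProprietary\t-\t-\t-\t79.0%\t-"
    print(parse_row(row))
```

Keeping "-" as `None` rather than `0.0` preserves the difference between "not evaluated" and "scored zero" for any later averaging.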
