Mistral Small 3 24B Instruct

Name: Mistral Small 3 24B Instruct
Price: from 0.07 USD per 1M input tokens
Rating: 71.7 (average score across 8 benchmarks)
Author: Mistral AI

#2 on Wild Bench

About

Mistral Small 3 24B Instruct is a language model developed by Mistral AI. It averages 71.7% across 8 benchmarks, with its highest scores on Arena Hard (87.6%), HumanEval (84.8%), and MT-Bench (83.5%). Its strongest category is code, where it averages 83.9%. The model is available through 2 API providers and is licensed under Apache 2.0, which permits commercial use. It was released on January 30, 2025.

Pricing Range

Input (per 1M tokens): $0.07 - $0.10

Output (per 1M tokens): $0.14 - $0.30

Providers: 2
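
As a rough illustration of what these per-token prices mean for a workload, the sketch below turns the listed ranges into a lowest and highest cost estimate. The function name and the sample token counts are hypothetical, and the bounds assume the cheapest (or most expensive) input and output prices come from the same provider, which the listing does not confirm.

```python
# Price bounds in USD per 1M tokens, copied from the pricing range in this listing.
INPUT_PRICE_RANGE = (0.07, 0.10)
OUTPUT_PRICE_RANGE = (0.14, 0.30)


def estimate_cost_range(input_tokens: int, output_tokens: int) -> tuple[float, float]:
    """Return the (lowest, highest) estimated cost in USD for a workload.

    Assumption: the low bound pairs the cheapest input price with the
    cheapest output price, and the high bound pairs the two most expensive.
    """
    low = (input_tokens * INPUT_PRICE_RANGE[0]
           + output_tokens * OUTPUT_PRICE_RANGE[0]) / 1_000_000
    high = (input_tokens * INPUT_PRICE_RANGE[1]
            + output_tokens * OUTPUT_PRICE_RANGE[1]) / 1_000_000
    return low, high


# Hypothetical workload: 2M input tokens and 500K output tokens.
low, high = estimate_cost_range(2_000_000, 500_000)
print(f"Estimated cost: ${low:.2f} to ${high:.2f}")
# prints: Estimated cost: $0.21 to $0.35
```

Actual billing depends on the provider chosen, so treat the result as a bracket, not a quote.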

Timeline

Announced: Jan 30, 2025

Released: Jan 30, 2025

Knowledge Cutoff: Oct 1, 2023

Specifications

License & Family

License

Apache 2.0

Benchmark Performance Overview

Performance metrics and category breakdown

Overall Performance

8 benchmarks

Average Score

71.7%

Best Score

87.6%

High Performers (80%+)

Performance Metrics

Max Context Window

64.0K

Avg Throughput

91.5 tok/s

Avg Latency

0ms
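
To give the throughput figure some practical meaning, here is a minimal sketch that converts it into an approximate generation time. The function name is hypothetical, and the estimate assumes the listed average throughput holds for the whole response and ignores latency and queueing.

```python
# Average throughput in tokens per second, copied from this listing.
AVG_THROUGHPUT_TOK_PER_S = 91.5


def estimate_generation_seconds(output_tokens: int,
                                throughput: float = AVG_THROUGHPUT_TOK_PER_S) -> float:
    """Estimate seconds needed to generate `output_tokens` at a steady throughput."""
    if throughput <= 0:
        raise ValueError("throughput must be positive")
    return output_tokens / throughput


# Hypothetical response length of 1,000 tokens.
seconds = estimate_generation_seconds(1_000)
print(f"About {seconds:.1f} s for 1,000 output tokens")
```

Real response times vary by provider and load, so this is a ballpark figure only.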

Top Categories

code

83.9%

roleplay

83.5%

math

70.6%

general

62.8%

Benchmark Performance

Top benchmark scores with normalized values (0-100%)

Ranking Across Benchmarks

Position relative to other models on each benchmark

Arena Hard

Rank #6 of 22

#3 Llama-3.3 Nemotron Super 49B v1 (88.3%)
#4 Qwen3 30B A3B (91.0%)
#5 DeepSeek-R1 (92.3%)
#6 Mistral Small 3 24B Instruct (87.6%)
#7 Qwen2.5 72B Instruct (81.2%)
#8 Phi 4 Reasoning Plus (79.0%)
#9 DeepSeek-V2.5 (76.2%)

HumanEval

Rank #34 of 62

#31 Qwen2.5 7B Instruct (84.8%)
#32 Claude 3 Opus (84.9%)
#33 Gemma 3 12B (85.4%)
#34 Mistral Small 3 24B Instruct (84.8%)
#35 Gemini 1.5 Pro (84.1%)
#36 Qwen2.5 14B Instruct (83.5%)
#37 Phi 4 (82.6%)

MT-Bench

Rank #7 of 11

#4 Qwen2 7B Instruct (84.1%)
#5 Mistral Large 2 (86.3%)
#6 Qwen2.5 7B Instruct (87.5%)
#7 Mistral Small 3 24B Instruct (83.5%)
#8 Ministral 8B Instruct (83.0%)
#9 Llama 3.1 Nemotron Nano 8B V1 (81.0%)
#10 Pixtral-12B (76.8%)

IFEval

Rank #25 of 37

#22 DeepSeek-R1 (83.3%)
#23 Phi 4 Reasoning (83.4%)
#24 QwQ-32B (83.9%)
#25 Mistral Small 3 24B Instruct (82.9%)
#26 GPT-4o (81.0%)
#27 Llama 3.1 8B Instruct (80.4%)
#28 Gemma 3 1B (80.2%)

MATH

Rank #31 of 63

#28 Claude 3.5 Sonnet (71.1%)
#29 Qwen2.5-Omni-7B (71.5%)
#30 Qwen3 235B A22B (71.8%)
#31 Mistral Small 3 24B Instruct (70.6%)
#32 GPT-4o mini (70.2%)
#33 Kimi K2 Base (70.2%)
#34 Mistral Small 3.2 24B Instruct (69.4%)

All Benchmark Results for Mistral Small 3 24B Instruct

Complete list of benchmark scores with detailed information


Benchmark	Category	Modality	Score	Normalized	Source
Arena Hard	general	text	0.88	87.6%	Self-reported
HumanEval	code	text	0.85	84.8%	Self-reported
MT-Bench	roleplay	text	83.50	83.5%	Self-reported
IFEval	code	text	0.83	82.9%	Self-reported
MATH	math	text	0.71	70.6%	Self-reported
MMLU-Pro	general	text	0.66	66.3%	Self-reported
Wild Bench	general	text	0.52	52.2%	Self-reported
GPQA	general	text	0.45	45.3%	Self-reported
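
The summary figures quoted on this page (overall average, category averages, and the count of scores at 80% or above) can be recomputed from the normalized scores in the table. The sketch below is one way to do that; the variable names are hypothetical, and it assumes each figure is a plain unweighted mean, which the page does not state explicitly.

```python
from collections import defaultdict
from statistics import mean

# (benchmark, category, normalized score in %) copied from the results table.
SCORES = [
    ("Arena Hard", "general", 87.6),
    ("HumanEval", "code", 84.8),
    ("MT-Bench", "roleplay", 83.5),
    ("IFEval", "code", 82.9),
    ("MATH", "math", 70.6),
    ("MMLU-Pro", "general", 66.3),
    ("Wild Bench", "general", 52.2),
    ("GPQA", "general", 45.3),
]

# Overall average: unweighted mean over all benchmarks (assumption).
overall = mean(score for _, _, score in SCORES)

# Category averages: unweighted mean within each category (assumption).
scores_by_category = defaultdict(list)
for _, category, score in SCORES:
    scores_by_category[category].append(score)
category_avg = {cat: mean(vals) for cat, vals in scores_by_category.items()}

# Benchmarks at or above the page's 80% "high performer" threshold.
high_performers = [name for name, _, score in SCORES if score >= 80.0]

print(f"Overall average: {overall:.2f}%")
for category, avg in sorted(category_avg.items(), key=lambda kv: -kv[1]):
    print(f"{category}: {avg:.2f}%")
print(f"High performers (80%+): {len(high_performers)}")
```

Under these assumptions the overall mean comes out at roughly 71.65%, in line with the 71.7% shown on the page, and four benchmarks clear the 80% threshold.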

Resources

API Reference, Blog Post, Model Weights