
Mistral Small 3 24B Instruct
#2 Wild Bench
by Mistral AI
About
Mistral Small 3 24B Instruct is a language model developed by Mistral AI. It achieves strong performance with an average score of 71.7% across 8 benchmarks, and it does especially well on Arena Hard (87.6%), HumanEval (84.8%), and MT-Bench (83.5%). It is strongest on code tasks, with an average of 83.9% in that category. The model is available through 2 API providers and is licensed for commercial use, making it suitable for enterprise applications. Released in 2025, it represents Mistral AI's latest advancement in AI technology.
Pricing Range
Input (per 1M tokens): $0.07 - $0.10
Output (per 1M tokens): $0.14 - $0.30
Providers: 2
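To put the rate range in concrete terms, here is a small Python sketch that estimates per-request cost at the cheapest and priciest listed rates; the 2,000-input / 500-output token counts are hypothetical examples, not measurements.

```python
# Estimate per-request cost from the published per-1M-token rates.
# The two rate points are the provider range shown above; the token
# counts are hypothetical examples.
RATES = {
    "cheapest provider": {"input": 0.07, "output": 0.14},  # $ per 1M tokens
    "priciest provider": {"input": 0.10, "output": 0.30},
}

def request_cost(input_tokens: int, output_tokens: int, rate: dict) -> float:
    """Dollar cost of one request at the given per-1M-token rate."""
    return (input_tokens * rate["input"] + output_tokens * rate["output"]) / 1_000_000

for name, rate in RATES.items():
    cost = request_cost(input_tokens=2_000, output_tokens=500, rate=rate)
    print(f"{name}: ${cost:.6f}")  # cheapest: $0.000210, priciest: $0.000350
```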
Timeline
Announced: Jan 30, 2025
Released: Jan 30, 2025
Knowledge Cutoff: Oct 1, 2023
Specifications
License & Family
License: Apache 2.0
Benchmark Performance Overview
Performance metrics and category breakdown
Overall Performance
8 benchmarks
Average Score: 71.7%
Best Score: 87.6%
High Performers (80%+): 4
Performance Metrics
Max Context Window: 64.0K
Avg Throughput: 91.5 tok/s
Avg Latency: 0 ms
Top Categories
code: 83.9%
roleplay: 83.5%
math: 70.6%
general: 62.8%
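The headline figures above are simple means over the eight normalized scores listed under "All Benchmark Results" below. A minimal Python sketch reproducing them follows; the inputs are the page's rounded one-decimal scores, so a recomputed category mean can drift from the page's figure by 0.1 (e.g. general comes out to 62.85%).

```python
# Reproduce the headline averages from the eight normalized scores
# listed under "All Benchmark Results" below. Inputs are the page's
# rounded one-decimal values, so recomputed means can differ slightly
# from the page's category figures.
scores = {
    "Arena Hard": (87.6, "general"),
    "HumanEval":  (84.8, "code"),
    "MT-Bench":   (83.5, "roleplay"),
    "IFEval":     (82.9, "code"),   # this page categorizes IFEval as code
    "MATH":       (70.6, "math"),
    "MMLU-Pro":   (66.3, "general"),
    "Wild Bench": (52.2, "general"),
    "GPQA":       (45.3, "general"),
}

values = [v for v, _ in scores.values()]
print(f"Average score: {sum(values) / len(values):.2f}%")  # 71.65%, shown as 71.7%
print(f"Best score: {max(values):.1f}%")                   # 87.6%
print(f"High performers (80%+): {sum(v >= 80 for v in values)}")  # 4

per_category: dict[str, list[float]] = {}
for value, category in scores.values():
    per_category.setdefault(category, []).append(value)
for category, vals in sorted(per_category.items()):
    # code -> 83.85%, roleplay -> 83.50%, math -> 70.60%, general -> 62.85%
    print(f"{category}: {sum(vals) / len(vals):.2f}%")
```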
Benchmark Performance
Top benchmark scores with normalized values (0-100%)
Ranking Across Benchmarks
Position relative to other models on each benchmark
Arena Hard
Rank #6 of 22
#3 Llama-3.3 Nemotron Super 49B v1: 88.3%
#4 Qwen3 30B A3B: 91.0%
#5 DeepSeek-R1: 92.3%
#6 Mistral Small 3 24B Instruct: 87.6%
#7 Qwen2.5 72B Instruct: 81.2%
#8 Phi 4 Reasoning Plus: 79.0%
#9 DeepSeek-V2.5: 76.2%
HumanEval
Rank #34 of 62
#31 Qwen2.5 7B Instruct: 84.8%
#32 Claude 3 Opus: 84.9%
#33 Gemma 3 12B: 85.4%
#34 Mistral Small 3 24B Instruct: 84.8%
#35 Gemini 1.5 Pro: 84.1%
#36 Qwen2.5 14B Instruct: 83.5%
#37 Phi 4: 82.6%
MT-Bench
Rank #7 of 11
#4 Qwen2 7B Instruct: 84.1%
#5 Mistral Large 2: 86.3%
#6 Qwen2.5 7B Instruct: 87.5%
#7 Mistral Small 3 24B Instruct: 83.5%
#8 Ministral 8B Instruct: 83.0%
#9 Llama 3.1 Nemotron Nano 8B V1: 81.0%
#10 Pixtral-12B: 76.8%
IFEval
Rank #25 of 37
#22 DeepSeek-R1: 83.3%
#23 Phi 4 Reasoning: 83.4%
#24 QwQ-32B: 83.9%
#25 Mistral Small 3 24B Instruct: 82.9%
#26 GPT-4o: 81.0%
#27 Llama 3.1 8B Instruct: 80.4%
#28 Gemma 3 1B: 80.2%
MATH
Rank #31 of 63
#28 Claude 3.5 Sonnet: 71.1%
#29 Qwen2.5-Omni-7B: 71.5%
#30 Qwen3 235B A22B: 71.8%
#31 Mistral Small 3 24B Instruct: 70.6%
#32 GPT-4o mini: 70.2%
#33 Kimi K2 Base: 70.2%
#34 Mistral Small 3.2 24B Instruct: 69.4%
All Benchmark Results for Mistral Small 3 24B Instruct
Complete list of benchmark scores with detailed information
Benchmark | Category | Modality | Raw Score | Normalized | Source
Arena Hard | general | text | 0.88 | 87.6% | Self-reported
HumanEval | code | text | 0.85 | 84.8% | Self-reported
MT-Bench | roleplay | text | 83.50 | 83.5% | Self-reported
IFEval | code | text | 0.83 | 82.9% | Self-reported
MATH | math | text | 0.71 | 70.6% | Self-reported
MMLU-Pro | general | text | 0.66 | 66.3% | Self-reported
Wild Bench | general | text | 0.52 | 52.2% | Self-reported
GPQA | general | text | 0.45 | 45.3% | Self-reported
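With the model served by hosted APIs, here is a minimal sketch of querying it through Mistral's chat completions endpoint. The model identifier mistral-small-latest is an assumption; check the provider's current model list for the exact id for Mistral Small 3, and export an API key as MISTRAL_API_KEY first.

```python
# Minimal chat-completion request against Mistral's hosted API.
# Assumptions: the model is exposed as "mistral-small-latest" (verify
# against the provider's model list) and MISTRAL_API_KEY is set.
import os

import requests

resp = requests.post(
    "https://api.mistral.ai/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['MISTRAL_API_KEY']}"},
    json={
        "model": "mistral-small-latest",  # assumed id for Mistral Small 3
        "messages": [
            {"role": "user", "content": "In one sentence, what is Apache 2.0?"},
        ],
        "max_tokens": 128,
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```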
Resources