
Mistral Small 3 24B Base

Text
Zero-eval
#1 AGIEval

by Mistral AI

About

Mistral Small 3 24B Base is a 24-billion-parameter, text-only base language model developed by Mistral AI and released in January 2025. Across 9 benchmarks it averages 67.0%, with its strongest results on ARC-C (91.3%), GSM8k (80.7%), and MMLU (80.7%). It is distributed under the Apache 2.0 license, which permits commercial use and makes it suitable for enterprise applications; at release it was the newest addition to Mistral AI's Small model family.
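
For readers who want to try the weights locally, the sketch below shows one plausible way to load the model with Hugging Face transformers. The repository id (mistralai/Mistral-Small-24B-Base-2501) and the generation settings are assumptions for illustration, not details stated on this page; in bf16 the 24B weights need roughly 48 GB of accelerator memory.

    # Minimal sketch, assuming the Hugging Face repo id below (not specified on this page).
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "mistralai/Mistral-Small-24B-Base-2501"  # assumed repo id
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,  # ~48 GB for 24B parameters
        device_map="auto",
    )

    # Base (non-instruct) model: plain text completion, no chat template.
    prompt = "The three primary colors are"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=20)
    print(tokenizer.decode(output[0], skip_special_tokens=True))

Because this is the Base variant, it completes text rather than following chat-style instructions; the separately released Instruct variant is the one tuned for conversation.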

Timeline
Announced: Jan 30, 2025
Released: Jan 30, 2025
Knowledge Cutoff: Oct 1, 2023
Specifications
Capabilities: Text
License & Family
License: Apache 2.0
Benchmark Performance Overview
Performance metrics and category breakdown

Overall Performance (9 benchmarks)
Average Score: 67.0%
Best Score: 91.3%
High Performers (80%+): 4

Top Categories

reasoning: 91.3%
code: 67.7%
math: 63.4%
general: 62.4%
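
The headline numbers above can be reproduced from the individual scores listed under "All Benchmark Results" at the bottom of this page. The short Python sketch below shows the arithmetic; the category averages are plain unweighted means over the benchmarks assigned to each category.

    # Summary statistics recomputed from the per-benchmark scores listed on this page.
    from collections import defaultdict

    scores = {
        "ARC-C":    ("reasoning", 91.3),
        "GSM8k":    ("math",      80.7),
        "MMLU":     ("general",   80.7),
        "TriviaQA": ("general",   80.3),
        "MBPP":     ("code",      69.6),
        "AGIEval":  ("code",      65.8),
        "MMLU-Pro": ("general",   54.4),
        "MATH":     ("math",      46.0),
        "GPQA":     ("general",   34.4),
    }

    values = [v for _, v in scores.values()]
    print(f"Average score: {sum(values) / len(values):.1f}%")         # 67.0%
    print(f"Best score: {max(values):.1f}%")                          # 91.3%
    print(f"High performers (80%+): {sum(v >= 80 for v in values)}")  # 4

    by_category = defaultdict(list)
    for category, value in scores.values():
        by_category[category].append(value)
    for category, vals in by_category.items():
        print(f"{category}: {sum(vals) / len(vals):.1f}%")
    # reasoning: 91.3%, code: 67.7%, math: 63.4%, general: 62.4%
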
Benchmark Performance
Top benchmark scores with normalized values (0-100%)
Ranking Across Benchmarks
Position relative to other models on each benchmark

ARC-C (Rank #8 of 31)
#5 Nova Lite: 92.4%
#6 Jamba 1.5 Large: 93.0%
#7 Claude 3 Sonnet: 93.2%
#8 Mistral Small 3 24B Base: 91.3%
#9 Phi-3.5-MoE-instruct: 91.0%
#10 Nova Micro: 90.2%
#11 Claude 3 Haiku: 89.2%

GSM8k (Rank #38 of 46)
#35 Granite 3.3 8B Instruct: 80.9%
#36 Qwen2 7B Instruct: 82.3%
#37 Qwen2.5-Coder 7B Instruct: 83.9%
#38 Mistral Small 3 24B Base: 80.7%
#39 Llama 3.2 3B Instruct: 77.7%
#40 Jamba 1.5 Mini: 75.8%
#41 Gemma 2 27B: 74.0%

MMLU (Rank #39 of 78)
#36 Mistral Small 3.1 24B Base: 81.0%
#37 Jamba 1.5 Large: 81.2%
#38 Grok-1.5: 81.3%
#39 Mistral Small 3 24B Base: 80.7%
#40 Mistral Small 3.1 24B Instruct: 80.6%
#41 Mistral Small 3.2 24B Instruct: 80.5%
#42 Nova Lite: 80.5%

TriviaQA (Rank #5 of 13)
#2 Mistral Small 3.1 24B Instruct: 80.5%
#3 Mistral Small 3.1 24B Base: 80.5%
#4 Gemma 2 27B: 83.7%
#5 Mistral Small 3 24B Base: 80.3%
#6 Granite 3.3 8B Base: 78.2%
#7 Gemma 2 9B: 76.6%
#8 Mistral NeMo Instruct: 73.8%

MBPP (Rank #20 of 31)
#17 Gemma 3 12B: 73.0%
#18 Qwen2.5-Omni-7B: 73.2%
#19 Gemma 3 27B: 74.4%
#20 Mistral Small 3 24B Base: 69.6%
#21 Phi-3.5-mini-instruct: 69.6%
#22 Llama 4 Scout: 67.8%
#23 Qwen2 7B Instruct: 67.2%
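
The rank positions in the lists above are relative placements among every model evaluated on a given benchmark. As a generic illustration only (not necessarily how this page orders its lists), a rank can be derived by counting the models that score higher, as sketched below with values taken from the ARC-C list.

    # Generic rank-by-score illustration using ARC-C values shown above.
    def rank_of(model: str, scores: dict[str, float]) -> int:
        # Rank 1 means no model in `scores` has a strictly higher score.
        return 1 + sum(s > scores[model] for s in scores.values())

    arc_c_subset = {
        "Claude 3 Sonnet": 93.2,
        "Jamba 1.5 Large": 93.0,
        "Nova Lite": 92.4,
        "Mistral Small 3 24B Base": 91.3,
        "Phi-3.5-MoE-instruct": 91.0,
    }
    print(rank_of("Mistral Small 3 24B Base", arc_c_subset))
    # 4 within this five-model subset; the page reports #8 of 31 over the full set.
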
All Benchmark Results for Mistral Small 3 24B Base
Complete list of benchmark scores with detailed information
Benchmark   Category    Modality   Raw Score   Normalized   Source
ARC-C       reasoning   text       0.91        91.3%        Self-reported
GSM8k       math        text       0.81        80.7%        Self-reported
MMLU        general     text       0.81        80.7%        Self-reported
TriviaQA    general     text       0.80        80.3%        Self-reported
MBPP        code        text       69.64       69.6%        Self-reported
AGIEval     code        text       0.66        65.8%        Self-reported
MMLU-Pro    general     text       0.54        54.4%        Self-reported
MATH        math        text       0.46        46.0%        Self-reported
GPQA        general     text       0.34        34.4%        Self-reported