Magistral Small 2506

by Mistral AI

About

Magistral Small 2506 is a language model developed by Mistral AI. It averages 63.2% across the 4 benchmarks reported here, with its strongest results on AIME 2024 (70.7%), GPQA (68.2%), and AIME 2025 (62.8%). Released in June 2025 under the Apache 2.0 license, it is available for commercial and enterprise use.
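
As a quick sanity check, the headline average follows from the four individual scores. A minimal sketch (the page shows 63.2%, so the site presumably averages unrounded underlying values or truncates):

```python
# Verify the headline average against the four reported benchmark scores.
scores = {
    "AIME 2024": 70.7,
    "GPQA": 68.2,
    "AIME 2025": 62.8,
    "LiveCodeBench": 51.3,
}

average = sum(scores.values()) / len(scores)
print(f"Mean of rounded scores: {average:.2f}%")  # 63.25%
# The page reports 63.2%, consistent up to rounding of the underlying values.
```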

Timeline
Announced: Jun 10, 2025
Released: Jun 10, 2025
Knowledge Cutoff: Jun 1, 2025
Specifications
License & Family
License: Apache 2.0
Benchmark Performance Overview
Performance metrics and category breakdown

Overall Performance (4 benchmarks)
Average Score: 63.2%
Best Score: 70.7%
High Performers (80%+): 0

Top Categories
general: 67.2%
code: 51.3%
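
The category figures are consistent with a plain mean of each benchmark's normalized score within its category (three general benchmarks, one code benchmark). A minimal sketch of that grouping, using the scores listed further below; the grouping logic is an assumption, not documented on this page:

```python
from collections import defaultdict

# Normalized scores and categories for the four reported benchmarks.
benchmarks = [
    ("AIME 2024", "general", 70.7),
    ("GPQA", "general", 68.2),
    ("AIME 2025", "general", 62.8),
    ("LiveCodeBench", "code", 51.3),
]

# Group scores by category and average each group.
by_category = defaultdict(list)
for _, category, score in benchmarks:
    by_category[category].append(score)

for category, cat_scores in by_category.items():
    mean = sum(cat_scores) / len(cat_scores)
    print(f"{category}: {mean:.1f}%")  # general: 67.2%, code: 51.3%
```
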
Benchmark Performance
Top benchmark scores with normalized values (0-100%)
Ranking Across Benchmarks
Position relative to other models on each benchmark

AIME 2024
Rank #30 of 41
#27 Gemini 2.0 Flash Thinking: 73.3%
#28 Magistral Medium: 73.6%
#29 o1: 74.3%
#30 Magistral Small 2506: 70.7%
#31 Kimi K2 Instruct: 69.6%
#32 DeepSeek-V3 0324: 59.4%
#33 DeepSeek R1 Distill Qwen 1.5B: 52.7%

GPQA
Rank #34 of 115
#31 DeepSeek-V3 0324: 68.4%
#32 Phi 4 Reasoning Plus: 68.9%
#33 GPT-4.5: 69.5%
#34 Magistral Small 2506: 68.2%
#35 Claude 3.5 Sonnet: 67.2%
#36 Llama-3.3 Nemotron Super 49B v1: 66.7%
#37 GPT-4.1: 66.3%

AIME 2025
Rank #24 of 36
#21 Phi 4 Reasoning: 62.9%
#22 Magistral Medium: 64.9%
#23 Claude Sonnet 4: 70.5%
#24 Magistral Small 2506: 62.8%
#25 Llama-3.3 Nemotron Super 49B v1: 58.4%
#26 Claude 3.7 Sonnet: 54.8%
#27 Gemini 2.5 Flash-Lite: 49.8%

LiveCodeBench
Rank #19 of 44
#16 DeepSeek R1 Distill Qwen 14B: 53.1%
#17 Phi 4 Reasoning Plus: 53.1%
#18 Phi 4 Reasoning: 53.8%
#19 Magistral Small 2506: 51.3%
#20 Magistral Medium: 50.3%
#21 DeepSeek R1 Zero: 50.0%
#22 QwQ-32B-Preview: 50.0%
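
This page does not document how ranks are assigned (note that the neighboring scores above are not strictly monotonic, so ranking likely uses fuller-precision or differently aggregated data). As a rough illustration only, a minimal sketch that ranks models on a single benchmark by a plain descending sort of normalized scores, using a subset of the LiveCodeBench listing above:

```python
# Minimal sketch: rank models on one benchmark by normalized score.
# Assumes a plain descending sort; the site's actual ranking logic
# (tie-breaking, score precision) is not documented on this page.

def rank_table(scores: dict[str, float]) -> list[tuple[int, str, float]]:
    """Return (rank, model, score) tuples, best score first."""
    ordered = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [(i + 1, model, score) for i, (model, score) in enumerate(ordered)]

# Subset of the LiveCodeBench listing above.
livecodebench = {
    "Phi 4 Reasoning": 53.8,
    "Phi 4 Reasoning Plus": 53.1,
    "Magistral Small 2506": 51.3,
    "Magistral Medium": 50.3,
}

for rank, model, score in rank_table(livecodebench):
    print(f"#{rank} {model}: {score:.1f}%")
```
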
All Benchmark Results for Magistral Small 2506
Complete list of benchmark scores with detailed information

Benchmark     | Category | Modality | Raw Score | Normalized | Source
AIME 2024     | general  | text     | 0.71      | 70.7%      | Self-reported
GPQA          | general  | text     | 0.68      | 68.2%      | Self-reported
AIME 2025     | general  | text     | 0.63      | 62.8%      | Self-reported
LiveCodeBench | code     | text     | 0.51      | 51.3%      | Self-reported
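
The raw scores in the table appear to be 0-1 fractions rounded to two decimals, while the normalized column keeps one more digit (e.g. 70.7% alongside 0.71). A minimal sketch of that conversion, assuming the normalized percentage is simply the unrounded fraction scaled by 100; the three-decimal fractions below are inferred from the percentages, not given on the page:

```python
# Raw-to-normalized conversion, assuming raw = fraction rounded to two
# decimals and normalized = fraction * 100. Fractions inferred from the
# percentages reported above.
results = {
    "AIME 2024": 0.707,
    "GPQA": 0.682,
    "AIME 2025": 0.628,
    "LiveCodeBench": 0.513,
}

for name, fraction in results.items():
    raw = round(fraction, 2)      # e.g. 0.707 -> 0.71
    normalized = fraction * 100   # e.g. 0.707 -> 70.7
    print(f"{name}: raw={raw:.2f}, normalized={normalized:.1f}%")
```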