Magistral Small 2506

by Mistral AI

About

Magistral Small 2506 is a language model developed by Mistral AI. It averages 63.2% across the 4 benchmarks reported here, with its strongest results on AIME 2024 (70.7%), GPQA (68.2%), and AIME 2025 (62.8%). Released in June 2025 under the Apache 2.0 license, it is available for commercial and enterprise use.
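
As a quick sanity check, the headline average follows from the four individual scores. A minimal sketch (the page shows 63.2%, so the site presumably averages unrounded underlying values or truncates):

```python
# Verify the headline average against the four reported benchmark scores.
scores = {
    "AIME 2024": 70.7,
    "GPQA": 68.2,
    "AIME 2025": 62.8,
    "LiveCodeBench": 51.3,
}

average = sum(scores.values()) / len(scores)
print(f"Mean of rounded scores: {average:.2f}%")  # 63.25%
# The page reports 63.2%, consistent up to rounding of the underlying values.
```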

Timeline
Announced: Jun 10, 2025
Released: Jun 10, 2025
Knowledge Cutoff: Jun 1, 2025
Specifications
License & Family
License: Apache 2.0
Benchmark Performance Overview
Performance metrics and category breakdown

Overall Performance (4 benchmarks)
Average Score: 63.2%
Best Score: 70.7%
High Performers (80%+): 0

Top Categories
general: 67.2%
code: 51.3%
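
The category figures are consistent with a plain mean of each benchmark's normalized score within its category (three general benchmarks, one code benchmark). A minimal sketch of that grouping, using the scores listed further below; the grouping logic is an assumption, not documented on this page:

```python
from collections import defaultdict

# Normalized scores and categories for the four reported benchmarks.
benchmarks = [
    ("AIME 2024", "general", 70.7),
    ("GPQA", "general", 68.2),
    ("AIME 2025", "general", 62.8),
    ("LiveCodeBench", "code", 51.3),
]

# Group scores by category and average each group.
by_category = defaultdict(list)
for _, category, score in benchmarks:
    by_category[category].append(score)

for category, cat_scores in by_category.items():
    mean = sum(cat_scores) / len(cat_scores)
    print(f"{category}: {mean:.1f}%")  # general: 67.2%, code: 51.3%
```
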
Benchmark Performance
Top benchmark scores with normalized values (0-100%)
Ranking Across Benchmarks
Position relative to other models on each benchmark

AIME 2024
Rank #30 of 41
#27 Gemini 2.0 Flash Thinking: 73.3%
#28 Magistral Medium: 73.6%
#29 o1: 74.3%
#30 Magistral Small 2506: 70.7%
#31 Kimi K2 Instruct: 69.6%
#32 DeepSeek-V3 0324: 59.4%
#33 DeepSeek R1 Distill Qwen 1.5B: 52.7%

GPQA
Rank #34 of 115
#31 DeepSeek-V3 0324: 68.4%
#32 Phi 4 Reasoning Plus: 68.9%
#33 GPT-4.5: 69.5%
#34 Magistral Small 2506: 68.2%
#35 Claude 3.5 Sonnet: 67.2%
#36 Llama-3.3 Nemotron Super 49B v1: 66.7%
#37 GPT-4.1: 66.3%

AIME 2025
Rank #24 of 36
#21 Phi 4 Reasoning: 62.9%
#22 Magistral Medium: 64.9%
#23 Claude Sonnet 4: 70.5%
#24 Magistral Small 2506: 62.8%
#25 Llama-3.3 Nemotron Super 49B v1: 58.4%
#26 Claude 3.7 Sonnet: 54.8%
#27 Gemini 2.5 Flash-Lite: 49.8%

LiveCodeBench
Rank #19 of 44
#16 DeepSeek R1 Distill Qwen 14B: 53.1%
#17 Phi 4 Reasoning Plus: 53.1%
#18 Phi 4 Reasoning: 53.8%
#19 Magistral Small 2506: 51.3%
#20 Magistral Medium: 50.3%
#21 DeepSeek R1 Zero: 50.0%
#22 QwQ-32B-Preview: 50.0%
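
This page does not document how ranks are assigned (note that the neighboring scores above are not strictly monotonic, so ranking likely uses fuller-precision or differently aggregated data). As a rough illustration only, a minimal sketch that ranks models on a single benchmark by a plain descending sort of normalized scores, using a subset of the LiveCodeBench listing above:

```python
# Minimal sketch: rank models on one benchmark by normalized score.
# Assumes a plain descending sort; the site's actual ranking logic
# (tie-breaking, score precision) is not documented on this page.

def rank_table(scores: dict[str, float]) -> list[tuple[int, str, float]]:
    """Return (rank, model, score) tuples, best score first."""
    ordered = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [(i + 1, model, score) for i, (model, score) in enumerate(ordered)]

# Subset of the LiveCodeBench listing above.
livecodebench = {
    "Phi 4 Reasoning": 53.8,
    "Phi 4 Reasoning Plus": 53.1,
    "Magistral Small 2506": 51.3,
    "Magistral Medium": 50.3,
}

for rank, model, score in rank_table(livecodebench):
    print(f"#{rank} {model}: {score:.1f}%")
```
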
All Benchmark Results for Magistral Small 2506
Complete list of benchmark scores with detailed information

Benchmark     | Category | Modality | Raw Score | Normalized | Source
AIME 2024     | general  | text     | 0.71      | 70.7%      | Self-reported
GPQA          | general  | text     | 0.68      | 68.2%      | Self-reported
AIME 2025     | general  | text     | 0.63      | 62.8%      | Self-reported
LiveCodeBench | code     | text     | 0.51      | 51.3%      | Self-reported
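
The raw scores in the table appear to be 0-1 fractions rounded to two decimals, while the normalized column keeps one more digit (e.g. 70.7% alongside 0.71). A minimal sketch of that conversion, assuming the normalized percentage is simply the unrounded fraction scaled by 100; the three-decimal fractions below are inferred from the percentages, not given on the page:

```python
# Raw-to-normalized conversion, assuming raw = fraction rounded to two
# decimals and normalized = fraction * 100. Fractions inferred from the
# percentages reported above.
results = {
    "AIME 2024": 0.707,
    "GPQA": 0.682,
    "AIME 2025": 0.628,
    "LiveCodeBench": 0.513,
}

for name, fraction in results.items():
    raw = round(fraction, 2)      # e.g. 0.707 -> 0.71
    normalized = fraction * 100   # e.g. 0.707 -> 70.7
    print(f"{name}: raw={raw:.2f}, normalized={normalized:.1f}%")
```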