Mistral Small 3.1 24B Instruct

Multimodal
Zero-eval

by Mistral AI

About

Mistral Small 3.1 24B Instruct is a multimodal language model developed by Mistral AI. It achieves an average score of 64.0% across 9 benchmarks, with its strongest results on HumanEval (88.4%), MMLU (80.6%), and TriviaQA (80.5%), and a notable specialization in code tasks, where it averages 81.6%. As a multimodal model, it accepts both text and image inputs. It is licensed under Apache 2.0, which permits commercial use and makes it suitable for enterprise applications. Released in March 2025, it represents Mistral AI's latest advancement in AI technology.

Timeline
Announced: Mar 17, 2025
Released: Mar 17, 2025
Specifications
Capabilities
Multimodal
License & Family
License
Apache 2.0
Benchmark Performance Overview
Performance metrics and category breakdown

Overall Performance

9 benchmarks
Average Score
64.0%
Best Score
88.4%
High Performers (80%+)
3

Top Categories

code
81.6%
math
69.3%
vision
59.3%
general
56.9%
Benchmark Performance
Top benchmark scores with normalized values (0-100%)
Ranking Across Benchmarks
Position relative to other models on each benchmark

HumanEval

Rank #16 of 62
#13 DeepSeek-V2.5: 89.0%
#14 Nova Pro: 89.0%
#15 Llama 3.1 405B Instruct: 89.0%
#16 Mistral Small 3.1 24B Instruct: 88.4%
#17 Llama 3.3 70B Instruct: 88.4%
#18 Grok-2: 88.4%
#19 Qwen2.5-Coder 7B Instruct: 88.4%

MMLU

Rank #40 of 78
#37 Jamba 1.5 Large: 81.2%
#38 Mistral Small 3.1 24B Base: 81.0%
#39 Mistral Small 3 24B Base: 80.7%
#40 Mistral Small 3.1 24B Instruct: 80.6%
#41 Mistral Small 3.2 24B Instruct: 80.5%
#42 Nova Lite: 80.5%
#43 DeepSeek-V2.5: 80.4%

TriviaQA

Rank #4 of 13
#1 Kimi K2 Base: 85.1%
#2 Gemma 2 27B: 83.7%
#3 Mistral Small 3.1 24B Base: 80.5%
#4 Mistral Small 3.1 24B Instruct: 80.5%
#5 Mistral Small 3 24B Base: 80.3%
#6 Granite 3.3 8B Base: 78.2%
#7 Gemma 2 9B: 76.6%

MBPP

Rank #16 of 31
#13 Codestral-22B: 78.2%
#14 Llama 4 Maverick: 77.6%
#15 Gemini Diffusion: 76.0%
#16 Mistral Small 3.1 24B Instruct: 74.7%
#17 Gemma 3 27B: 74.4%
#18 Qwen2.5-Omni-7B: 73.2%
#19 Gemma 3 12B: 73.0%

MATH

Rank #37 of 63
#34 Claude 3.5 Haiku: 69.4%
#35 Mistral Small 3.2 24B Instruct: 69.4%
#36 Nova Micro: 69.3%
#37 Mistral Small 3.1 24B Instruct: 69.3%
#38 Llama 3.2 90B Instruct: 68.0%
#39 Phi 4 Mini: 64.0%
#40 Llama 4 Maverick: 61.2%
All Benchmark Results for Mistral Small 3.1 24B Instruct
Complete list of benchmark scores with detailed information
Benchmark   Category   Modality     Normalized   Score    Source
HumanEval   code       text         0.88         88.4%    Self-reported
MMLU        general    text         0.81         80.6%    Self-reported
TriviaQA    general    text         0.81         80.5%    Self-reported
MBPP        code       text         0.75         74.7%    Self-reported
MATH        math       text         0.69         69.3%    Self-reported
MMLU-Pro    general    text         0.67         66.8%    Self-reported
MMMU        vision     multimodal   0.59         59.3%    Self-reported
GPQA        general    text         0.46         46.0%    Self-reported
SimpleQA    general    text         0.10         10.4%    Self-reported
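The overall average and the "Top Categories" figures are consistent with simple unweighted means of the nine self-reported scores. The sketch below (plain Python, no external data; the dictionary is transcribed from the benchmark results above) reproduces them, assuming that unweighted averaging is indeed how the page aggregates scores.

```python
# Reproduce the summary statistics from the nine self-reported
# benchmark scores. Category labels follow the benchmark results table.
scores = {
    "HumanEval": (88.4, "code"),
    "MMLU": (80.6, "general"),
    "TriviaQA": (80.5, "general"),
    "MBPP": (74.7, "code"),
    "MATH": (69.3, "math"),
    "MMLU-Pro": (66.8, "general"),
    "MMMU": (59.3, "vision"),
    "GPQA": (46.0, "general"),
    "SimpleQA": (10.4, "general"),
}

# Overall average: unweighted mean over all nine benchmarks.
overall = sum(score for score, _ in scores.values()) / len(scores)
print(f"Average score: {overall:.1f}%")  # 64.0% with these figures

# Per-category averages, as listed under "Top Categories".
by_category = {}
for score, category in scores.values():
    by_category.setdefault(category, []).append(score)

for category, vals in sorted(by_category.items()):
    print(f"{category}: {sum(vals) / len(vals):.1f}%")
```

With these inputs, the per-category means come out to code 81.6%, math 69.3%, vision 59.3%, and general 56.9%, matching the category breakdown shown earlier.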