
Codestral-22B
#1 HumanEvalFIM-Average
#1 Spider
#1 HumanEval-Average
by Mistral AI
About
Codestral-22B is a code-focused language model developed by Mistral AI. It achieves an average score of 65.9% across 7 benchmarks, with its strongest results on HumanEvalFIM-Average (91.6%), HumanEval (81.1%), and MBPP (78.2%). The model is available through 2 API providers and was released in May 2024 under the MNPL-0.1 license.
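The page lists API availability but no usage example. As a minimal sketch of querying the model through a hosted chat-completions API (the endpoint, model identifier, and response shape below are assumptions based on Mistral's hosted platform, not details taken from this page):

```python
# Minimal sketch: querying Codestral-22B through a hosted chat-completions API.
# Endpoint, model identifier, and response layout are assumptions; substitute
# your provider's values.
import os
import requests

API_URL = "https://api.mistral.ai/v1/chat/completions"  # assumed provider endpoint

payload = {
    "model": "codestral-latest",  # assumed identifier for Codestral-22B
    "messages": [
        {"role": "user", "content": "Write a Python function that checks whether a string is a palindrome."}
    ],
    "temperature": 0.2,
}

response = requests.post(
    API_URL,
    headers={"Authorization": f"Bearer {os.environ['MISTRAL_API_KEY']}"},
    json=payload,
    timeout=60,
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```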
Pricing Range
Input (per 1M tokens): $0.20
Output (per 1M tokens): $0.60
Providers: 2
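Since both rates are flat across the listed providers, per-request cost is straightforward arithmetic. A small sketch using the listed prices (the token counts are illustrative values, not from this page):

```python
# Cost estimate from the listed rates: $0.20 per 1M input tokens,
# $0.60 per 1M output tokens. Token counts below are illustrative only.
INPUT_RATE_PER_M = 0.20   # USD per 1M input tokens
OUTPUT_RATE_PER_M = 0.60  # USD per 1M output tokens

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of a single request at the listed rates."""
    return (input_tokens / 1_000_000) * INPUT_RATE_PER_M \
         + (output_tokens / 1_000_000) * OUTPUT_RATE_PER_M

# Example: 2,000 prompt tokens and 500 completion tokens
print(f"${request_cost(2_000, 500):.6f}")  # -> $0.000700
```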
Timeline
Announced: May 29, 2024
Released: May 29, 2024
Specifications
License & Family
License
MNPL-0.1
Benchmark Performance Overview
Performance metrics and category breakdown
Overall Performance
7 benchmarks
Average Score
65.9%
Best Score
91.6%
High Performers (80%+)
2
Performance Metrics
Max Context Window
65.5K
Avg Throughput
21.1 tok/s
Avg Latency
0ms
Top Categories
code
71.2%
general
34.0%
Benchmark Performance
Top benchmark scores with normalized values (0-100%)
Ranking Across Benchmarks
Position relative to other models on each benchmark
HumanEvalFIM-Average
Rank #1 of 1
#1 Codestral-22B
91.6%
HumanEval
Rank #39 of 62
#36 IBM Granite 4.0 Tiny Preview
82.4%
#37 Phi 4
82.6%
#38 Qwen2.5 14B Instruct
83.5%
#39 Codestral-22B
81.1%
#40 Nova Micro
81.1%
#41 Llama 3.1 70B Instruct
80.5%
#42 Qwen2 7B Instruct
79.9%
MBPP
Rank #13 of 31
#10 Qwen2.5 7B Instruct
79.2%
#11 Qwen2 72B Instruct
80.2%
#12 Phi-3.5-MoE-instruct
80.8%
#13 Codestral-22B
78.2%
#14 Llama 4 Maverick
77.6%
#15 Gemini Diffusion
76.0%
#16 Mistral Small 3.1 24B Instruct
74.7%
Spider
Rank #1 of 1
#1 Codestral-22B
63.5%
HumanEval-Average
Rank #1 of 1
#1 Codestral-22B
61.5%
All Benchmark Results for Codestral-22B
Complete list of benchmark scores with detailed information
Benchmark | Category | Modality | Raw Score | Normalized | Source
HumanEvalFIM-Average | code | text | 0.92 | 91.6% | Self-reported
HumanEval | code | text | 0.81 | 81.1% | Self-reported
MBPP | code | text | 78.20 | 78.2% | Self-reported
Spider | code | text | 0.64 | 63.5% | Self-reported
HumanEval-Average | code | text | 0.61 | 61.5% | Self-reported
CruxEval-O | code | text | 0.51 | 51.3% | Self-reported
RepoBench | general | text | 0.34 | 34.0% | Self-reported
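HumanEvalFIM-Average, the model's strongest result above, measures fill-in-the-middle (FIM) completion: the model receives a code prefix and suffix and must generate the span between them. A minimal sketch of that request shape, assuming Mistral's hosted FIM completions endpoint (the URL, model identifier, and parameters are assumptions, not taken from this page):

```python
# Minimal fill-in-the-middle (FIM) sketch: send a code prefix and suffix and
# let the model generate the missing middle. Endpoint and model identifier
# are assumptions based on Mistral's hosted API, not details from this page.
import os
import requests

payload = {
    "model": "codestral-latest",                             # assumed identifier
    "prompt": "def is_even(n: int) -> bool:\n    return ",   # code before the gap
    "suffix": "\n\nprint(is_even(4))",                       # code after the gap
    "max_tokens": 32,
    "temperature": 0.0,
}

response = requests.post(
    "https://api.mistral.ai/v1/fim/completions",             # assumed endpoint
    headers={"Authorization": f"Bearer {os.environ['MISTRAL_API_KEY']}"},
    json=payload,
    timeout=60,
)
response.raise_for_status()
print(response.json())  # the generated middle span is returned in the choices
```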