
Codestral-22B
#1 HumanEvalFIM-Average
#1 Spider
#1 HumanEval-Average
by Mistral AI
About
Codestral-22B is a code-focused language model developed by Mistral AI. It achieves an average score of 65.9% across 7 benchmarks, with its strongest results on HumanEvalFIM-Average (91.6%), HumanEval (81.1%), and MBPP (78.2%). The model is available through 2 API providers and was released in May 2024 under the MNPL-0.1 license.
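The page lists API availability but no usage example. As a minimal sketch of querying the model through a hosted chat-completions API (the endpoint, model identifier, and response shape below are assumptions based on Mistral's hosted platform, not details taken from this page):

```python
# Minimal sketch: querying Codestral-22B through a hosted chat-completions API.
# Endpoint, model identifier, and response layout are assumptions; substitute
# your provider's values.
import os
import requests

API_URL = "https://api.mistral.ai/v1/chat/completions"  # assumed provider endpoint

payload = {
    "model": "codestral-latest",  # assumed identifier for Codestral-22B
    "messages": [
        {"role": "user", "content": "Write a Python function that checks whether a string is a palindrome."}
    ],
    "temperature": 0.2,
}

response = requests.post(
    API_URL,
    headers={"Authorization": f"Bearer {os.environ['MISTRAL_API_KEY']}"},
    json=payload,
    timeout=60,
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```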
Pricing Range
Input (per 1M tokens): $0.20
Output (per 1M tokens): $0.60
Providers: 2
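Since both rates are flat across the listed providers, per-request cost is straightforward arithmetic. A small sketch using the listed prices (the token counts are illustrative values, not from this page):

```python
# Cost estimate from the listed rates: $0.20 per 1M input tokens,
# $0.60 per 1M output tokens. Token counts below are illustrative only.
INPUT_RATE_PER_M = 0.20   # USD per 1M input tokens
OUTPUT_RATE_PER_M = 0.60  # USD per 1M output tokens

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of a single request at the listed rates."""
    return (input_tokens / 1_000_000) * INPUT_RATE_PER_M \
         + (output_tokens / 1_000_000) * OUTPUT_RATE_PER_M

# Example: 2,000 prompt tokens and 500 completion tokens
print(f"${request_cost(2_000, 500):.6f}")  # -> $0.000700
```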
Timeline
Announced: May 29, 2024
Released: May 29, 2024
Specifications
License & Family
License
MNPL-0.1
Benchmark Performance Overview
Performance metrics and category breakdown
Overall Performance
7 benchmarks
Average Score
65.9%
Best Score
91.6%
High Performers (80%+)
2
Performance Metrics
Max Context Window
65.5K
Avg Throughput
21.1 tok/s
Avg Latency
0ms
Top Categories
code
71.2%
general
34.0%
Benchmark Performance
Top benchmark scores with normalized values (0-100%)
Ranking Across Benchmarks
Position relative to other models on each benchmark
HumanEvalFIM-Average
Rank #1 of 1
#1 Codestral-22B
91.6%
HumanEval
Rank #39 of 62
#36 IBM Granite 4.0 Tiny Preview
82.4%
#37 Phi 4
82.6%
#38 Qwen2.5 14B Instruct
83.5%
#39 Codestral-22B
81.1%
#40 Nova Micro
81.1%
#41 Llama 3.1 70B Instruct
80.5%
#42 Qwen2 7B Instruct
79.9%
MBPP
Rank #13 of 31
#10 Qwen2.5 7B Instruct
79.2%
#11 Qwen2 72B Instruct
80.2%
#12 Phi-3.5-MoE-instruct
80.8%
#13 Codestral-22B
78.2%
#14 Llama 4 Maverick
77.6%
#15 Gemini Diffusion
76.0%
#16 Mistral Small 3.1 24B Instruct
74.7%
Spider
Rank #1 of 1
#1 Codestral-22B
63.5%
HumanEval-Average
Rank #1 of 1
#1 Codestral-22B
61.5%
All Benchmark Results for Codestral-22B
Complete list of benchmark scores with detailed information
Benchmark | Category | Modality | Raw Score | Normalized | Source
HumanEvalFIM-Average | code | text | 0.92 | 91.6% | Self-reported
HumanEval | code | text | 0.81 | 81.1% | Self-reported
MBPP | code | text | 78.20 | 78.2% | Self-reported
Spider | code | text | 0.64 | 63.5% | Self-reported
HumanEval-Average | code | text | 0.61 | 61.5% | Self-reported
CruxEval-O | code | text | 0.51 | 51.3% | Self-reported
RepoBench | general | text | 0.34 | 34.0% | Self-reported
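HumanEvalFIM-Average, the model's strongest result above, measures fill-in-the-middle (FIM) completion: the model receives a code prefix and suffix and must generate the span between them. A minimal sketch of that request shape, assuming Mistral's hosted FIM completions endpoint (the URL, model identifier, and parameters are assumptions, not taken from this page):

```python
# Minimal fill-in-the-middle (FIM) sketch: send a code prefix and suffix and
# let the model generate the missing middle. Endpoint and model identifier
# are assumptions based on Mistral's hosted API, not details from this page.
import os
import requests

payload = {
    "model": "codestral-latest",                             # assumed identifier
    "prompt": "def is_even(n: int) -> bool:\n    return ",   # code before the gap
    "suffix": "\n\nprint(is_even(4))",                       # code after the gap
    "max_tokens": 32,
    "temperature": 0.0,
}

response = requests.post(
    "https://api.mistral.ai/v1/fim/completions",             # assumed endpoint
    headers={"Authorization": f"Bearer {os.environ['MISTRAL_API_KEY']}"},
    json=payload,
    timeout=60,
)
response.raise_for_status()
print(response.json())  # the generated middle span is returned in the choices
```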