
Mistral Large 2
Zero-eval
#1 MMLU French
by Mistral AI
About
Mistral Large 2 is a language model developed by Mistral AI. It averages 87.6% across 5 benchmarks, with its strongest results on GSM8k (93.0%), HumanEval (92.0%), and MT-Bench (86.3%); all listed scores are self-reported. It supports a 256K-token context window for handling large documents and is available through 2 API providers. It was announced and released on July 24, 2024.
Pricing Range
Input (per 1M tokens): $2.00 - $2.00
Output (per 1M tokens): $6.00 - $6.00
Providers: 2
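As a rough illustration of what these prices mean per request, the sketch below reads "per 1M" as per one million tokens and applies the listed $2.00 input and $6.00 output rates. The helper name `estimate_cost` and the example token counts are hypothetical. Actual provider billing may differ, for example through minimum charges or rounding.

```python
# Prices copied from the listing above (USD per 1M tokens, assumed unit).
INPUT_PRICE_PER_M = 2.00
OUTPUT_PRICE_PER_M = 6.00


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of one request (hypothetical helper)."""
    input_cost = input_tokens / 1_000_000 * INPUT_PRICE_PER_M
    output_cost = output_tokens / 1_000_000 * OUTPUT_PRICE_PER_M
    return input_cost + output_cost


if __name__ == "__main__":
    # Example: a 10,000-token prompt with a 2,000-token completion.
    print(f"${estimate_cost(10_000, 2_000):.4f}")
```

Because output tokens cost three times as much as input tokens at these rates, long completions tend to dominate the bill even when prompts are large.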
Timeline
Announced: Jul 24, 2024
Released: Jul 24, 2024
Specifications
License & Family
License
Mistral Research License
Benchmark Performance Overview
Performance metrics and category breakdown
Overall Performance
5 benchmarks
Average Score
87.6%
Best Score
93.0%
High Performers (80%+)
5
Performance Metrics
Max Context Window
256.0K
Avg Throughput
21.1 tok/s
Avg Latency
0ms
Top Categories
math
93.0%
code
92.0%
roleplay
86.3%
general
83.4%
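The summary figures above (average, best score, high-performer count, and per-category scores) are consistent with simple unweighted means over the five benchmark scores listed on this page. The page does not state how it aggregates, so the sketch below is an assumption that happens to reproduce the displayed numbers. The function names are illustrative.

```python
from collections import defaultdict
from statistics import mean

# (benchmark, category, normalized score in percent), as listed on this page.
SCORES = [
    ("GSM8k", "math", 93.0),
    ("HumanEval", "code", 92.0),
    ("MT-Bench", "roleplay", 86.3),
    ("MMLU", "general", 84.0),
    ("MMLU French", "general", 82.8),
]


def overall_average(scores):
    """Unweighted mean of all benchmark scores, rounded to one decimal."""
    return round(mean(score for _, _, score in scores), 1)


def category_averages(scores):
    """Unweighted mean per category, rounded to one decimal."""
    by_category = defaultdict(list)
    for _, category, score in scores:
        by_category[category].append(score)
    return {cat: round(mean(vals), 1) for cat, vals in by_category.items()}


def high_performers(scores, threshold=80.0):
    """Count benchmarks at or above the threshold (80%+ on this page)."""
    return sum(1 for _, _, score in scores if score >= threshold)


if __name__ == "__main__":
    print(overall_average(SCORES))
    print(category_averages(SCORES))
    print(high_performers(SCORES))
```

Under this reading, the "general" category score of 83.4% is the mean of MMLU (84.0%) and MMLU French (82.8%), while the other categories each contain a single benchmark.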
Benchmark Performance
Top benchmark scores with normalized values (0-100%)
Ranking Across Benchmarks
Position relative to other models on each benchmark
GSM8k
Rank #17 of 46
#14 Qwen3 235B A22B: 94.4%
#15 Gemma 3 12B: 94.4%
#16 Nova Lite: 94.5%
#17 Mistral Large 2: 93.0%
#18 Claude 3 Sonnet: 92.3%
#19 Nova Micro: 92.3%
#20 Kimi K2 Base: 92.1%
HumanEval
Rank #7 of 62
#4 Claude 3.5 Sonnet: 92.0%
#5 o1-mini: 92.4%
#6 Qwen2.5-Coder 32B Instruct: 92.7%
#7 Mistral Large 2: 92.0%
#8 Qwen2.5 VL 32B Instruct: 91.5%
#9 GPT-4o: 90.2%
#10 Granite 3.3 8B Instruct: 89.7%
MT-Bench
Rank #5 of 11
#2 Qwen2.5 7B Instruct: 87.5%
#3 DeepSeek-V2.5: 90.2%
#4 Llama-3.3 Nemotron Super 49B v1: 91.7%
#5 Mistral Large 2: 86.3%
#6 Qwen2 7B Instruct: 84.1%
#7 Mistral Small 3 24B Instruct: 83.5%
#8 Ministral 8B Instruct: 83.0%
MMLU
Rank #31 of 78
#28 Phi 4: 84.8%
#29 o1-mini: 85.2%
#30 Llama 4 Maverick: 85.5%
#31 Mistral Large 2: 84.0%
#32 Llama 3.1 70B Instruct: 83.6%
#33 Qwen2.5 32B Instruct: 83.3%
#34 Qwen2 72B Instruct: 82.3%
MMLU French
Rank #1 of 1
#1 Mistral Large 2: 82.8%
All Benchmark Results for Mistral Large 2
Complete list of benchmark scores with detailed information
Benchmark | Category | Modality | Raw score | Normalized score | Source |
GSM8k | math | text | 0.93 | 93.0% | Self-reported |
HumanEval | code | text | 0.92 | 92.0% | Self-reported |
MT-Bench | roleplay | text | 86.30 | 86.3% | Self-reported |
MMLU | general | text | 0.84 | 84.0% | Self-reported |
MMLU French | general | text | 0.83 | 82.8% | Self-reported |
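The fourth and fifth fields of each row above pair a raw score with a percentage. A plausible reading, which the page does not document, is that raw values given as fractions (0 to 1) are multiplied by 100, while values already on a 0-100 scale (MT-Bench here) are passed through. The sketch below encodes that assumed rule in a hypothetical `normalize` helper.

```python
# Raw scores as displayed in the rows above.
RAW_SCORES = {
    "GSM8k": 0.93,
    "HumanEval": 0.92,
    "MT-Bench": 86.30,
    "MMLU": 0.84,
    "MMLU French": 0.83,
}


def normalize(raw: float) -> float:
    """Map a raw score to a 0-100 percentage (assumed rule, not from the source)."""
    # Assumption: values at or below 1.0 are fractions; larger values
    # are treated as already being percentages.
    if raw <= 1.0:
        return round(raw * 100, 1)
    return round(raw, 1)


if __name__ == "__main__":
    for name, raw in RAW_SCORES.items():
        print(f"{name}: {normalize(raw)}%")
```

Note that the displayed raw value for MMLU French (0.83) appears to be rounded: recomputing from it gives 83.0%, whereas the page reports 82.8%, which suggests the unrounded raw score is closer to 0.828.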