
Kimi K2 Base
Ranked #1 on C-Eval, MMLU-redux-2.0, TriviaQA, and 4 more benchmarks
by Moonshot AI
About
Kimi K2 Base is a language model developed by Moonshot AI. It achieves strong overall performance, with an average score of 69.2% across 13 benchmarks, and scores highest on C-Eval (92.5%), GSM8k (92.1%), and MMLU-redux-2.0 (90.2%). Its strongest category is math, with an average of 81.2%. Released in 2025, it represents Moonshot AI's latest advancement in language modeling.
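The category averages are arithmetic means of the normalized benchmark scores. For instance, if GSM8k and MATH are the only two math benchmarks (the table at the bottom of this page suggests so, though 3 of the 13 benchmarks are not listed there), the math average works out as (92.1 + 70.2) / 2 = 81.15, which rounds to the 81.2% reported. A minimal sketch of that grouping in Python, using the 10 listed scores:

from collections import defaultdict

# Normalized scores and category labels, copied from the "All Benchmark
# Results" table at the bottom of this page. Only 10 of the 13 benchmarks
# are listed, so only the math average reproduces the page's figure; the
# general and code means here exclude the three unlisted benchmarks.
scores = [
    ("C-Eval", "code", 92.5),
    ("GSM8k", "math", 92.1),
    ("MMLU-redux-2.0", "general", 90.2),
    ("MMLU", "general", 87.8),
    ("TriviaQA", "general", 85.1),
    ("EvalPlus", "code", 80.3),
    ("CSimpleQA", "general", 77.6),
    ("MATH", "math", 70.2),
    ("MMLU-Pro", "general", 69.2),
    ("GPQA", "general", 48.1),
]

by_category = defaultdict(list)
for _, category, score in scores:
    by_category[category].append(score)

for category, vals in sorted(by_category.items()):
    print(f"{category}: {sum(vals) / len(vals):.2f}%")
# math -> (92.1 + 70.2) / 2 = 81.15, i.e. the 81.2% reported above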
Timeline
Announced: Jan 1, 2025
Released: Jan 1, 2025
Specifications
Training Tokens: 15.5T
License & Family
License
Modified MIT License
Benchmark Performance Overview
Performance metrics and category breakdown
Overall Performance
13 benchmarks
Average Score
69.2%
Best Score
92.5%
High Performers (80%+)
6
Top Categories
math
81.2%
general
67.3%
code
66.4%
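The "High Performers (80%+)" figure is the count of benchmarks scoring at least 80% after normalization. A quick check against the 10 scores listed on this page (a sketch; the three unlisted benchmarks evidently fall below 80%, since the count already reaches 6):

# Normalized scores from the benchmark table below.
scores = {
    "C-Eval": 92.5, "GSM8k": 92.1, "MMLU-redux-2.0": 90.2,
    "MMLU": 87.8, "TriviaQA": 85.1, "EvalPlus": 80.3,
    "CSimpleQA": 77.6, "MATH": 70.2, "MMLU-Pro": 69.2, "GPQA": 48.1,
}
high = [name for name, s in scores.items() if s >= 80.0]
print(len(high), sorted(high))
# -> 6, matching the "High Performers (80%+)" figure above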
Benchmark Performance
Top benchmark scores with normalized values (0-100%)
Ranking Across Benchmarks
Position relative to other models on each benchmark
C-Eval
Rank #1 of 6
#1 Kimi K2 Base: 92.5%
#2 DeepSeek-R1: 91.8%
#3 Kimi-k1.5: 88.3%
#4 DeepSeek-V3: 86.5%
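Each rank in these lists is the model's position in a descending sort of that benchmark's scores. A sketch of the computation over the four C-Eval entries just listed (only 4 of the 6 ranked models appear on this page, and the tie-breaking rule is an assumption, as the source does not state one):

ceval_scores = {
    "Kimi K2 Base": 92.5,
    "DeepSeek-R1": 91.8,
    "Kimi-k1.5": 88.3,
    "DeepSeek-V3": 86.5,
}
# Sort by score, highest first; rank is the 1-based position in that order.
ranking = sorted(ceval_scores.items(), key=lambda kv: kv[1], reverse=True)
for rank, (model, score) in enumerate(ranking, start=1):
    print(f"#{rank} {model}: {score}%")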
GSM8k
Rank #20 of 46
#17 Nova Micro: 92.3%
#18 Claude 3 Sonnet: 92.3%
#19 Mistral Large 2: 93.0%
#20 Kimi K2 Base: 92.1%
#21 Qwen2.5 7B Instruct: 91.6%
#22 Llama 3.1 Nemotron 70B Instruct: 91.4%
#23 Qwen2 72B Instruct: 91.1%
MMLU-redux-2.0
Rank #1 of 1
#1 Kimi K2 Base: 90.2%
MMLU
Rank #13 of 78
#10 Qwen3 235B A22B: 87.8%
#11 DeepSeek-V3: 88.5%
#12 GPT-4o: 88.7%
#13 Kimi K2 Base: 87.8%
#14 GPT-4.1 mini: 87.5%
#15 Grok-2: 87.5%
#16 Kimi-k1.5: 87.4%
TriviaQA
Rank #1 of 13
#1 Kimi K2 Base: 85.1%
#2 Gemma 2 27B: 83.7%
#3 Mistral Small 3.1 24B Base: 80.5%
#4 Mistral Small 3.1 24B Instruct: 80.5%
All Benchmark Results for Kimi K2 Base
Complete list of benchmark scores with detailed information
Benchmark | Category | Modality | Raw Score | Normalized | Source
C-Eval | code | text | 0.93 | 92.5% | Self-reported
GSM8k | math | text | 0.92 | 92.1% | Self-reported
MMLU-redux-2.0 | general | text | 0.90 | 90.2% | Self-reported
MMLU | general | text | 0.88 | 87.8% | Self-reported
TriviaQA | general | text | 0.85 | 85.1% | Self-reported
EvalPlus | code | text | 80.30 | 80.3% | Self-reported
CSimpleQA | general | text | 0.78 | 77.6% | Self-reported
MATH | math | text | 0.70 | 70.2% | Self-reported
MMLU-Pro | general | text | 0.69 | 69.2% | Self-reported
GPQA | general | text | 0.48 | 48.1% | Self-reported
Showing 10 of 13 benchmarks.
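Raw scores in the table mix two scales: most rows report a 0-1 fraction (0.93 for C-Eval, rounded from 0.925), while EvalPlus is already on a 0-100 scale. A sketch of parsing rows of this pipe-separated shape and reconciling the scales (the above-1-means-percent heuristic is an assumption, not something the page states):

rows = [
    "C-Eval | code | text | 0.93 | 92.5% | Self-reported",
    "EvalPlus | code | text | 80.30 | 80.3% | Self-reported",
]
for row in rows:
    name, category, modality, raw, norm, source = (f.strip() for f in row.split("|"))
    raw_value = float(raw)
    # Treat raw values above 1 as already expressed in percent.
    pct = raw_value * 100 if raw_value <= 1.0 else raw_value
    print(f"{name} ({category}, {modality}): ~{pct:.1f}% vs listed {norm} [{source}]")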