Kimi K2 Base

by Moonshot AI

Zero-eval rankings: #1 C-Eval, #1 MMLU-redux-2.0, #1 TriviaQA, +4 more

About

Kimi K2 Base is a language model developed by Moonshot AI. It averages 69.2% across 13 benchmarks, with its strongest results on C-Eval (92.5%), GSM8k (92.1%), and MMLU-redux-2.0 (90.2%). Its best category is math, with an average score of 81.2%. Released in 2025, it is Moonshot AI's latest base model.

Timeline
Announced: Jan 1, 2025
Released: Jan 1, 2025

Specifications
Training Tokens: 15.5T

License & Family
License: Modified MIT License

Benchmark Performance Overview
Performance metrics and category breakdown

Overall Performance (13 benchmarks)
Average Score: 69.2%
Best Score: 92.5%
High Performers (80%+): 6

Top Categories
math: 81.2%
general: 67.3%
code: 66.4%
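
The page does not define how category scores are aggregated, but the math figure is consistent with a plain unweighted mean of the normalized scores in the results table further down: GSM8k (92.1%) and MATH (70.2%) average to 81.15%, displayed as 81.2%. A minimal sketch of that assumed aggregation, using only the rows visible on this page:

    # Minimal sketch of the assumed category aggregation: an unweighted mean
    # of normalized benchmark scores. Only the math category can be verified
    # from the visible rows; general and code include unlisted benchmarks.
    from collections import defaultdict

    rows = [  # (benchmark, category, normalized score in %)
        ("GSM8k", "math", 92.1),
        ("MATH", "math", 70.2),
    ]

    by_category: dict[str, list[float]] = defaultdict(list)
    for _bench, category, pct in rows:
        by_category[category].append(pct)

    for category, scores in by_category.items():
        print(f"{category}: {sum(scores) / len(scores):.2f}%")  # math: 81.15%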

Benchmark Performance
[Chart: top benchmark scores, normalized to 0-100%; the underlying values appear in the results table below.]

Ranking Across Benchmarks
Position relative to other models on each benchmark

C-Eval

Rank #1 of 6
#1 Kimi K2 Base: 92.5%
#2 DeepSeek-R1: 91.8%
#3 Kimi-k1.5: 88.3%
#4 DeepSeek-V3: 86.5%

GSM8k

Rank #20 of 46
#17 Mistral Large 2: 93.0%
#18 Claude 3 Sonnet: 92.3%
#19 Nova Micro: 92.3%
#20 Kimi K2 Base: 92.1%
#21 Qwen2.5 7B Instruct: 91.6%
#22 Llama 3.1 Nemotron 70B Instruct: 91.4%
#23 Qwen2 72B Instruct: 91.1%

MMLU-redux-2.0

Rank #1 of 1
#1 Kimi K2 Base: 90.2%

MMLU

Rank #13 of 78
#10 GPT-4o: 88.7%
#11 DeepSeek-V3: 88.5%
#12 Qwen3 235B A22B: 87.8%
#13 Kimi K2 Base: 87.8%
#14 GPT-4.1 mini: 87.5%
#15 Grok-2: 87.5%
#16 Kimi-k1.5: 87.4%

TriviaQA

Rank #1 of 13
#1 Kimi K2 Base: 85.1%
#2 Gemma 2 27B: 83.7%
#3 Mistral Small 3.1 24B Base: 80.5%
#4 Mistral Small 3.1 24B Instruct: 80.5%
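
The "Rank #N of M" lines appear to be positions in a descending sort of scores, with ties broken by list order rather than shared ranks (GPT-4.1 mini and Grok-2 both score 87.5% on MMLU yet hold distinct ranks #14 and #15). A small sketch of that assumed ranking rule, applied to the C-Eval entries above; note the page's "of 6" counts two models not shown here:

    # Sketch of the assumed ranking rule: sort descending, assign sequential
    # positions, break ties by order of appearance (no shared ranks).
    def leaderboard_position(scores: dict[str, float], model: str) -> str:
        ordered = sorted(scores, key=lambda m: scores[m], reverse=True)
        return f"Rank #{ordered.index(model) + 1} of {len(ordered)}"

    ceval = {  # the four C-Eval entries visible above; the page lists 6 total
        "Kimi K2 Base": 92.5,
        "DeepSeek-R1": 91.8,
        "Kimi-k1.5": 88.3,
        "DeepSeek-V3": 86.5,
    }
    print(leaderboard_position(ceval, "Kimi K2 Base"))  # Rank #1 of 4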

All Benchmark Results for Kimi K2 Base
Complete list of benchmark scores with detailed information

Benchmark         Category   Modality   Raw score   Normalized   Source
C-Eval            code       text       0.93        92.5%        Self-reported
GSM8k             math       text       0.92        92.1%        Self-reported
MMLU-redux-2.0    general    text       0.90        90.2%        Self-reported
MMLU              general    text       0.88        87.8%        Self-reported
TriviaQA          general    text       0.85        85.1%        Self-reported
EvalPlus          code       text       0.80        80.3%        Self-reported
CSimpleQA         general    text       0.78        77.6%        Self-reported
MATH              math       text       0.70        70.2%        Self-reported
MMLU-Pro          general    text       0.69        69.2%        Self-reported
GPQA              general    text       0.48        48.1%        Self-reported

Showing 10 of 13 benchmarks.
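
As a cross-check on the headline numbers: the ten listed scores average 79.3%, well above the reported 69.2% over all 13 benchmarks, so (assuming an unweighted mean, which the page does not confirm) the three unlisted benchmarks would have to average roughly 35.5%. A sketch of that arithmetic:

    # Cross-check of the page's headline average, assuming an unweighted mean
    # over all 13 benchmarks (the aggregation rule is not stated on the page).
    listed = {
        "C-Eval": 92.5, "GSM8k": 92.1, "MMLU-redux-2.0": 90.2, "MMLU": 87.8,
        "TriviaQA": 85.1, "EvalPlus": 80.3, "CSimpleQA": 77.6, "MATH": 70.2,
        "MMLU-Pro": 69.2, "GPQA": 48.1,
    }

    visible_mean = sum(listed.values()) / len(listed)
    print(f"mean of the 10 listed benchmarks: {visible_mean:.1f}%")  # 79.3%

    # The reported overall average is 69.2% across 13 benchmarks, which
    # implies the 3 unlisted benchmarks average about 35.5%.
    implied = (69.2 * 13 - sum(listed.values())) / 3
    print(f"implied mean of the 3 unlisted benchmarks: {implied:.1f}%")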