Moonshot AI

Kimi-k1.5

Multimodal
Zero-eval
#1 LiveCodeBench v5 24.12-25.2
#2 CLUEWSC
#3 C-Eval

by Moonshot AI

About

Kimi-k1.5 is a multimodal language model developed by Moonshot AI and released in January 2025. Across the nine benchmarks reported here it averages 81.7%, with its strongest self-reported scores on MATH-500 (96.2%), CLUEWSC (91.4%), and C-Eval (88.3%), making math reasoning a clear strength. As a multimodal model, it accepts both text and image inputs.

Timeline
Announced: Jan 20, 2025
Released: Jan 20, 2025
Specifications
Capabilities
Multimodal
License & Family
License
Proprietary
Benchmark Performance Overview
Performance metrics and category breakdown

Overall Performance

9 benchmarks
Average Score
81.7%
Best Score
96.2%
High Performers (80%+)
5

Top Categories

general
88.6%
math
82.9%
vision
70.0%
code
62.5%
Benchmark Performance
Top benchmark scores with normalized values (0-100%)
Ranking Across Benchmarks
Position relative to other models on each benchmark

MATH-500

Rank #6 of 22
#3 Claude 3.7 Sonnet: 96.2%
#4 Llama-3.3 Nemotron Super 49B v1: 96.6%
#5 Llama 3.1 Nemotron Ultra 253B v1: 97.0%
#6 Kimi-k1.5: 96.2%
#7 DeepSeek R1 Zero: 95.9%
#8 Llama 3.1 Nemotron Nano 8B V1: 95.4%
#9 Phi 4 Mini Reasoning: 94.6%

CLUEWSC

Rank #2 of 3
#1 DeepSeek-R1: 92.8%
#2 Kimi-k1.5: 91.4%
#3 DeepSeek-V3: 90.9%

C-Eval

Rank #3 of 6
#1 DeepSeek-R1: 91.8%
#2 Kimi K2 Base: 92.5%
#3 Kimi-k1.5: 88.3%
#4 DeepSeek-V3: 86.5%
#5 Qwen2 72B Instruct: 83.8%
#6 Qwen2 7B Instruct: 77.2%

MMLU

Rank #16 of 78
#13 Grok-2: 87.5%
#14 GPT-4.1 mini: 87.5%
#15 Kimi K2 Base: 87.8%
#16 Kimi-k1.5: 87.4%
#17 Llama 3.1 405B Instruct: 87.3%
#18 o3-mini: 86.9%
#19 Claude 3 Opus: 86.8%

IFEval

Rank #16 of 37
#13 GPT-4.1: 87.4%
#14 Llama 3.1 70B Instruct: 87.5%
#15 GPT-4.5: 88.2%
#16 Kimi-k1.5: 87.2%
#17 Nova Micro: 87.2%
#18 DeepSeek-V3: 86.1%
#19 Phi 4 Reasoning Plus: 84.9%
All Benchmark Results for Kimi-k1.5
Complete list of benchmark scores with detailed information
Benchmark | Category | Modality | Score | Normalized | Source
MATH-500 | math | text | 0.96 | 96.2% | Self-reported
CLUEWSC | general | text | 0.91 | 91.4% | Self-reported
C-Eval | general | text | 0.88 | 88.3% | Self-reported
MMLU | general | text | 0.87 | 87.4% | Self-reported
IFEval | general | text | 0.87 | 87.2% | Self-reported
AIME 2024 | math | text | 0.78 | 77.5% | Self-reported
MathVista | math | multimodal | 0.75 | 74.9% | Self-reported
MMMU | vision | multimodal | 0.70 | 70.0% | Self-reported
LiveCodeBench v5 24.12-25.2 | code | text | 0.63 | 62.5% | Self-reported
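The summary figures in the performance overview follow directly from the nine normalized scores listed here; a quick arithmetic check:

```python
# The nine self-reported normalized scores from the table above (percent).
scores = [96.2, 91.4, 88.3, 87.4, 87.2, 77.5, 74.9, 70.0, 62.5]

average = sum(scores) / len(scores)             # mean over all benchmarks
best = max(scores)                              # best single score
high_performers = sum(s >= 80 for s in scores)  # scores at or above 80%

print(f"{average:.1f}% average, {best}% best, {high_performers} at 80%+")
# 81.7% average, 96.2% best, 5 at 80%+
```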