
Kimi-k1.5
Multimodal
Zero-eval
#1 LiveCodeBench v5 24.12-25.2
#2 CLUEWSC
#3 C-Eval
+1 more
by Moonshot AI
About
Kimi-k1.5 is a multimodal language model developed by Moonshot AI. It demonstrates strong performance, with an average score of 81.7% across 9 benchmarks, and does especially well on MATH-500 (96.2%), CLUEWSC (91.4%), and C-Eval (88.3%). Its strongest category is math, with an average score of 85.5%. As a multimodal model, it can process and understand both text and images. Released in 2025, it represents Moonshot AI's latest advancement in AI technology.
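These headline numbers are plain arithmetic over the nine self-reported scores listed in the results table at the bottom of this page; a minimal Python sanity check, with the scores copied from that table:

```python
# Self-reported normalized scores (%) from the results table below.
scores = {
    "MATH-500": 96.2, "CLUEWSC": 91.4, "C-Eval": 88.3,
    "MMLU": 87.4, "IFEval": 87.2, "AIME 2024": 77.5,
    "MathVista": 74.9, "MMMU": 70.0,
    "LiveCodeBench v5 24.12-25.2": 62.5,
}

average = sum(scores.values()) / len(scores)
print(f"Average score: {average:.1f}%")               # 81.7%
print(f"Best score:    {max(scores.values()):.1f}%")  # 96.2%
```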
Timeline
Announced: Jan 20, 2025
Released: Jan 20, 2025
Specifications
Capabilities
Multimodal
License & Family
License
Proprietary
Benchmark Performance Overview
Performance metrics and category breakdown
Overall Performance
9 benchmarks
Average Score
81.7%
Best Score
96.2%
High Performers (80%+)
5
Top Categories
math
85.5%
general
85.4%
code
79.3%
vision
70.0%
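The category averages above follow from grouping the per-benchmark scores by the category labels reported in the results table at the bottom of the page; a minimal sketch using those reported labels:

```python
from collections import defaultdict

# (category, normalized score %) pairs, as labeled in the results table.
results = [
    ("math", 96.2), ("general", 91.4), ("code", 88.3),
    ("general", 87.4), ("code", 87.2), ("general", 77.5),
    ("math", 74.9), ("vision", 70.0), ("code", 62.5),
]

by_category = defaultdict(list)
for category, score in results:
    by_category[category].append(score)

# Expected: math 85.5, general 85.4, code 79.3, vision 70.0.
for category, vals in sorted(by_category.items(),
                             key=lambda kv: -sum(kv[1]) / len(kv[1])):
    print(f"{category}: {sum(vals) / len(vals):.1f}%")

# Benchmarks scoring 80% or above.
print("High performers (80%+):", sum(s >= 80 for _, s in results))  # 5
```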
Benchmark Performance
Top benchmark scores with normalized values (0-100%)
Ranking Across Benchmarks
Position relative to other models on each benchmark
MATH-500
Rank #6 of 22
#3 Llama 3.1 Nemotron Ultra 253B v1
97.0%
#4 Llama-3.3 Nemotron Super 49B v1
96.6%
#5 Claude 3.7 Sonnet
96.2%
#6 Kimi-k1.5
96.2%
#7 DeepSeek R1 Zero
95.9%
#8 Llama 3.1 Nemotron Nano 8B V1
95.4%
#9 Phi 4 Mini Reasoning
94.6%
CLUEWSC
Rank #2 of 3
#1 DeepSeek-R1
92.8%
#2 Kimi-k1.5
91.4%
#3 DeepSeek-V3
90.9%
C-Eval
Rank #3 of 6
#1 Kimi K2 Base
92.5%
#2 DeepSeek-R1
91.8%
#3 Kimi-k1.5
88.3%
#4 DeepSeek-V3
86.5%
#5 Qwen2 72B Instruct
83.8%
#6 Qwen2 7B Instruct
77.2%
MMLU
Rank #16 of 78
#13 Kimi K2 Base
87.8%
#14 Grok-2
87.5%
#15 GPT-4.1 mini
87.5%
#16 Kimi-k1.5
87.4%
#17 Llama 3.1 405B Instruct
87.3%
#18 o3-mini
86.9%
#19 Claude 3 Opus
86.8%
IFEval
Rank #16 of 37
#13 GPT-4.5
88.2%
#14 Llama 3.1 70B Instruct
87.5%
#15 GPT-4.1
87.4%
#16 Kimi-k1.5
87.2%
#17 Nova Micro
87.2%
#18 DeepSeek-V3
86.1%
#19 Phi 4 Reasoning Plus
84.9%
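Positions like these come from sorting all models on a benchmark by score in descending order. A minimal sketch of that ranking logic, using the IFEval slice above (how the leaderboard breaks ties between equal scores, e.g. Kimi-k1.5 and Nova Micro at 87.2%, is not documented; this sketch simply keeps insertion order):

```python
# Slice of the IFEval leaderboard shown above; positions are relative
# to this slice rather than the full 37-model field.
leaderboard = {
    "GPT-4.5": 88.2,
    "Llama 3.1 70B Instruct": 87.5,
    "GPT-4.1": 87.4,
    "Kimi-k1.5": 87.2,
    "Nova Micro": 87.2,
    "DeepSeek-V3": 86.1,
}

# Sort descending by score; Python's sort is stable, so tied models
# keep their insertion order.
ranked = sorted(leaderboard.items(), key=lambda kv: -kv[1])
for position, (model, score) in enumerate(ranked, start=1):
    print(f"#{position} {model}: {score:.1f}%")
```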
All Benchmark Results for Kimi-k1.5
Complete list of benchmark scores with detailed information
Benchmark | Category | Modality | Raw Score | Normalized | Source
MATH-500 | math | text | 0.96 | 96.2% | Self-reported
CLUEWSC | general | text | 0.91 | 91.4% | Self-reported
C-Eval | code | text | 0.88 | 88.3% | Self-reported
MMLU | general | text | 0.87 | 87.4% | Self-reported
IFEval | code | text | 0.87 | 87.2% | Self-reported
AIME 2024 | general | text | 0.78 | 77.5% | Self-reported
MathVista | math | multimodal | 0.75 | 74.9% | Self-reported
MMMU | vision | multimodal | 0.70 | 70.0% | Self-reported
LiveCodeBench v5 24.12-25.2 | code | text | 0.63 | 62.5% | Self-reported
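For readers who want these rows in machine-readable form, the pipe-delimited layout parses into records with a few lines of Python; a minimal sketch assuming the six-column format shown above (the column names are this page's labels, not an official schema):

```python
# A few rows copied verbatim from the table above.
TABLE = """\
MATH-500 | math | text | 0.96 | 96.2% | Self-reported
CLUEWSC | general | text | 0.91 | 91.4% | Self-reported
LiveCodeBench v5 24.12-25.2 | code | text | 0.63 | 62.5% | Self-reported"""

FIELDS = ("benchmark", "category", "modality", "raw", "normalized", "source")

records = []
for line in TABLE.splitlines():
    cells = [cell.strip() for cell in line.split("|")]
    record = dict(zip(FIELDS, cells))
    record["raw"] = float(record["raw"])
    record["normalized"] = float(record["normalized"].rstrip("%"))
    records.append(record)

print(records[0]["benchmark"], records[0]["normalized"])  # MATH-500 96.2
```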