
Gemini Diffusion

Zero-eval
#1 LBPP (v2)
#1 BigCodeBench
#3 BIG-Bench Extra Hard

by Google

About

Gemini Diffusion is a diffusion-based language model developed by Google. The model posts competitive results across 10 benchmarks, with its strongest scores on HumanEval (89.6%), MBPP (76.0%), and Global-MMLU-Lite (69.1%). Released in 2025, it is one of Google's latest advances in AI.

Timeline
Announced: May 20, 2025
Released: May 20, 2025
Specifications
License: Proprietary
Benchmark Performance Overview
Performance metrics and category breakdown

Overall Performance (10 benchmarks)

Average Score: 46.9%
Best Score: 89.6%
High Performers (80%+): 1
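The summary figures above can be reproduced directly from the ten self-reported scores listed later on this page; a minimal sketch in Python, with the percentages copied from the results table:

```python
# Normalized benchmark scores (%) for Gemini Diffusion, as listed on this page.
scores = [89.6, 76.0, 69.1, 56.8, 45.4, 40.4, 30.9, 23.3, 22.9, 15.0]

average = sum(scores) / len(scores)             # 469.4 / 10 = 46.94, shown as 46.9%
best = max(scores)                              # 89.6 (HumanEval)
high_performers = sum(s >= 80 for s in scores)  # benchmarks at 80% or above: 1

print(f"Average: {average:.1f}%  Best: {best:.1f}%  High performers: {high_performers}")
```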

Top Categories

code: 60.5%
general: 37.9%
Benchmark Performance
Top benchmark scores with normalized values (0-100%)
Ranking Across Benchmarks
Position relative to other models on each benchmark

HumanEval

Rank #12 of 62
#9 Granite 3.3 8B Base: 89.7%
#10 Granite 3.3 8B Instruct: 89.7%
#11 GPT-4o: 90.2%
#12 Gemini Diffusion: 89.6%
#13 Llama 3.1 405B Instruct: 89.0%
#14 Nova Pro: 89.0%
#15 DeepSeek-V2.5: 89.0%

MBPP

Rank #15 of 31
#12 Llama 4 Maverick: 77.6%
#13 Codestral-22B: 78.2%
#14 Qwen2.5 7B Instruct: 79.2%
#15 Gemini Diffusion: 76.0%
#16 Mistral Small 3.1 24B Instruct: 74.7%
#17 Gemma 3 27B: 74.4%
#18 Qwen2.5-Omni-7B: 73.2%

Global-MMLU-Lite

Rank #8 of 14
#5 Gemma 3 12B: 69.5%
#6 Gemma 3 27B: 75.1%
#7 Gemini 2.0 Flash-Lite: 78.2%
#8 Gemini Diffusion: 69.1%
#9 Gemma 3n E4B Instructed: 64.5%
#10 Gemma 3n E4B Instructed LiteRT Preview: 64.5%
#11 Gemma 3n E2B Instructed LiteRT (Preview): 59.0%

LBPP (v2)

Rank #1 of 1
#1 Gemini Diffusion: 56.8%

BigCodeBench

Rank #1 of 2
#1 Gemini Diffusion: 45.4%
#2 Qwen2.5-Coder 7B Instruct: 41.0%
All Benchmark Results for Gemini Diffusion
Complete list of benchmark scores with detailed information

Benchmark              Category  Modality  Score   Source
HumanEval              code      text      89.6%   Self-reported
MBPP                   code      text      76.0%   Self-reported
Global-MMLU-Lite       general   text      69.1%   Self-reported
LBPP (v2)              general   text      56.8%   Self-reported
BigCodeBench           code      text      45.4%   Self-reported
GPQA                   general   text      40.4%   Self-reported
LiveCodeBench          code      text      30.9%   Self-reported
AIME 2025              general   text      23.3%   Self-reported
SWE-Bench Verified     general   text      22.9%   Self-reported
BIG-Bench Extra Hard   general   text      15.0%   Self-reported
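As a consistency check, the category averages shown near the top of the page (code 60.5%, general 37.9%) follow from grouping the rows above by their listed category; a short sketch, using the category labels exactly as this page assigns them (note it files LBPP (v2) and SWE-Bench Verified under general):

```python
from collections import defaultdict

# (benchmark, category, score %) rows as listed in the table above.
rows = [
    ("HumanEval", "code", 89.6),
    ("MBPP", "code", 76.0),
    ("Global-MMLU-Lite", "general", 69.1),
    ("LBPP (v2)", "general", 56.8),
    ("BigCodeBench", "code", 45.4),
    ("GPQA", "general", 40.4),
    ("LiveCodeBench", "code", 30.9),
    ("AIME 2025", "general", 23.3),
    ("SWE-Bench Verified", "general", 22.9),
    ("BIG-Bench Extra Hard", "general", 15.0),
]

by_cat = defaultdict(list)
for _, cat, score in rows:
    by_cat[cat].append(score)

for cat, vals in by_cat.items():
    # code: (89.6 + 76.0 + 45.4 + 30.9) / 4 = 60.5%; general: 227.5 / 6 = 37.9%
    print(f"{cat}: {sum(vals) / len(vals):.1f}%")
```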