
Gemini Diffusion
Zero-eval
#1 LBPP (v2)
#1 BigCodeBench
#3 BIG-Bench Extra Hard
by Google
About
Gemini Diffusion is a diffusion-based language model developed by Google. It shows competitive results across 10 benchmarks, scoring highest on HumanEval (89.6%), MBPP (76.0%), and Global-MMLU-Lite (69.1%). Released in May 2025, it represents Google's latest advancement in AI technology.
Timeline
Announced: May 20, 2025
Released: May 20, 2025
Specifications
License & Family
License
Proprietary
Benchmark Performance Overview
Performance metrics and category breakdown
Overall Performance
10 benchmarks
Average Score
46.9%
Best Score
89.6%
High Performers (80%+): 1
Top Categories
code: 60.5%
general: 37.9%
Benchmark Performance
Top benchmark scores with normalized values (0-100%)
Ranking Across Benchmarks
Position relative to other models on each benchmark
HumanEval
Rank #12 of 62
#9 GPT-4o
90.2%
#10 Granite 3.3 8B Base
89.7%
#11 Granite 3.3 8B Instruct
89.7%
#12 Gemini Diffusion
89.6%
#13 Llama 3.1 405B Instruct
89.0%
#14 Nova Pro
89.0%
#15 DeepSeek-V2.5
89.0%
MBPP
Rank #15 of 31
#12 Qwen2.5 7B Instruct
79.2%
#13 Codestral-22B
78.2%
#14 Llama 4 Maverick
77.6%
#15 Gemini Diffusion
76.0%
#16 Mistral Small 3.1 24B Instruct
74.7%
#17 Gemma 3 27B
74.4%
#18 Qwen2.5-Omni-7B
73.2%
Global-MMLU-Lite
Rank #8 of 14
#5 Gemini 2.0 Flash-Lite
78.2%
#6 Gemma 3 27B
75.1%
#7 Gemma 3 12B
69.5%
#8 Gemini Diffusion
69.1%
#9 Gemma 3n E4B Instructed
64.5%
#10 Gemma 3n E4B Instructed LiteRT (Preview)
64.5%
#11 Gemma 3n E2B Instructed LiteRT (Preview)
59.0%
LBPP (v2)
Rank #1 of 1
#1 Gemini Diffusion
56.8%
BigCodeBench
Rank #1 of 2
#1 Gemini Diffusion
45.4%
#2 Qwen2.5-Coder 7B Instruct
41.0%
All Benchmark Results for Gemini Diffusion
Complete list of benchmark scores with detailed information
Benchmark | Category | Modality | Normalized | Score | Source
HumanEval | code | text | 0.90 | 89.6% | Self-reported
MBPP | code | text | 0.76 | 76.0% | Self-reported
Global-MMLU-Lite | general | text | 0.69 | 69.1% | Self-reported
LBPP (v2) | general | text | 0.57 | 56.8% | Self-reported
BigCodeBench | code | text | 0.45 | 45.4% | Self-reported
GPQA | general | text | 0.40 | 40.4% | Self-reported
LiveCodeBench | code | text | 0.31 | 30.9% | Self-reported
AIME 2025 | general | text | 0.23 | 23.3% | Self-reported
SWE-Bench Verified | general | text | 0.23 | 22.9% | Self-reported
BIG-Bench Extra Hard | general | text | 0.15 | 15.0% | Self-reported
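The summary statistics shown earlier on this page (Average Score 46.9%, Best Score 89.6%, one 80%+ high performer) follow directly from the ten scores listed above. A minimal sketch of that aggregation, with the scores transcribed from this page:

```python
# Benchmark scores for Gemini Diffusion, transcribed from the table above.
scores = {
    "HumanEval": 89.6,
    "MBPP": 76.0,
    "Global-MMLU-Lite": 69.1,
    "LBPP (v2)": 56.8,
    "BigCodeBench": 45.4,
    "GPQA": 40.4,
    "LiveCodeBench": 30.9,
    "AIME 2025": 23.3,
    "SWE-Bench Verified": 22.9,
    "BIG-Bench Extra Hard": 15.0,
}

average = sum(scores.values()) / len(scores)  # unweighted mean of all 10
best = max(scores.values())                   # single highest score
high_performers = [name for name, s in scores.items() if s >= 80.0]

print(f"Average Score: {average:.1f}%")                    # 46.9%
print(f"Best Score: {best:.1f}%")                          # 89.6%
print(f"High Performers (80%+): {len(high_performers)}")   # 1
```

The category averages on this page are computed the same way over each subset: the four code benchmarks (HumanEval, MBPP, BigCodeBench, LiveCodeBench) average to 60.5%, and the remaining six general benchmarks average to 37.9%.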