IBM Granite 4.0 Tiny Preview
by IBM

Leaderboard highlights: #3 on AttaQ, #3 on PopQA

About

IBM Granite 4.0 Tiny Preview is a language model developed by IBM. The model posts competitive results across 12 benchmarks, with its strongest scores on AttaQ (86.1%), HumanEval (82.4%), and HumanEval+ (78.3%). It is released under the Apache 2.0 license, permitting commercial use and making it suitable for enterprise applications. Announced and released in 2025, it is a preview of IBM's Granite 4.0 model family.

Timeline
Announced: May 2, 2025
Released: May 2, 2025

Specifications
Training Tokens: 2.5T

License & Family
License: Apache 2.0
Benchmark Performance Overview
Performance metrics and category breakdown

Overall Performance (12 benchmarks)
Average Score: 57.1%
Best Score: 86.1%
High Performers (80%+): 2

Top Categories
Math: 70.1%
Code: 64.7%
Factuality: 58.1%
General: 49.7%
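The figures above look like simple arithmetic means over the per-benchmark scores listed in the results table further down. The sketch below reproduces that aggregation under that assumption (it is not confirmed as the site's actual method); only the ten listed benchmarks are included, so the overall and "general" averages come out higher than the published 57.1% and 49.7%, which cover all twelve.

```python
# Sketch: derive the summary statistics from the per-benchmark scores
# listed in "All Benchmark Results" below. Assumes plain arithmetic
# means; the two unlisted benchmarks (of 12) are omitted, so the overall
# and "general" averages will exceed the published 57.1% and 49.7%.
from collections import defaultdict

scores = {  # benchmark: (category, normalized score in %)
    "AttaQ": ("general", 86.1),
    "HumanEval": ("code", 82.4),
    "HumanEval+": ("code", 78.3),
    "GSM8k": ("math", 70.1),
    "IFEval": ("code", 63.0),
    "MMLU": ("general", 60.4),
    "TruthfulQA": ("factuality", 58.1),
    "BIG-Bench Hard": ("general", 55.7),
    "DROP": ("general", 46.2),
    "AlpacaEval 2.0": ("code", 35.2),
}

values = [score for _, score in scores.values()]
print(f"Average score (10 listed): {sum(values) / len(values):.1f}%")
print(f"Best score: {max(values):.1f}%")
print(f"High performers (80%+): {sum(v >= 80 for v in values)}")

by_category = defaultdict(list)
for category, score in scores.values():
    by_category[category].append(score)
for category, vals in sorted(by_category.items(), key=lambda kv: -sum(kv[1]) / len(kv[1])):
    print(f"{category}: {sum(vals) / len(vals):.1f}%")
```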
Benchmark Performance
Top benchmark scores with normalized values (0-100%)
Ranking Across Benchmarks
Position relative to other models on each benchmark

AttaQ

Rank #3 of 3
#1 Granite 3.3 8B Base: 88.5%
#2 Granite 3.3 8B Instruct: 88.5%
#3 IBM Granite 4.0 Tiny Preview: 86.1%

HumanEval

Rank #38 of 62
#35 Gemini 1.5 Pro: 84.1%
#36 Qwen2.5 14B Instruct: 83.5%
#37 Phi 4: 82.6%
#38 IBM Granite 4.0 Tiny Preview: 82.4%
#39 Codestral-22B: 81.1%
#40 Nova Micro: 81.1%
#41 Llama 3.1 70B Instruct: 80.5%

HumanEval+

Rank #6 of 8
#3 Granite 3.3 8B Base: 86.1%
#4 Granite 3.3 8B Instruct: 86.1%
#5 Phi 4: 82.8%
#6 IBM Granite 4.0 Tiny Preview: 78.3%
#7 Qwen2.5 32B Instruct: 52.4%
#8 Qwen2.5 14B Instruct: 51.2%

GSM8k

Rank #43 of 46
#40 Jamba 1.5 Mini: 75.8%
#41 Gemma 2 27B: 74.0%
#42 Command R+: 70.7%
#43 IBM Granite 4.0 Tiny Preview: 70.1%
#44 Gemma 2 9B: 68.6%
#45 Gemma 3 1B: 62.8%
#46 Granite 3.3 8B Base: 59.0%

IFEval

Rank #35 of 37
#32 Granite 3.3 8B Base: 74.8%
#33 GPT-4.1 nano: 74.5%
#34 Qwen2.5 7B Instruct: 71.2%
#35 IBM Granite 4.0 Tiny Preview: 63.0%
#36 Phi 4: 63.0%
#37 Pixtral-12B: 61.3%
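Each ranking above is simply a position in a list sorted by score for that benchmark; tied scores (such as the two Granite 3.3 models on AttaQ) take adjacent rank numbers. A minimal sketch of that ranking logic, using the AttaQ scores from this page as example data, follows.

```python
# Sketch: assign leaderboard positions by sorting scores in descending
# order, as the rankings above appear to do. Example data is the AttaQ
# list from this page; the 88.5% tie simply takes adjacent ranks.
attaq_scores = {
    "Granite 3.3 8B Base": 88.5,
    "Granite 3.3 8B Instruct": 88.5,
    "IBM Granite 4.0 Tiny Preview": 86.1,
}

ranked = sorted(attaq_scores.items(), key=lambda kv: kv[1], reverse=True)
for position, (model, score) in enumerate(ranked, start=1):
    print(f"#{position} {model}: {score:.1f}%")

# Position of a specific model, e.g. "Rank #3 of 3" for the Tiny Preview.
target = "IBM Granite 4.0 Tiny Preview"
rank = next(i for i, (m, _) in enumerate(ranked, start=1) if m == target)
print(f"{target}: Rank #{rank} of {len(ranked)}")
```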
All Benchmark Results for IBM Granite 4.0 Tiny Preview
Complete list of benchmark scores with detailed information
Benchmark        Category    Modality  Score  Normalized  Source
AttaQ            general     text      0.86   86.1%       Self-reported
HumanEval        code        text      0.82   82.4%       Self-reported
HumanEval+       code        text      0.78   78.3%       Self-reported
GSM8k            math        text      0.70   70.1%       Self-reported
IFEval           code        text      0.63   63.0%       Self-reported
MMLU             general     text      0.60   60.4%       Self-reported
TruthfulQA       factuality  text      0.58   58.1%       Self-reported
BIG-Bench Hard   general     text      0.56   55.7%       Self-reported
DROP             general     text      0.46   46.2%       Self-reported
AlpacaEval 2.0   code        text      0.35   35.2%       Self-reported

Showing 10 of 12 benchmarks