
IBM Granite 4.0 Tiny Preview
Ranked #3 on AttaQ and #3 on PopQA
By IBM
About
IBM Granite 4.0 Tiny Preview is a language model developed by IBM. The model posts competitive results across 12 benchmarks, with its strongest scores on AttaQ (86.1%), HumanEval (82.4%), and HumanEval+ (78.3%). It is distributed under the Apache 2.0 license, permitting commercial use and making it suitable for enterprise applications. Announced in May 2025, it is an early preview of IBM's Granite 4.0 generation.
Timeline
Announced: May 2, 2025
Released: May 2, 2025
Specifications
Training Tokens: 2.5T
License & Family
License: Apache 2.0
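Because the checkpoint is Apache 2.0-licensed, it can be used directly in commercial pipelines. Below is a minimal inference sketch using the Hugging Face transformers library; the repo id ibm-granite/granite-4.0-tiny-preview is an assumption based on the model name and should be confirmed on the model's Hugging Face page (the preview may also require a recent transformers release).

```python
# Minimal generation sketch with Hugging Face transformers.
# The repo id below is an assumption; verify it before use.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ibm-granite/granite-4.0-tiny-preview"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [{"role": "user", "content": "Summarize the Apache 2.0 license in one sentence."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Generate and decode only the newly produced tokens.
outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```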
Benchmark Performance Overview
Performance metrics and category breakdown
Overall Performance (12 benchmarks)
Average Score: 57.1%
Best Score: 86.1%
High Performers (80%+): 2
Top Categories:
math: 70.1%
code: 64.7%
factuality: 58.1%
general: 49.7%
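These aggregate figures can be reproduced from the per-benchmark scores in the detailed results table at the end of this page. The sketch below assumes they are simple unweighted means of the normalized scores; since only 10 of the 12 benchmarks are listed there, the overall average and the general-category figure computed here will not exactly match the reported 57.1% and 49.7% (the math, code, and factuality figures do match).

```python
# Sketch: deriving the aggregate figures from the per-benchmark scores
# listed in the results table below. Only 10 of the 12 benchmarks appear
# on this page, so the overall and "general" averages differ from the
# values reported above.
from collections import defaultdict

# (benchmark, category, normalized score in %) from the results table
scores = [
    ("AttaQ",          "general",    86.1),
    ("HumanEval",      "code",       82.4),
    ("HumanEval+",     "code",       78.3),
    ("GSM8k",          "math",       70.1),
    ("IFEval",         "code",       63.0),
    ("MMLU",           "general",    60.4),
    ("TruthfulQA",     "factuality", 58.1),
    ("BIG-Bench Hard", "general",    55.7),
    ("DROP",           "general",    46.2),
    ("AlpacaEval 2.0", "code",       35.2),
]

values = [v for _, _, v in scores]
print(f"Average score: {sum(values) / len(values):.1f}%")  # over the 10 shown; page reports 57.1% over all 12
print(f"Best score: {max(values):.1f}%")                   # 86.1% (AttaQ)
print(f"High performers (80%+): {sum(v >= 80 for v in values)}")  # 2

# Per-category averages (math, code, factuality match the figures above)
by_category = defaultdict(list)
for _, category, value in scores:
    by_category[category].append(value)
for category, vals in sorted(by_category.items()):
    print(f"{category}: {sum(vals) / len(vals):.1f}%")
```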
Benchmark Performance
Top benchmark scores with normalized values (0-100%)
Ranking Across Benchmarks
Position relative to other models on each benchmark
AttaQ (Rank #3 of 3)
#1 Granite 3.3 8B Base: 88.5%
#2 Granite 3.3 8B Instruct: 88.5%
#3 IBM Granite 4.0 Tiny Preview: 86.1%
HumanEval (Rank #38 of 62)
#35 Phi 4: 82.6%
#36 Qwen2.5 14B Instruct: 83.5%
#37 Gemini 1.5 Pro: 84.1%
#38 IBM Granite 4.0 Tiny Preview: 82.4%
#39 Codestral-22B: 81.1%
#40 Nova Micro: 81.1%
#41 Llama 3.1 70B Instruct: 80.5%
HumanEval+ (Rank #6 of 8)
#3 Phi 4: 82.8%
#4 Granite 3.3 8B Base: 86.1%
#5 Granite 3.3 8B Instruct: 86.1%
#6 IBM Granite 4.0 Tiny Preview: 78.3%
#7 Qwen2.5 32B Instruct: 52.4%
#8 Qwen2.5 14B Instruct: 51.2%
GSM8k (Rank #43 of 46)
#40 Command R+: 70.7%
#41 Gemma 2 27B: 74.0%
#42 Jamba 1.5 Mini: 75.8%
#43 IBM Granite 4.0 Tiny Preview: 70.1%
#44 Gemma 2 9B: 68.6%
#45 Gemma 3 1B: 62.8%
#46 Granite 3.3 8B Base: 59.0%
IFEval (Rank #35 of 37)
#32 Qwen2.5 7B Instruct: 71.2%
#33 GPT-4.1 nano: 74.5%
#34 Granite 3.3 8B Base: 74.8%
#35 IBM Granite 4.0 Tiny Preview: 63.0%
#36 Phi 4: 63.0%
#37 Pixtral-12B: 61.3%
All Benchmark Results for IBM Granite 4.0 Tiny Preview
Complete list of benchmark scores with detailed information
Benchmark | Category | Modality | Score | Normalized | Source
AttaQ | general | text | 0.86 | 86.1% | Self-reported
HumanEval | code | text | 0.82 | 82.4% | Self-reported
HumanEval+ | code | text | 0.78 | 78.3% | Self-reported
GSM8k | math | text | 0.70 | 70.1% | Self-reported
IFEval | code | text | 0.63 | 63.0% | Self-reported
MMLU | general | text | 0.60 | 60.4% | Self-reported
TruthfulQA | factuality | text | 0.58 | 58.1% | Self-reported
BIG-Bench Hard | general | text | 0.56 | 55.7% | Self-reported
DROP | general | text | 0.46 | 46.2% | Self-reported
AlpacaEval 2.0 | code | text | 0.35 | 35.2% | Self-reported
Showing 10 of 12 benchmarks; two further results are not listed on this page.