IBM Granite 4.0 Tiny Preview
by IBM

Leaderboard highlights: #3 on AttaQ, #3 on PopQA

About

IBM Granite 4.0 Tiny Preview is a language model developed by IBM. The model posts competitive results across 12 benchmarks, with its strongest scores on AttaQ (86.1%), HumanEval (82.4%), and HumanEval+ (78.3%). It is released under the Apache 2.0 license, permitting commercial use and making it suitable for enterprise applications. Announced and released in 2025, it is a preview of IBM's Granite 4.0 model family.

Timeline
Announced: May 2, 2025
Released: May 2, 2025

Specifications
Training Tokens: 2.5T

License & Family
License: Apache 2.0
Benchmark Performance Overview
Performance metrics and category breakdown

Overall Performance (12 benchmarks)
Average Score: 57.1%
Best Score: 86.1%
High Performers (80%+): 2

Top Categories
Math: 70.1%
Code: 64.7%
Factuality: 58.1%
General: 49.7%
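The figures above look like simple arithmetic means over the per-benchmark scores listed in the results table further down. The sketch below reproduces that aggregation under that assumption (it is not confirmed as the site's actual method); only the ten listed benchmarks are included, so the overall and "general" averages come out higher than the published 57.1% and 49.7%, which cover all twelve.

```python
# Sketch: derive the summary statistics from the per-benchmark scores
# listed in "All Benchmark Results" below. Assumes plain arithmetic
# means; the two unlisted benchmarks (of 12) are omitted, so the overall
# and "general" averages will exceed the published 57.1% and 49.7%.
from collections import defaultdict

scores = {  # benchmark: (category, normalized score in %)
    "AttaQ": ("general", 86.1),
    "HumanEval": ("code", 82.4),
    "HumanEval+": ("code", 78.3),
    "GSM8k": ("math", 70.1),
    "IFEval": ("code", 63.0),
    "MMLU": ("general", 60.4),
    "TruthfulQA": ("factuality", 58.1),
    "BIG-Bench Hard": ("general", 55.7),
    "DROP": ("general", 46.2),
    "AlpacaEval 2.0": ("code", 35.2),
}

values = [score for _, score in scores.values()]
print(f"Average score (10 listed): {sum(values) / len(values):.1f}%")
print(f"Best score: {max(values):.1f}%")
print(f"High performers (80%+): {sum(v >= 80 for v in values)}")

by_category = defaultdict(list)
for category, score in scores.values():
    by_category[category].append(score)
for category, vals in sorted(by_category.items(), key=lambda kv: -sum(kv[1]) / len(kv[1])):
    print(f"{category}: {sum(vals) / len(vals):.1f}%")
```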
Benchmark Performance
Top benchmark scores with normalized values (0-100%)
Ranking Across Benchmarks
Position relative to other models on each benchmark

AttaQ

Rank #3 of 3
#1 Granite 3.3 8B Base: 88.5%
#2 Granite 3.3 8B Instruct: 88.5%
#3 IBM Granite 4.0 Tiny Preview: 86.1%

HumanEval

Rank #38 of 62
#35 Gemini 1.5 Pro: 84.1%
#36 Qwen2.5 14B Instruct: 83.5%
#37 Phi 4: 82.6%
#38 IBM Granite 4.0 Tiny Preview: 82.4%
#39 Codestral-22B: 81.1%
#40 Nova Micro: 81.1%
#41 Llama 3.1 70B Instruct: 80.5%

HumanEval+

Rank #6 of 8
#3 Granite 3.3 8B Base: 86.1%
#4 Granite 3.3 8B Instruct: 86.1%
#5 Phi 4: 82.8%
#6 IBM Granite 4.0 Tiny Preview: 78.3%
#7 Qwen2.5 32B Instruct: 52.4%
#8 Qwen2.5 14B Instruct: 51.2%

GSM8k

Rank #43 of 46
#40 Jamba 1.5 Mini: 75.8%
#41 Gemma 2 27B: 74.0%
#42 Command R+: 70.7%
#43 IBM Granite 4.0 Tiny Preview: 70.1%
#44 Gemma 2 9B: 68.6%
#45 Gemma 3 1B: 62.8%
#46 Granite 3.3 8B Base: 59.0%

IFEval

Rank #35 of 37
#32 Granite 3.3 8B Base: 74.8%
#33 GPT-4.1 nano: 74.5%
#34 Qwen2.5 7B Instruct: 71.2%
#35 IBM Granite 4.0 Tiny Preview: 63.0%
#36 Phi 4: 63.0%
#37 Pixtral-12B: 61.3%
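Each ranking above is simply a position in a list sorted by score for that benchmark; tied scores (such as the two Granite 3.3 models on AttaQ) take adjacent rank numbers. A minimal sketch of that ranking logic, using the AttaQ scores from this page as example data, follows.

```python
# Sketch: assign leaderboard positions by sorting scores in descending
# order, as the rankings above appear to do. Example data is the AttaQ
# list from this page; the 88.5% tie simply takes adjacent ranks.
attaq_scores = {
    "Granite 3.3 8B Base": 88.5,
    "Granite 3.3 8B Instruct": 88.5,
    "IBM Granite 4.0 Tiny Preview": 86.1,
}

ranked = sorted(attaq_scores.items(), key=lambda kv: kv[1], reverse=True)
for position, (model, score) in enumerate(ranked, start=1):
    print(f"#{position} {model}: {score:.1f}%")

# Position of a specific model, e.g. "Rank #3 of 3" for the Tiny Preview.
target = "IBM Granite 4.0 Tiny Preview"
rank = next(i for i, (m, _) in enumerate(ranked, start=1) if m == target)
print(f"{target}: Rank #{rank} of {len(ranked)}")
```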
All Benchmark Results for IBM Granite 4.0 Tiny Preview
Complete list of benchmark scores with detailed information
Benchmark        Category    Modality  Score  Normalized  Source
AttaQ            general     text      0.86   86.1%       Self-reported
HumanEval        code        text      0.82   82.4%       Self-reported
HumanEval+       code        text      0.78   78.3%       Self-reported
GSM8k            math        text      0.70   70.1%       Self-reported
IFEval           code        text      0.63   63.0%       Self-reported
MMLU             general     text      0.60   60.4%       Self-reported
TruthfulQA       factuality  text      0.58   58.1%       Self-reported
BIG-Bench Hard   general     text      0.56   55.7%       Self-reported
DROP             general     text      0.46   46.2%       Self-reported
AlpacaEval 2.0   code        text      0.35   35.2%       Self-reported

Showing 10 of 12 benchmarks