Granite 3.3 8B Base

Multimodal
Zero-eval
#1 NQ
#2 AttaQ
#2 PopQA

by IBM

About

Granite 3.3 8B Base is a language model developed by IBM and tagged as multimodal in this listing. It averages 64.3% across 20 benchmarks, with its highest scores on HumanEval (89.7%), AttaQ (88.5%), and HumanEval+ (86.1%). The benchmark results listed below are all self-reported and use text input. The model is licensed under Apache 2.0, which permits commercial use. It was announced and released on April 16, 2025, with a knowledge cutoff of April 1, 2024.

Timeline
Announced: Apr 16, 2025
Released: Apr 16, 2025
Knowledge Cutoff: Apr 1, 2024

Specifications
Capabilities: Multimodal

License & Family
License: Apache 2.0
Benchmark Performance Overview
Performance metrics and category breakdown

Overall Performance

Benchmarks: 20
Average Score: 64.3%
Best Score: 89.7%
High Performers (80%+): 5

Top Categories

code: 72.5%
reasoning: 68.4%
math: 64.0%
general: 59.7%
factuality: 52.1%
[Chart: top benchmark scores, normalized to 0-100%]
Ranking Across Benchmarks
Position relative to other models on each benchmark

HumanEval

Rank #11 of 62
#8 Granite 3.3 8B Instruct: 89.7%
#9 GPT-4o: 90.2%
#10 Qwen2.5 VL 32B Instruct: 91.5%
#11 Granite 3.3 8B Base: 89.7%
#12 Gemini Diffusion: 89.6%
#13 Llama 3.1 405B Instruct: 89.0%
#14 Nova Pro: 89.0%

AttaQ

Rank #2 of 3
#1 Granite 3.3 8B Instruct: 88.5%
#2 Granite 3.3 8B Base: 88.5%
#3 IBM Granite 4.0 Tiny Preview: 86.1%

HumanEval+

Rank #4 of 8
#1 Granite 3.3 8B Instruct: 86.1%
#2 Phi 4 Reasoning Plus: 92.3%
#3 Phi 4 Reasoning: 92.9%
#4 Granite 3.3 8B Base: 86.1%
#5 Phi 4: 82.8%
#6 IBM Granite 4.0 Tiny Preview: 78.3%
#7 Qwen2.5 32B Instruct: 52.4%

AIME 2024

Rank #18 of 41
#15 Granite 3.3 8B Instruct: 81.2%
#16 Phi 4 Reasoning Plus: 81.3%
#17 Qwen3 32B: 81.4%
#18 Granite 3.3 8B Base: 81.2%
#19 Qwen3 30B A3B: 80.4%
#20 DeepSeek R1 Distill Qwen 14B: 80.0%
#21 DeepSeek R1 Distill Llama 8B: 80.0%

HellaSwag

Rank #16 of 24
#13 Gemma 2 9B: 81.9%
#14 Qwen2.5-Coder 32B Instruct: 83.0%
#15 Mistral NeMo Instruct: 83.5%
#16 Granite 3.3 8B Base: 80.1%
#17 Gemma 3n E4B Instructed LiteRT Preview: 78.6%
#18 Gemma 3n E4B: 78.6%
#19 Qwen2.5-Coder 7B Instruct: 76.8%

All Benchmark Results for Granite 3.3 8B Base
Complete list of benchmark scores with detailed information

Benchmark        Category    Modality   Normalized   Score    Source
HumanEval        code        text       0.90         89.7%    Self-reported
AttaQ            general     text       0.89         88.5%    Self-reported
HumanEval+       code        text       0.86         86.1%    Self-reported
AIME 2024        general     text       0.81         81.2%    Self-reported
HellaSwag        reasoning   text       0.80         80.1%    Self-reported
TriviaQA         general     text       0.78         78.2%    Self-reported
IFEval           code        text       0.75         74.8%    Self-reported
Winogrande       reasoning   text       0.74         74.4%    Self-reported
BIG-Bench Hard   general     text       0.69         69.1%    Self-reported
MATH-500         math        text       0.69         69.0%    Self-reported

Showing 1 to 10 of 20 benchmarks
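
The two-decimal scores and the "High Performers (80%+)" count can be reproduced from the ten benchmark percentages listed above. The sketch below is illustrative only: the names `SCORES`, `to_fraction`, and `high_performers` are hypothetical, and round-half-up is an assumed rounding rule that happens to match the listed values, not a documented one. It covers only the ten visible rows, so it cannot reproduce the 64.3% average, which spans all 20 benchmarks.

```python
from decimal import Decimal, ROUND_HALF_UP

# Percentage scores copied from the ten benchmark rows listed above.
SCORES = {
    "HumanEval": "89.7",
    "AttaQ": "88.5",
    "HumanEval+": "86.1",
    "AIME 2024": "81.2",
    "HellaSwag": "80.1",
    "TriviaQA": "78.2",
    "IFEval": "74.8",
    "Winogrande": "74.4",
    "BIG-Bench Hard": "69.1",
    "MATH-500": "69.0",
}

# Threshold for the page's "80%+" bucket.
HIGH_PERFORMER_THRESHOLD = Decimal("80")


def to_fraction(percent: str) -> Decimal:
    """Convert a percentage string to a 0-1 score with two decimals.

    Assumption: the page rounds half-up; this is not documented.
    """
    fraction = Decimal(percent) / 100
    return fraction.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


def high_performers(scores: dict[str, str]) -> list[str]:
    """Return the benchmark names whose score is at or above the threshold."""
    return [
        name
        for name, percent in scores.items()
        if Decimal(percent) >= HIGH_PERFORMER_THRESHOLD
    ]


fractions = {name: to_fraction(percent) for name, percent in SCORES.items()}
top = high_performers(SCORES)

print(fractions["HumanEval"])  # 0.90
print(len(top))                # 5
```

Using `Decimal` rather than `float` keeps values such as 88.5% from being affected by binary floating-point representation when rounding to two places.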