Granite 3.3 8B Base

Tags: Multimodal, Zero-eval

Top rankings: #1 NQ, #2 AttaQ, #2 PopQA, and one more

by IBM

About

Granite 3.3 8B Base is a multimodal language model developed by IBM. It averages 64.3% across 20 benchmarks, with its highest scores on HumanEval (89.7%), AttaQ (88.5%), and HumanEval+ (86.1%). As a multimodal model, it can process text, images, and other input formats. It is released under the Apache 2.0 license, which permits commercial use and makes it suitable for enterprise applications. It was announced and released on April 16, 2025.

Timeline

Announced: Apr 16, 2025

Released: Apr 16, 2025

Knowledge Cutoff: Apr 1, 2024

Specifications

Capabilities: Multimodal

License & Family

License: Apache 2.0

Benchmark Performance Overview

Performance metrics and category breakdown

Overall Performance (20 benchmarks)

Average Score: 64.3%

Best Score: 89.7%

High Performers (80%+)

Top Categories

code: 72.5%
reasoning: 68.4%
math: 64.0%
general: 59.7%
factuality: 52.1%

Benchmark Performance

Top benchmark scores with normalized values (0-100%)

Ranking Across Benchmarks

Position relative to other models on each benchmark

HumanEval (Rank #11 of 62)

#8 Granite 3.3 8B Instruct: 89.7%
#9 GPT-4o: 90.2%
#10 Qwen2.5 VL 32B Instruct: 91.5%
#11 Granite 3.3 8B Base: 89.7%
#12 Gemini Diffusion: 89.6%
#13 Llama 3.1 405B Instruct: 89.0%
#14 Nova Pro: 89.0%

AttaQ (Rank #2 of 3)

#1 Granite 3.3 8B Instruct: 88.5%
#2 Granite 3.3 8B Base: 88.5%
#3 IBM Granite 4.0 Tiny Preview: 86.1%

HumanEval+ (Rank #4 of 8)

#1 Granite 3.3 8B Instruct: 86.1%
#2 Phi 4 Reasoning Plus: 92.3%
#3 Phi 4 Reasoning: 92.9%
#4 Granite 3.3 8B Base: 86.1%
#5 Phi 4: 82.8%
#6 IBM Granite 4.0 Tiny Preview: 78.3%
#7 Qwen2.5 32B Instruct: 52.4%

AIME 2024 (Rank #18 of 41)

#15 Granite 3.3 8B Instruct: 81.2%
#16 Phi 4 Reasoning Plus: 81.3%
#17 Qwen3 32B: 81.4%
#18 Granite 3.3 8B Base: 81.2%
#19 Qwen3 30B A3B: 80.4%
#20 DeepSeek R1 Distill Qwen 14B: 80.0%
#21 DeepSeek R1 Distill Llama 8B: 80.0%

HellaSwag (Rank #16 of 24)

#13 Gemma 2 9B: 81.9%
#14 Qwen2.5-Coder 32B Instruct: 83.0%
#15 Mistral NeMo Instruct: 83.5%
#16 Granite 3.3 8B Base: 80.1%
#17 Gemma 3n E4B Instructed LiteRT Preview: 78.6%
#18 Gemma 3n E4B: 78.6%
#19 Qwen2.5-Coder 7B Instruct: 76.8%
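
A rank such as "#11 of 62" is easier to compare across benchmarks of different sizes when expressed as a fraction of the field. The following is a minimal sketch of that arithmetic, not something the page itself computes; the function name `top_fraction` and the `RANKS` dictionary are hypothetical names introduced for illustration, and the (rank, total) pairs are copied from the rankings above.

```python
def top_fraction(rank: int, total: int) -> float:
    """Return rank / total, i.e. the share of the field at or above this rank.

    Smaller is better: 0.25 means the model sits in the top quarter.
    """
    if not 1 <= rank <= total:
        raise ValueError("rank must be between 1 and total")
    return rank / total


# (rank, number of ranked models) as listed on the page for Granite 3.3 8B Base.
RANKS = {
    "HumanEval": (11, 62),
    "AttaQ": (2, 3),
    "HumanEval+": (4, 8),
    "AIME 2024": (18, 41),
    "HellaSwag": (16, 24),
}

if __name__ == "__main__":
    for benchmark, (rank, total) in RANKS.items():
        print(f"{benchmark}: top {top_fraction(rank, total):.1%} of {total} models")
```

A rank on a small field says little: AttaQ has only three ranked models, so its "#2" carries far less weight than "#11 of 62" on HumanEval.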

All Benchmark Results for Granite 3.3 8B Base

Complete list of benchmark scores with detailed information


Benchmark	Category	Modality	Score	Normalized	Source
HumanEval	code	text	0.90	89.7%	Self-reported
AttaQ	general	text	0.89	88.5%	Self-reported
HumanEval+	code	text	0.86	86.1%	Self-reported
AIME 2024	general	text	0.81	81.2%	Self-reported
HellaSwag	reasoning	text	0.80	80.1%	Self-reported
TriviaQA	general	text	0.78	78.2%	Self-reported
IFEval	code	text	0.75	74.8%	Self-reported
Winogrande	reasoning	text	0.74	74.4%	Self-reported
BIG-Bench Hard	general	text	0.69	69.1%	Self-reported
MATH-500	math	text	0.69	69.0%	Self-reported

Showing 1 to 10 of 20 benchmarks
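
The summary figures on this page (overall average, category averages, count of scores at 80% or above) are aggregates over all 20 benchmarks, but only ten rows are shown. The sketch below recomputes the same kinds of aggregates from the ten visible rows only, so its results will not match the page's 64.3% average or its category breakdown. The names `VISIBLE_ROWS`, `overall_mean`, `category_means`, and `high_performers` are hypothetical, and a plain arithmetic mean is an assumption about how the page aggregates.

```python
from collections import defaultdict

# The ten rows visible on the page: (benchmark, category, normalized score in %).
VISIBLE_ROWS = [
    ("HumanEval", "code", 89.7),
    ("AttaQ", "general", 88.5),
    ("HumanEval+", "code", 86.1),
    ("AIME 2024", "general", 81.2),
    ("HellaSwag", "reasoning", 80.1),
    ("TriviaQA", "general", 78.2),
    ("IFEval", "code", 74.8),
    ("Winogrande", "reasoning", 74.4),
    ("BIG-Bench Hard", "general", 69.1),
    ("MATH-500", "math", 69.0),
]


def overall_mean(rows):
    """Arithmetic mean of the score column."""
    return sum(score for _name, _cat, score in rows) / len(rows)


def category_means(rows):
    """Return {category: mean score}, grouping rows by their category label."""
    buckets = defaultdict(list)
    for _name, category, score in rows:
        buckets[category].append(score)
    return {cat: sum(vals) / len(vals) for cat, vals in buckets.items()}


def high_performers(rows, threshold=80.0):
    """Names of benchmarks whose score is at or above the threshold."""
    return [name for name, _cat, score in rows if score >= threshold]


if __name__ == "__main__":
    print(f"mean of visible rows: {overall_mean(VISIBLE_ROWS):.2f}%")
    for cat, mean in sorted(category_means(VISIBLE_ROWS).items()):
        print(f"{cat}: {mean:.2f}%")
    print("80%+:", ", ".join(high_performers(VISIBLE_ROWS)))
```

Because the visible rows are the ten highest scores, their mean (about 79.1%) sits well above the page's 20-benchmark average; the remaining ten rows would be needed to reproduce the published numbers.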
