
Granite 3.3 8B Base
Multimodal
Zero-eval
#1 NQ
#2 AttaQ
#2 PopQA
by IBM
About
Granite 3.3 8B Base is a language model developed by IBM. It averages 64.3% across 20 benchmarks, with its highest scores on HumanEval (89.7%), AttaQ (88.5%), and HumanEval+ (86.1%). The listing tags the model as multimodal; the benchmark results shown on this page all use text input. It is released under the Apache 2.0 license, which permits commercial use. It was announced and released on April 16, 2025.
Timeline
Announced: Apr 16, 2025
Released: Apr 16, 2025
Knowledge Cutoff: Apr 1, 2024
Specifications
Capabilities
Multimodal
License & Family
License
Apache 2.0
Benchmark Performance Overview
Performance metrics and category breakdown
Overall Performance
20 benchmarks
Average Score
64.3%
Best Score
89.7%
High Performers (80%+)
5
Top Categories
code
72.5%
reasoning
68.4%
math
64.0%
general
59.7%
factuality
52.1%
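The category figures above read like per-category means of the normalized scores, though the page does not state how they are computed. As a minimal Python sketch under that assumption, the snippet below groups the ten benchmark rows listed on this page by category and averages each group. The `VISIBLE_ROWS` list and the `category_means` helper are illustrative names, and because only 10 of the 20 benchmarks are listed, the results will not reproduce the 20-benchmark figures above.

```python
from collections import defaultdict
from statistics import mean

# (benchmark, category, normalized score in percent), hand-copied from
# the ten rows listed on this page. The other ten benchmarks are not shown.
VISIBLE_ROWS = [
    ("HumanEval", "code", 89.7),
    ("AttaQ", "general", 88.5),
    ("HumanEval+", "code", 86.1),
    ("AIME 2024", "general", 81.2),
    ("HellaSwag", "reasoning", 80.1),
    ("TriviaQA", "general", 78.2),
    ("IFEval", "code", 74.8),
    ("Winogrande", "reasoning", 74.4),
    ("BIG-Bench Hard", "general", 69.1),
    ("MATH-500", "math", 69.0),
]


def category_means(rows):
    """Return {category: mean score} for an iterable of (name, category, score)."""
    by_category = defaultdict(list)
    for _name, category, score in rows:
        by_category[category].append(score)
    return {category: mean(scores) for category, scores in by_category.items()}


means = category_means(VISIBLE_ROWS)

# Print categories from highest to lowest mean, as the page does.
for category, value in sorted(means.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{category}: {value:.1f}%")
```

The categories used here are the ones the page assigns (for example, IFEval under code and AIME 2024 under general), not a reclassification.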
Benchmark Performance
Top benchmark scores with normalized values (0-100%)
Ranking Across Benchmarks
Position relative to other models on each benchmark
HumanEval
Rank #11 of 62
#8 Granite 3.3 8B Instruct | 89.7%
#9 GPT-4o | 90.2%
#10 Qwen2.5 VL 32B Instruct | 91.5%
#11 Granite 3.3 8B Base | 89.7%
#12 Gemini Diffusion | 89.6%
#13 Llama 3.1 405B Instruct | 89.0%
#14 Nova Pro | 89.0%
AttaQ
Rank #2 of 3
#1 Granite 3.3 8B Instruct | 88.5%
#2 Granite 3.3 8B Base | 88.5%
#3 IBM Granite 4.0 Tiny Preview | 86.1%
HumanEval+
Rank #4 of 8
#1 Granite 3.3 8B Instruct | 86.1%
#2 Phi 4 Reasoning Plus | 92.3%
#3 Phi 4 Reasoning | 92.9%
#4 Granite 3.3 8B Base | 86.1%
#5 Phi 4 | 82.8%
#6 IBM Granite 4.0 Tiny Preview | 78.3%
#7 Qwen2.5 32B Instruct | 52.4%
AIME 2024
Rank #18 of 41
#15 Granite 3.3 8B Instruct | 81.2%
#16 Phi 4 Reasoning Plus | 81.3%
#17 Qwen3 32B | 81.4%
#18 Granite 3.3 8B Base | 81.2%
#19 Qwen3 30B A3B | 80.4%
#20 DeepSeek R1 Distill Qwen 14B | 80.0%
#21 DeepSeek R1 Distill Llama 8B | 80.0%
HellaSwag
Rank #16 of 24
#13 Gemma 2 9B | 81.9%
#14 Qwen2.5-Coder 32B Instruct | 83.0%
#15 Mistral NeMo Instruct | 83.5%
#16 Granite 3.3 8B Base | 80.1%
#17 Gemma 3n E4B Instructed LiteRT Preview | 78.6%
#18 Gemma 3n E4B | 78.6%
#19 Qwen2.5-Coder 7B Instruct | 76.8%
All Benchmark Results for Granite 3.3 8B Base
Complete list of benchmark scores with detailed information
Benchmark | Category | Modality | Score | Normalized | Source
HumanEval | code | text | 0.90 | 89.7% | Self-reported
AttaQ | general | text | 0.89 | 88.5% | Self-reported
HumanEval+ | code | text | 0.86 | 86.1% | Self-reported
AIME 2024 | general | text | 0.81 | 81.2% | Self-reported
HellaSwag | reasoning | text | 0.80 | 80.1% | Self-reported
TriviaQA | general | text | 0.78 | 78.2% | Self-reported
IFEval | code | text | 0.75 | 74.8% | Self-reported
Winogrande | reasoning | text | 0.74 | 74.4% | Self-reported
BIG-Bench Hard | general | text | 0.69 | 69.1% | Self-reported
MATH-500 | math | text | 0.69 | 69.0% | Self-reported
Showing 1 to 10 of 20 benchmarks
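The headline figures in the overview (best score, number of benchmarks at 80% or higher) can be cross-checked against the rows listed here. The following Python sketch does so for the ten visible rows. `SCORES` is a hand-copied, illustrative dictionary, and the mean it computes covers only those ten rows, so it is not comparable to the 64.3% average over all 20 benchmarks.

```python
from statistics import mean

# Normalized scores (percent) for the ten benchmarks listed on this page.
# Assumption: values are copied by hand; the remaining ten are not shown.
SCORES = {
    "HumanEval": 89.7,
    "AttaQ": 88.5,
    "HumanEval+": 86.1,
    "AIME 2024": 81.2,
    "HellaSwag": 80.1,
    "TriviaQA": 78.2,
    "IFEval": 74.8,
    "Winogrande": 74.4,
    "BIG-Bench Hard": 69.1,
    "MATH-500": 69.0,
}

HIGH_PERFORMER_THRESHOLD = 80.0  # the page's "High Performers (80%+)" cutoff

# Benchmark with the highest normalized score.
best_name = max(SCORES, key=SCORES.get)

# Benchmarks at or above the 80% cutoff.
high_performers = [name for name, score in SCORES.items()
                   if score >= HIGH_PERFORMER_THRESHOLD]

# Mean over the visible rows only (not the full 20-benchmark average).
visible_mean = mean(SCORES.values())

print(f"best: {best_name} ({SCORES[best_name]}%)")
print(f"high performers: {len(high_performers)}")
print(f"mean of visible rows: {visible_mean:.2f}%")
```

On these ten rows the best score is HumanEval at 89.7% and five benchmarks reach 80% or more, which agrees with the overview figures.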