Granite 3.3 8B Instruct
Multimodal
Zero-eval
#1 AttaQ
#1 PopQA
#2 TruthfulQA
by IBM
About
Granite 3.3 8B Instruct is a language model developed by IBM. It averages 69.8% across 14 benchmarks, with its highest scores on HumanEval (89.7%), AttaQ (88.5%), and HumanEval+ (86.1%). Code is its strongest category, with an average of 78.3%. The page lists the model as multimodal, although every benchmark result shown below uses the text modality. It is released under the Apache 2.0 license, which permits commercial use and makes it suitable for enterprise applications. It was announced and released on April 16, 2025.
Timeline
Announced: Apr 16, 2025
Released: Apr 16, 2025
Knowledge Cutoff: Apr 1, 2024
Specifications
Capabilities
Multimodal
License & Family
License
Apache 2.0
Benchmark Performance Overview
Performance metrics and category breakdown
Overall Performance
14 benchmarks
Average Score
69.8%
Best Score
89.7%
High Performers (80%+)
5
Top Categories
code
78.3%
math
75.0%
factuality
66.9%
general
63.9%
Benchmark Performance
Top benchmark scores with normalized values (0-100%)
Ranking Across Benchmarks
Position relative to other models on each benchmark
HumanEval
Rank #10 of 62
#7 GPT-4o: 90.2%
#8 Qwen2.5 VL 32B Instruct: 91.5%
#9 Mistral Large 2: 92.0%
#10 Granite 3.3 8B Instruct: 89.7%
#11 Granite 3.3 8B Base: 89.7%
#12 Gemini Diffusion: 89.6%
#13 Llama 3.1 405B Instruct: 89.0%
AttaQ
Rank #1 of 3
#1 Granite 3.3 8B Instruct: 88.5%
#2 Granite 3.3 8B Base: 88.5%
#3 IBM Granite 4.0 Tiny Preview: 86.1%
HumanEval+
Rank #3 of 8
#1 Phi 4 Reasoning Plus: 92.3%
#2 Phi 4 Reasoning: 92.9%
#3 Granite 3.3 8B Instruct: 86.1%
#4 Granite 3.3 8B Base: 86.1%
#5 Phi 4: 82.8%
#6 IBM Granite 4.0 Tiny Preview: 78.3%
AIME 2024
Rank #17 of 41
#14 Phi 4 Reasoning Plus: 81.3%
#15 Qwen3 32B: 81.4%
#16 DeepSeek R1 Distill Qwen 32B: 83.3%
#17 Granite 3.3 8B Instruct: 81.2%
#18 Granite 3.3 8B Base: 81.2%
#19 Qwen3 30B A3B: 80.4%
#20 DeepSeek R1 Distill Qwen 14B: 80.0%
GSM8k
Rank #37 of 46
#34 Qwen2 7B Instruct: 82.3%
#35 Qwen2.5-Coder 7B Instruct: 83.9%
#36 Gemini 1.5 Flash: 86.2%
#37 Granite 3.3 8B Instruct: 80.9%
#38 Mistral Small 3 24B Base: 80.7%
#39 Llama 3.2 3B Instruct: 77.7%
#40 Jamba 1.5 Mini: 75.8%
All Benchmark Results for Granite 3.3 8B Instruct
Complete list of benchmark scores with detailed information
Benchmark | Category | Modality | Score (0-1) | Score (%) | Source
HumanEval | code | text | 0.90 | 89.7% | Self-reported
AttaQ | general | text | 0.89 | 88.5% | Self-reported
HumanEval+ | code | text | 0.86 | 86.1% | Self-reported
AIME 2024 | general | text | 0.81 | 81.2% | Self-reported
GSM8k | math | text | 0.81 | 80.9% | Self-reported
IFEval | code | text | 0.75 | 74.8% | Self-reported
BIG-Bench Hard | general | text | 0.69 | 69.1% | Self-reported
MATH-500 | math | text | 0.69 | 69.0% | Self-reported
TruthfulQA | factuality | text | 0.67 | 66.9% | Self-reported
MMLU | general | text | 0.66 | 65.5% | Self-reported
Showing 1 to 10 of 14 benchmarks
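The category averages and the "High Performers (80%+)" count in the overview can be partly cross-checked against the rows listed above. The following is a minimal sketch, assuming the page uses a plain unweighted mean per category; the names `VISIBLE_RESULTS`, `category_means`, and `count_high_performers` are hypothetical and introduced only for illustration. Because only 10 of the 14 benchmarks are listed, the figures it computes cover the visible rows only.

```python
from collections import defaultdict

# Visible rows from the results table: (benchmark, category, score in percent).
# Only 10 of the 14 benchmarks are listed on this page.
VISIBLE_RESULTS = [
    ("HumanEval", "code", 89.7),
    ("AttaQ", "general", 88.5),
    ("HumanEval+", "code", 86.1),
    ("AIME 2024", "general", 81.2),
    ("GSM8k", "math", 80.9),
    ("IFEval", "code", 74.8),
    ("BIG-Bench Hard", "general", 69.1),
    ("MATH-500", "math", 69.0),
    ("TruthfulQA", "factuality", 66.9),
    ("MMLU", "general", 65.5),
]


def category_means(rows):
    """Return the unweighted mean score per category (an assumed aggregation)."""
    buckets = defaultdict(list)
    for _name, category, score in rows:
        buckets[category].append(score)
    return {cat: sum(vals) / len(vals) for cat, vals in buckets.items()}


def count_high_performers(rows, threshold=80.0):
    """Count benchmarks whose score is at or above the threshold."""
    return sum(1 for _name, _category, score in rows if score >= threshold)


means = category_means(VISIBLE_RESULTS)
high = count_high_performers(VISIBLE_RESULTS)

for category, mean in sorted(means.items()):
    print(f"{category}: {mean:.2f}%")
print(f"benchmarks at or above 80%: {high}")
```

On the visible rows, the math and factuality means agree with the overview (75.0% and 66.9%) and five benchmarks reach 80%. The code and general means come out higher than the overview's 78.3% and 63.9%, presumably because the four unlisted benchmarks fall into those categories.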