GPT-4.1

Name: GPT-4.1
Price: 2 USD per 1M input tokens
Rating: 57.3 (average score across 30 benchmarks)
Author: OpenAI

Multimodal

Zero-eval

#1 Video-MME (long, no subtitles)

#1 OpenAI-MRCR: 2 needle 1M

#1 Graphwalks parents >128k

About

GPT-4.1 is a multimodal language model developed by OpenAI. It has reported results on 30 benchmarks, with its highest scores on MMLU (90.2%), CharXiv-D (87.9%), and IFEval (87.4%). Its 1.1M-token context window is large enough for long documents and extended multi-turn conversations. The model is available through 1 API provider. As a multimodal model, it accepts text, images, and other input formats. It was announced and released on April 14, 2025.

Pricing Range

Input (per 1M tokens): $2.00 - $2.00

Output (per 1M tokens): $8.00 - $8.00

Providers: 1
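
The per-token prices above translate into a request cost with simple arithmetic. The sketch below is illustrative only: the function name and the example token counts are hypothetical, and it assumes billing is strictly linear in input and output tokens at the listed rates, with no caching discounts or other adjustments.

```python
# Listed prices in USD per 1M tokens, taken from the Pricing Range section.
INPUT_USD_PER_MILLION = 2.00
OUTPUT_USD_PER_MILLION = 8.00


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request in USD.

    Assumes linear per-token billing at the listed rates (an assumption,
    not a statement of any provider's actual billing rules).
    """
    input_cost = input_tokens / 1_000_000 * INPUT_USD_PER_MILLION
    output_cost = output_tokens / 1_000_000 * OUTPUT_USD_PER_MILLION
    return input_cost + output_cost


if __name__ == "__main__":
    # Hypothetical request: 50,000 prompt tokens and 2,000 completion tokens.
    print(f"Estimated cost: ${estimate_cost_usd(50_000, 2_000):.4f}")
```

Because output tokens cost four times as much as input tokens at these rates, long completions dominate the bill sooner than long prompts do.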

Timeline

Announced: Apr 14, 2025

Released: Apr 14, 2025

Knowledge Cutoff: Jun 1, 2024

Specifications

Capabilities

Multimodal

License & Family

License

Proprietary

Benchmark Performance Overview

Performance metrics and category breakdown

Overall Performance

30 benchmarks

Average Score

57.3%

Best Score

90.2%

High Performers (80%+)

Performance Metrics

Max Context Window

1.1M

Avg Throughput

100.0 tok/s

Avg Latency

10ms
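
As a rough guide to what the throughput and latency figures imply, the following sketch estimates end-to-end response time for a given completion length. It assumes the listed latency is the delay before the first token and that output is generated at a constant rate equal to the listed average throughput. Both are simplifying assumptions, and the function name is hypothetical.

```python
# Figures from the Performance Metrics section.
AVG_THROUGHPUT_TOK_PER_S = 100.0  # average output tokens per second
AVG_LATENCY_S = 0.010             # 10 ms, assumed here to be time to first token


def estimate_response_seconds(output_tokens: int) -> float:
    """Estimate total response time as latency plus decode time.

    Assumes a constant generation rate; real throughput varies with
    load, prompt length, and provider.
    """
    return AVG_LATENCY_S + output_tokens / AVG_THROUGHPUT_TOK_PER_S


if __name__ == "__main__":
    # Hypothetical completion lengths, in tokens.
    for tokens in (100, 500, 2_000):
        print(f"{tokens} tokens: about {estimate_response_seconds(tokens):.2f} s")
```

Under these assumptions, generation time dominates for anything longer than a few tokens, so the latency figure matters mainly for very short replies.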

Top Categories

code

87.4%

vision

73.4%

math

72.2%

long_context

59.2%

agents

58.7%

Benchmark Performance

Top benchmark scores with normalized values (0-100%)

Ranking Across Benchmarks

Position relative to other models on each benchmark

MMLU

Rank #8 of 78

#5 Claude 3.5 Sonnet	90.4%
#6 Claude 3.5 Sonnet	90.4%
#7 DeepSeek-R1	90.8%
#8 GPT-4.1	90.2%
#9 Kimi K2 Instruct	89.5%
#10 GPT-4o	88.7%
#11 DeepSeek-V3	88.5%

CharXiv-D

Rank #3 of 5

#1 GPT-4.1 mini	88.4%
#2 GPT-4.5	90.0%
#3 GPT-4.1	87.9%
#4 GPT-4o	85.3%
#5 GPT-4.1 nano	73.9%

IFEval

Rank #15 of 37

#12 Llama 3.1 70B Instruct	87.5%
#13 GPT-4.5	88.2%
#14 Llama 3.1 405B Instruct	88.6%
#15 GPT-4.1	87.4%
#16 Kimi-k1.5	87.2%
#17 Nova Micro	87.2%
#18 DeepSeek-V3	86.1%

MMMLU

Rank #4 of 13

#1 o1	87.7%
#2 Claude Opus 4	88.8%
#3 Claude Opus 4.1	98.4%
#4 GPT-4.1	87.3%
#5 Qwen3 235B A22B	86.7%
#6 Claude Sonnet 4	86.5%
#7 Claude 3.7 Sonnet	86.1%

MMMU

Rank #12 of 52

#9 Claude 3.7 Sonnet	75.0%
#10 GPT-4.5	75.2%
#11 Gemini 2.0 Flash Thinking	75.4%
#12 GPT-4.1	74.8%
#13 Claude Sonnet 4	74.4%
#14 Llama 4 Maverick	73.4%
#15 Gemini 2.5 Flash-Lite	72.9%

All Benchmark Results for GPT-4.1

Complete list of benchmark scores with detailed information


Benchmark	Category	Modality	Score	Normalized	Source
MMLU	general	text	0.90	90.2%	Self-reported
CharXiv-D	general	text	0.88	87.9%	Self-reported
IFEval	code	text	0.87	87.4%	Self-reported
MMMLU	general	text	0.87	87.3%	Self-reported
MMMU	vision	multimodal	0.75	74.8%	Self-reported
MathVista	math	text	0.72	72.2%	Self-reported
Video-MME (long, no subtitles)	vision	video	0.72	72.0%	Self-reported
Multi-IF	general	text	0.71	70.8%	Self-reported
TAU-bench Retail	agents	text	0.68	68.0%	Self-reported
GPQA	general	text	0.66	66.3%	Self-reported
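
The Top Categories figures appear to be per-category averages of the benchmark scores. As a minimal sketch, the code below groups the ten rows shown in this table by category and takes an unweighted mean. The averaging rule is an assumption inferred from the numbers, not something the page states, and the function name is hypothetical. For the rows visible here, it reproduces the listed vision (73.4%), code (87.4%), and math (72.2%) values. The agents value differs from the listed 58.7% because only 10 of the 30 benchmarks are shown, and no long_context rows are visible at all.

```python
from collections import defaultdict

# (benchmark, category, normalized score in %) for the ten rows shown above.
ROWS = [
    ("MMLU", "general", 90.2),
    ("CharXiv-D", "general", 87.9),
    ("IFEval", "code", 87.4),
    ("MMMLU", "general", 87.3),
    ("MMMU", "vision", 74.8),
    ("MathVista", "math", 72.2),
    ("Video-MME (long, no subtitles)", "vision", 72.0),
    ("Multi-IF", "general", 70.8),
    ("TAU-bench Retail", "agents", 68.0),
    ("GPQA", "general", 66.3),
]


def category_means(rows):
    """Return the unweighted mean score per category, rounded to 0.1.

    Unweighted averaging is an assumption about how the page aggregates.
    """
    grouped = defaultdict(list)
    for _name, category, score in rows:
        grouped[category].append(score)
    return {cat: round(sum(vals) / len(vals), 1) for cat, vals in grouped.items()}


if __name__ == "__main__":
    # Print categories from highest to lowest mean score.
    for category, mean in sorted(category_means(ROWS).items(), key=lambda kv: -kv[1]):
        print(f"{category}: {mean}%")
```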

Showing 1 to 10 of 30 benchmarks