DeepInfra

Major Platform
deepinfra.com
Platform Stats
Total Models: 26
Organizations: 6
Verified Benchmarks: 0
Multimodal Models: 8
Pricing Overview
Avg Input (per 1M): $0.26
Avg Output (per 1M): $0.53
Cheapest Model: $0.01
Premium Model: $1.79
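
As a rough guide, the averages above translate to per-request cost as in the minimal sketch below. It assumes the listed platform-wide averages ($0.26 in / $0.53 out per 1M tokens); actual costs depend on the specific model's rates on deepinfra.com.

# Rough cost estimate from the average prices listed above; per-model
# rates on deepinfra.com will differ.
def estimate_cost(input_tokens: int, output_tokens: int,
                  input_price_per_m: float = 0.26,
                  output_price_per_m: float = 0.53) -> float:
    """Return the estimated request cost in USD."""
    return (input_tokens / 1_000_000) * input_price_per_m \
         + (output_tokens / 1_000_000) * output_price_per_m

# Example: a 2,000-token prompt with a 500-token completion
print(f"${estimate_cost(2_000, 500):.6f}")  # prints $0.000785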
Supported Features
Number of models supporting each feature
Web Search: 0
Function Calling: 26
Structured Output: 26
Code Execution: 0
Batch Inference: 26
Fine-tuning: 0
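
All 26 models advertise function calling and structured output. A minimal sketch of function calling follows, assuming DeepInfra's OpenAI-compatible endpoint (https://api.deepinfra.com/v1/openai); the model ID and the get_weather tool are illustrative placeholders, so verify both against the current DeepInfra docs and catalog.

# Minimal function-calling sketch against an OpenAI-compatible endpoint.
# Base URL, model ID, and the get_weather tool are assumptions for illustration.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.deepinfra.com/v1/openai",  # assumed endpoint
    api_key="YOUR_DEEPINFRA_API_KEY",                # placeholder
)

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = client.chat.completions.create(
    model="meta-llama/Llama-3.3-70B-Instruct",  # exact ID may differ on the platform
    messages=[{"role": "user", "content": "What's the weather in Berlin?"}],
    tools=tools,
)
print(resp.choices[0].message.tool_calls)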
Input Modalities
Models supporting different input types
Text: 26 (100%)
Image: 8 (31%)
Audio: 0 (0%)
Video: 0 (0%)
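
For the 8 image-capable models, image input typically goes through the same chat interface as an image_url content part. A minimal sketch under the same assumptions as above (OpenAI-compatible endpoint; model ID and image URL are placeholders):

# Minimal image-input sketch for one of the vision-capable models listed
# below (Llama 3.2 11B Vision). Endpoint, model ID, and image URL are assumptions.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.deepinfra.com/v1/openai",  # assumed endpoint
    api_key="YOUR_DEEPINFRA_API_KEY",                # placeholder
)

resp = client.chat.completions.create(
    model="meta-llama/Llama-3.2-11B-Vision-Instruct",  # exact ID may differ
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image in one sentence."},
            {"type": "image_url",
             "image_url": {"url": "https://example.com/sample.jpg"}},
        ],
    }],
)
print(resp.choices[0].message.content)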
Models Overview
Top performers and pricing distribution

Pricing Distribution

Input pricing per 1M tokens
$0-1: 25 models
$1-5: 1 model

Top Performing Models

By benchmark average
#1 Llama 3.3 70B Instruct: 79.9%
#2 Llama 3.1 405B Instruct: 79.2%
#3 Qwen2.5 72B Instruct: 77.4%
#4 Qwen3 235B A22B: 76.2%
#5 DeepSeek R1 Distill Llama 70B: 76.0%

Most Affordable Models

Llama 3.2 3B Instruct: $0.01/1M
Gemma 3 4B: $0.02/1M
Gemma 3 12B: $0.05/1M

Available Models

26 models available through DeepInfra

Each entry lists the organization and model, a description, release date, license, and benchmark scores (where reported).
DeepSeek / DeepSeek-R1-0528
An upgraded version of DeepSeek R1 with significantly improved reasoning capabilities. This model leverages increased computational resources and algorithmic optimization mechanisms during post-training, demonstrating outstanding performance across mathematics, programming, and general logic tasks.
May 28, 2025
MIT License
Benchmarks: 57.6% / 71.6% / - / 73.3% / -
DeepSeek / DeepSeek-R1
DeepSeek-R1 is the first-generation reasoning model built atop DeepSeek-V3 (671B total parameters, 37B activated per token). It incorporates large-scale reinforcement learning (RL) to enhance its chain-of-thought and reasoning capabilities, delivering strong performance in math, code, and multi-step reasoning tasks.
Jan 20, 2025
MIT License
Benchmarks: 49.2% / 53.3% / - / 65.9% / -
DeepSeek / DeepSeek-V2.5
DeepSeek-V2.5 is an upgraded version that combines DeepSeek-V2-Chat and DeepSeek-Coder-V2-Instruct, integrating general and coding abilities. It better aligns with human preferences and has been optimized in various aspects, including writing and instruction following.
May 8, 2024
DeepSeek License
Benchmarks: 16.8% / - / 89.0% / - / -
Meta / Llama 3.2 3B Instruct
Llama 3.2 3B Instruct is a large language model that supports a context length of 128K tokens and is state-of-the-art in its class for on-device use cases such as summarization, instruction following, and rewriting tasks running locally at the edge.
Sep 25, 2024
Llama 3.2 Community License
Benchmarks: - / - / - / - / -
Google / Gemma 3 4B
Gemma 3 4B is a 4-billion-parameter vision-language model from Google, handling text and image input and generating text output. It features a 128K context window, multilingual support, and open weights. Suitable for question answering, summarization, reasoning, and image understanding tasks.
Mar 12, 2025
Gemma
Benchmarks: - / - / 71.3% / 12.6% / 63.2%
Google / Gemma 3 12B
Gemma 3 12B is a 12-billion-parameter vision-language model from Google, handling text and image input and generating text output. It features a 128K context window, multilingual support, and open weights. Suitable for question answering, summarization, reasoning, and image understanding tasks.
Mar 12, 2025
Gemma
Benchmarks: - / - / 85.4% / 24.6% / 73.0%
Microsoft / Phi-4-multimodal-instruct
Phi-4-multimodal-instruct is a lightweight (5.57B parameters) open multimodal foundation model that leverages research and datasets from Phi-3.5 and 4.0. It processes text, image, and audio inputs to generate text outputs, supporting a 128K token context length. Enhanced via SFT, DPO, and RLHF for instruction following and safety.
Feb 1, 2025
MIT
Benchmarks: - / - / - / - / -
Meta / Llama 3.2 11B Instruct
Llama 3.2 11B Vision Instruct is an instruction-tuned multimodal large language model optimized for visual recognition, image reasoning, captioning, and answering general questions about an image. It accepts text and images as input and generates text as output.
Sep 25, 2024
Llama 3.2 Community License
Benchmarks: - / - / - / - / -
Meta / Llama 3.1 8B Instruct
Llama 3.1 8B Instruct is a multilingual large language model optimized for dialogue use cases. It features a 128K context length, state-of-the-art tool use, and strong reasoning capabilities.
Jul 23, 2024
Llama 3.1 Community License
Benchmarks: - / - / 72.6% / - / -
Mistral AI / Mistral Small 3 24B Instruct
Mistral Small 3 is a 24B-parameter LLM licensed under Apache-2.0. It focuses on low-latency, high-efficiency instruction following, maintaining performance comparable to larger models. It provides quick, accurate responses for conversational agents, function calling, and domain-specific fine-tuning. Suitable for local inference when quantized, it rivals models 2–3× its size while using significantly fewer compute resources.
Jan 30, 2025
Apache 2.0
Benchmarks: - / - / 84.8% / - / -
Showing 1 to 10 of 26 models