DeepInfra

Major Platform
deepinfra.com
Platform Stats
Total Models: 26
Organizations: 6
Verified Benchmarks: 0
Multimodal Models: 8
Pricing Overview
Avg Input (per 1M): $0.26
Avg Output (per 1M): $0.53
Cheapest Model: $0.01
Premium Model: $1.79
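
As a rough guide, the averages above translate to per-request cost as in the minimal sketch below. It assumes the listed platform-wide averages ($0.26 in / $0.53 out per 1M tokens); actual costs depend on the specific model's rates on deepinfra.com.

# Rough cost estimate from the average prices listed above; per-model
# rates on deepinfra.com will differ.
def estimate_cost(input_tokens: int, output_tokens: int,
                  input_price_per_m: float = 0.26,
                  output_price_per_m: float = 0.53) -> float:
    """Return the estimated request cost in USD."""
    return (input_tokens / 1_000_000) * input_price_per_m \
         + (output_tokens / 1_000_000) * output_price_per_m

# Example: a 2,000-token prompt with a 500-token completion
print(f"${estimate_cost(2_000, 500):.6f}")  # prints $0.000785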
Supported Features
Number of models supporting each feature
Web Search: 0
Function Calling: 26
Structured Output: 26
Code Execution: 0
Batch Inference: 26
Fine-tuning: 0
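
All 26 models advertise function calling and structured output. A minimal sketch of function calling follows, assuming DeepInfra's OpenAI-compatible endpoint (https://api.deepinfra.com/v1/openai); the model ID and the get_weather tool are illustrative placeholders, so verify both against the current DeepInfra docs and catalog.

# Minimal function-calling sketch against an OpenAI-compatible endpoint.
# Base URL, model ID, and the get_weather tool are assumptions for illustration.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.deepinfra.com/v1/openai",  # assumed endpoint
    api_key="YOUR_DEEPINFRA_API_KEY",                # placeholder
)

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = client.chat.completions.create(
    model="meta-llama/Llama-3.3-70B-Instruct",  # exact ID may differ on the platform
    messages=[{"role": "user", "content": "What's the weather in Berlin?"}],
    tools=tools,
)
print(resp.choices[0].message.tool_calls)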
Input Modalities
Models supporting different input types
Text: 26 (100%)
Image: 8 (31%)
Audio: 0 (0%)
Video: 0 (0%)
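
For the 8 image-capable models, image input typically goes through the same chat interface as an image_url content part. A minimal sketch under the same assumptions as above (OpenAI-compatible endpoint; model ID and image URL are placeholders):

# Minimal image-input sketch for one of the vision-capable models listed
# below (Llama 3.2 11B Vision). Endpoint, model ID, and image URL are assumptions.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.deepinfra.com/v1/openai",  # assumed endpoint
    api_key="YOUR_DEEPINFRA_API_KEY",                # placeholder
)

resp = client.chat.completions.create(
    model="meta-llama/Llama-3.2-11B-Vision-Instruct",  # exact ID may differ
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image in one sentence."},
            {"type": "image_url",
             "image_url": {"url": "https://example.com/sample.jpg"}},
        ],
    }],
)
print(resp.choices[0].message.content)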
Models Overview
Top performers and pricing distribution

Pricing Distribution

Input pricing per 1M tokens
$0-1: 25 models
$1-5: 1 model

Top Performing Models

By benchmark average
#1 Llama 3.3 70B Instruct: 79.9%
#2 Llama 3.1 405B Instruct: 79.2%
#3 Qwen2.5 72B Instruct: 77.4%
#4 Qwen3 235B A22B: 76.2%
#5 DeepSeek R1 Distill Llama 70B: 76.0%

Most Affordable Models

Llama 3.2 3B Instruct: $0.01/1M
Gemma 3 4B: $0.02/1M
Gemma 3 12B: $0.05/1M

Available Models

26 models available through DeepInfra

Each entry lists the organization and model, a description, release date, license, and benchmark scores (where reported).
DeepSeek / DeepSeek-R1-0528
An upgraded version of DeepSeek R1 with significantly improved reasoning capabilities. This model leverages increased computational resources and algorithmic optimization mechanisms during post-training, demonstrating outstanding performance across mathematics, programming, and general logic tasks.
May 28, 2025
MIT License
Benchmarks: 57.6% / 71.6% / - / 73.3% / -
DeepSeek / DeepSeek-R1
DeepSeek-R1 is the first-generation reasoning model built atop DeepSeek-V3 (671B total parameters, 37B activated per token). It incorporates large-scale reinforcement learning (RL) to enhance its chain-of-thought and reasoning capabilities, delivering strong performance in math, code, and multi-step reasoning tasks.
Jan 20, 2025
MIT License
Benchmarks: 49.2% / 53.3% / - / 65.9% / -
DeepSeek / DeepSeek-V2.5
DeepSeek-V2.5 is an upgraded version that combines DeepSeek-V2-Chat and DeepSeek-Coder-V2-Instruct, integrating general and coding abilities. It better aligns with human preferences and has been optimized in various aspects, including writing and instruction following.
May 8, 2024
DeepSeek License
Benchmarks: 16.8% / - / 89.0% / - / -
Meta / Llama 3.2 3B Instruct
Llama 3.2 3B Instruct is a large language model that supports a context length of 128K tokens and is state-of-the-art in its class for on-device use cases such as summarization, instruction following, and rewriting tasks running locally at the edge.
Sep 25, 2024
Llama 3.2 Community License
Benchmarks: - / - / - / - / -
Google / Gemma 3 4B
Gemma 3 4B is a 4-billion-parameter vision-language model from Google, handling text and image input and generating text output. It features a 128K context window, multilingual support, and open weights. Suitable for question answering, summarization, reasoning, and image understanding tasks.
Mar 12, 2025
Gemma
Benchmarks: - / - / 71.3% / 12.6% / 63.2%
Google / Gemma 3 12B
Gemma 3 12B is a 12-billion-parameter vision-language model from Google, handling text and image input and generating text output. It features a 128K context window, multilingual support, and open weights. Suitable for question answering, summarization, reasoning, and image understanding tasks.
Mar 12, 2025
Gemma
Benchmarks: - / - / 85.4% / 24.6% / 73.0%
Microsoft / Phi-4-multimodal-instruct
Phi-4-multimodal-instruct is a lightweight (5.57B parameters) open multimodal foundation model that leverages research and datasets from Phi-3.5 and 4.0. It processes text, image, and audio inputs to generate text outputs, supporting a 128K token context length. Enhanced via SFT, DPO, and RLHF for instruction following and safety.
Feb 1, 2025
MIT
Benchmarks: - / - / - / - / -
Meta / Llama 3.2 11B Instruct
Llama 3.2 11B Vision Instruct is an instruction-tuned multimodal large language model optimized for visual recognition, image reasoning, captioning, and answering general questions about an image. It accepts text and images as input and generates text as output.
Sep 25, 2024
Llama 3.2 Community License
Benchmarks: - / - / - / - / -
Meta / Llama 3.1 8B Instruct
Llama 3.1 8B Instruct is a multilingual large language model optimized for dialogue use cases. It features a 128K context length, state-of-the-art tool use, and strong reasoning capabilities.
Jul 23, 2024
Llama 3.1 Community License
Benchmarks: - / - / 72.6% / - / -
Mistral AI / Mistral Small 3 24B Instruct
Mistral Small 3 is a 24B-parameter LLM licensed under Apache-2.0. It focuses on low-latency, high-efficiency instruction following, maintaining performance comparable to larger models. It provides quick, accurate responses for conversational agents, function calling, and domain-specific fine-tuning. Suitable for local inference when quantized, it rivals models 2–3× its size while using significantly fewer compute resources.
Jan 30, 2025
Apache 2.0
Benchmarks: - / - / 84.8% / - / -
Showing 1 to 10 of 26 models