Fireworks

fireworks.ai
Platform Stats
Total Models: 14
Organizations: 3
Verified Benchmarks: 0
Multimodal Models: 4

Pricing Overview
Avg Input (per 1M tokens): $1.29
Avg Output (per 1M tokens): $1.37
Cheapest Model: $0.10
Premium Model: $8.00
(A worked cost example based on these rates follows below.)
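Per-1M-token prices translate directly into per-request cost: multiply the input and output token counts by the respective rates and divide by one million. A minimal sketch in Python, using the platform-average rates listed above as illustrative defaults; real costs depend on the specific model chosen.

    def estimate_cost(input_tokens: int, output_tokens: int,
                      input_price_per_m: float = 1.29,
                      output_price_per_m: float = 1.37) -> float:
        """Estimate request cost in USD from per-1M-token prices."""
        return (input_tokens * input_price_per_m
                + output_tokens * output_price_per_m) / 1_000_000

    # Example: a 2,000-token prompt with a 500-token completion at the
    # platform-average rates above works out to roughly $0.0033.
    print(f"${estimate_cost(2_000, 500):.4f}")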
Supported Features
Number of models supporting each feature (a function-calling sketch follows this list):
Web Search: 0
Function Calling: 14
Structured Output: 14
Code Execution: 0
Batch Inference: 14
Fine-tuning: 0
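All 14 listed models advertise function calling and structured output, and Fireworks serves an OpenAI-compatible chat completions endpoint, so a standard OpenAI client can usually be pointed at it. The sketch below assumes that compatibility; the base URL, model identifier, and tool schema are illustrative assumptions, not values taken from this page.

    # Hedged sketch: function calling through an OpenAI-compatible client.
    # base_url, model name, and the get_weather tool are assumptions for illustration.
    from openai import OpenAI

    client = OpenAI(
        base_url="https://api.fireworks.ai/inference/v1",  # assumed OpenAI-compatible endpoint
        api_key="YOUR_FIREWORKS_API_KEY",
    )

    tools = [{
        "type": "function",
        "function": {
            "name": "get_weather",  # hypothetical tool
            "description": "Look up the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }]

    resp = client.chat.completions.create(
        model="accounts/fireworks/models/llama-v3p3-70b-instruct",  # assumed identifier
        messages=[{"role": "user", "content": "What's the weather in Oslo?"}],
        tools=tools,
    )
    print(resp.choices[0].message.tool_calls)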
Input Modalities
Models supporting different input types (an image-input sketch follows this list):
Text: 14 (100%)
Image: 4 (29%)
Audio: 0 (0%)
Video: 0 (0%)
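Four of the listed models accept images as well as text. With the same assumed OpenAI-compatible interface, an image is typically passed as an image_url content part alongside the text prompt; the model identifier and image URL below are again illustrative assumptions.

    # Hedged sketch: image + text input to a multimodal model.
    # Endpoint, model identifier, and image URL are assumptions for illustration.
    from openai import OpenAI

    client = OpenAI(
        base_url="https://api.fireworks.ai/inference/v1",  # assumed OpenAI-compatible endpoint
        api_key="YOUR_FIREWORKS_API_KEY",
    )

    resp = client.chat.completions.create(
        model="accounts/fireworks/models/llama4-scout-instruct-basic",  # assumed identifier
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image in one sentence."},
                {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},
            ],
        }],
    )
    print(resp.choices[0].message.content)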
Models Overview
Top performers and pricing distribution

Pricing Distribution

Input pricing per 1M tokens:
$0-1: 12 models
$1-5: 1 model
$5-15: 1 model

Top Performing Models

By benchmark average:
#1 Llama 3.3 70B Instruct: 79.9%
#2 Llama 3.1 405B Instruct: 79.2%
#3 Qwen2.5 72B Instruct: 77.4%
#4 Qwen3 235B A22B: 76.2%
#5 Llama 3.1 70B Instruct: 74.7%

Most Affordable Models

Qwen3 235B A22B: $0.10/1M
Llama 4 Scout: $0.15/1M
Llama 3.2 11B Instruct: $0.20/1M

Available Models

14 models available through Fireworks

DeepSeek: DeepSeek-R1
DeepSeek-R1 is the first-generation reasoning model built atop DeepSeek-V3 (671B total parameters, 37B activated per token). It incorporates large-scale reinforcement learning (RL) to enhance its chain-of-thought and reasoning capabilities, delivering strong performance in math, code, and multi-step reasoning tasks.
Jan 20, 2025
MIT License
Benchmarks: 49.2% / 53.3% / - / 65.9% / -
Alibaba: Qwen3 235B A22B
Qwen3 235B A22B is a large language model developed by Alibaba, featuring a Mixture-of-Experts (MoE) architecture with 235 billion total parameters and 22 billion activated parameters. It achieves competitive results in benchmark evaluations of coding, math, general capabilities, and more, compared to other top-tier models.
Apr 29, 2025
Apache 2.0
Benchmarks: - / - / - / 70.7% / 81.4%
Meta: Llama 4 Scout
Llama 4 Scout is a natively multimodal model capable of processing both text and images. It uses a mixture-of-experts (MoE) architecture with 16 experts, 17 billion activated parameters, and 109 billion total parameters, supporting a wide range of multimodal tasks such as conversational interaction, image analysis, and code generation. The model has a 10 million token context window.
Apr 5, 2025
Llama 4 Community License Agreement
Benchmarks: - / - / - / 32.8% / 67.8%
Meta: Llama 3.2 11B Instruct
Llama 3.2 11B Vision Instruct is an instruction-tuned multimodal large language model optimized for visual recognition, image reasoning, captioning, and answering general questions about an image. It accepts text and images as input and generates text as output.
Sep 25, 2024
Llama 3.2 Community License
Benchmarks: not listed
Meta: Llama 3.1 8B Instruct
Llama 3.1 8B Instruct is a multilingual large language model optimized for dialogue use cases. It features a 128K context length, state-of-the-art tool use, and strong reasoning capabilities.
Jul 23, 2024
Llama 3.1 Community License
Benchmarks: - / - / 72.6% / - / -
Meta: Llama 4 Maverick
Llama 4 Maverick is a natively multimodal model capable of processing both text and images. It uses a mixture-of-experts (MoE) architecture with 128 experts and 17 billion active parameters, supporting a wide range of multimodal tasks such as conversational interaction, image analysis, and code generation. The model has a 1 million token context window.
Apr 5, 2025
Llama 4 Community License Agreement
Benchmarks: - / - / - / 43.4% / 77.6%
Alibaba: Qwen3 30B A3B
Qwen3-30B-A3B is a smaller Mixture-of-Experts (MoE) model from the Qwen3 series by Alibaba, with 30.5 billion total parameters and 3.3 billion activated parameters. It features hybrid thinking/non-thinking modes, support for 119 languages, and enhanced agent capabilities. It aims to outperform previous models like QwQ-32B while using significantly fewer activated parameters.
Apr 29, 2025
Apache 2.0
Benchmarks: - / - / - / 62.6% / -
Meta: Llama 3.3 70B Instruct
Llama 3.3 is a multilingual large language model optimized for dialogue use cases across multiple languages. It is a pretrained and instruction-tuned generative model with 70 billion parameters, outperforming many open-source and closed chat models on common industry benchmarks. Llama 3.3 supports a context length of 128,000 tokens and is designed for commercial and research use in multiple languages.
Dec 6, 2024
Llama 3.3 Community License Agreement
Benchmarks: - / - / 88.4% / - / -
Alibaba: QwQ-32B-Preview
QwQ-32B-Preview is an experimental research model focused on advancing AI reasoning capabilities, particularly excelling in mathematics and programming. It features deep introspection and self-questioning abilities, though it has some limitations in language mixing and recursive reasoning patterns.
Nov 28, 2024
Apache 2.0
Benchmarks: - / - / - / 50.0% / -
Meta: Llama 3.2 90B Instruct
Llama 3.2 90B is a large multimodal language model optimized for visual recognition, image reasoning, and captioning tasks. It supports a context length of 128,000 tokens and is designed for deployment on edge and mobile devices, offering state-of-the-art performance in image understanding and generative tasks.
Sep 25, 2024
Llama 3.2 Community License
Benchmarks: not listed
Showing 1 to 10 of 14 models