Google

Major Contributor

google.com

About

Technology giant with AI research

Portfolio Stats

Total Models25

Multimodal20

Benchmarks Run411

Avg Performance57.3%

Latest Release

Gemma 3n E4B

Released: Jun 26, 2025

Multimodal

Release Timeline

Recent model releases by year

2025

18 models

2024

7 models

Performance Overview

Top models and benchmark performance

Top Performing Models

By avg score

#1Gemini 2.0 Flash Thinking

74.3%

#2Gemini 1.5 Pro

72.6%

#3Gemma 2 27B

69.1%

#4Gemini 2.5 Pro Preview 06-05

68.8%

#5Gemini 2.5 Pro

67.1%

Benchmark Categories

factuality

75.7%

reasoning

69.9%

vision

68.6%

math

61.7%

code

53.8%

Model Statistics

Multimodal Ratio

80%

Models with Providers

All Models

Complete portfolio of 25 models with advanced filtering

		License
Gemini 2.5 Pro Preview 06-05 The latest preview version of Google's most advanced reasoning Gemini model, capable of solving complex problems. Built for the agentic era with enhanced reasoning capabilities, multimodal understanding (text, image, video, audio), and a 1M token context window. Features thinking preview, code execution, grounding with Google Search, system instructions, function calling, and controlled generation. Supports up to 3,000 images per prompt, 45-60 minutes of video, and 8.4 hours of audio.	Jun 5, 2025	Proprietary	67.2%	82.2%	-	69.0%	-
Gemini 2.5 Pro Our most intelligent AI model, built for the agentic era. Gemini 2.5 Pro leads on common benchmarks with enhanced reasoning, multimodal capabilities (text, image, video, audio input), and a 1M token context window.	May 20, 2025	Proprietary	63.2%	76.5%	-	-	-
Gemini 2.5 Flash A thinking model designed for a balance between price and performance. It builds upon Gemini 2.0 Flash with upgraded reasoning, hybrid thinking control, multimodal capabilities (text, image, video, audio input), and a 1M token input context window.	May 20, 2025	Proprietary	60.4%	61.9%	-	-	-
Gemini 2.5 Flash-Lite Gemini 2.5 Flash-Lite is a model developed by Google DeepMind, designed to handle various tasks including reasoning, science, mathematics, code generation, and more. It features advanced capabilities in multilingual performance and long context understanding. It is optimized for low latency use cases, supporting multimodal input with a 1 million-token context length.	Jun 17, 2025	Creative Commons Attribution 4.0 License	31.6%	26.7%	-	33.7%	-
Gemini Diffusion Gemini Diffusion is a state-of-the-art, experimental text diffusion model from Google DeepMind. It explores a new kind of language model designed to provide users with greater control, creativity, and speed in text generation. Instead of predicting text token-by-token, it learns to generate outputs by refining noise step-by-step, allowing for rapid iteration and error correction during generation. Key capabilities include rapid response times (reportedly 1479 tokens/sec excluding overhead), generation of more coherent text by outputting entire blocks of tokens at once, and iterative refinement for consistent outputs. It excels at tasks like editing, including in math and code contexts.	May 20, 2025	Proprietary	22.9%	-	89.6%	30.9%	76.0%
Gemma 3n E4B Gemma 3n is a multimodal model designed to run locally on hardware, supporting image, text, audio, and video inputs. It features a language decoder, audio encoder, and vision encoder, and is available in two sizes: E2B and E4B. The model is optimized for memory efficiency, allowing it to run on devices with limited GPU RAM. Gemma is a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models. Gemma models are well-suited for a variety of content understanding tasks, including question answering, summarization, and reasoning. Their relatively small size makes it possible to deploy them in environments with limited resources such as laptops, desktops or your own cloud infrastructure, democratizing access to state of the art AI models and helping foster innovation for everyone. Gemma 3n models are designed for efficient execution on low-resource devices. They are capable of multimodal input, handling text, image, video, and audio input, and generating text outputs, with open weights for instruction-tuned variants. These models were trained with data in over 140 spoken languages.	Jun 26, 2025	Proprietary	-	-	-	-	-
Gemma 3n E2B Gemma 3n is a multimodal model designed to run locally on hardware, supporting image, text, audio, and video inputs. It features a language decoder, audio encoder, and vision encoder, and is available in two sizes: E2B and E4B. The model is optimized for memory efficiency, allowing it to run on devices with limited GPU RAM. Gemma is a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models. Gemma models are well-suited for a variety of content understanding tasks, including question answering, summarization, and reasoning. Their relatively small size makes it possible to deploy them in environments with limited resources such as laptops, desktops or your own cloud infrastructure, democratizing access to state of the art AI models and helping foster innovation for everyone. Gemma 3n models are designed for efficient execution on low-resource devices. They are capable of multimodal input, handling text, image, video, and audio input, and generating text outputs, with open weights for instruction-tuned variants. These models were trained with data in over 140 spoken languages.	Jun 26, 2025	Proprietary	-	-	-	-	-
Gemma 3n E4B Instructed Gemma 3n is a multimodal model designed to run locally on hardware, supporting image, text, audio, and video inputs. It features a language decoder, audio encoder, and vision encoder, and is available in two sizes: E2B and E4B. The model is optimized for memory efficiency, allowing it to run on devices with limited GPU RAM. Gemma is a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models. Gemma models are well-suited for a variety of content understanding tasks, including question answering, summarization, and reasoning. Their relatively small size makes it possible to deploy them in environments with limited resources such as laptops, desktops or your own cloud infrastructure, democratizing access to state of the art AI models and helping foster innovation for everyone. Gemma 3n models are designed for efficient execution on low-resource devices. They are capable of multimodal input, handling text, image, video, and audio input, and generating text outputs, with open weights for instruction-tuned variants. These models were trained with data in over 140 spoken languages.	Jun 26, 2025	Proprietary	-	-	75.0%	13.2%	63.6%
Gemma 3n E2B Instructed Gemma 3n is a multimodal model designed to run locally on hardware, supporting image, text, audio, and video inputs. It features a language decoder, audio encoder, and vision encoder, and is available in two sizes: E2B and E4B. The model is optimized for memory efficiency, allowing it to run on devices with limited GPU RAM. Gemma is a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models. Gemma models are well-suited for a variety of content understanding tasks, including question answering, summarization, and reasoning. Their relatively small size makes it possible to deploy them in environments with limited resources such as laptops, desktops or your own cloud infrastructure, democratizing access to state of the art AI models and helping foster innovation for everyone. Gemma 3n models are designed for efficient execution on low-resource devices. They are capable of multimodal input, handling text, image, video, and audio input, and generating text outputs, with open weights for instruction-tuned variants. These models were trained with data in over 140 spoken languages.	Jun 26, 2025	Proprietary	-	-	66.5%	13.2%	56.6%
Gemma 3n E2B Instructed LiteRT (Preview) Gemma 3n is a generative AI model optimized for use in everyday devices, such as phones, laptops, and tablets. It features innovations like Per-Layer Embedding (PLE) parameter caching and a MatFormer model architecture for reduced compute and memory. These models handle audio, text, and visual data, though this E4B preview currently supports text and vision input. Gemma is a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models, and is licensed for responsible commercial use.	May 20, 2025	Gemma	-	-	66.5%	13.2%	56.6%

Showing 1 to 10 of 25 models

Resources

Official Website