Google releases Gemma 2, a next-generation open model family that competes with Llama 3 and runs on a single GPU

Google DeepMind claims the 27B model offers best-in-class performance for its size, outperforming Llama 3 8B and rivaling models twice its size, while running on a single NVIDIA H100 or TPU host.

Google DeepMind today released Gemma 2, the second generation of its open-weight model family, now available in 9B and 27B parameter sizes. The company claims the 27B variant achieves “best-in-class” performance for its size class on key benchmarks including MMLU, GSM8K, and ARC-c, while requiring only a single NVIDIA H100 Tensor Core GPU or TPU host for inference. The 9B model also leads its category, outperforming Llama 3 8B and other similarly sized models, according to Google’s technical report.

Gemma 2 incorporates several architectural innovations including a hybrid attention mechanism that interleaves sliding window attention (4,096 tokens) with full quadratic attention (8,192 tokens) in alternating layers, and logit soft-capping to stabilize training. The 9B model was pre-trained using knowledge distillation from a larger teacher model, while the 27B was trained from scratch on 13 trillion tokens of web data, code, and math. Both models use a permissive license allowing commercial use, redistribution, and derivative works.

The models are immediately available on Hugging Face, Google AI Studio, and Kaggle, with integrations for Hugging Face Transformers, JAX, PyTorch, TensorFlow, vLLM, Ollama, and NVIDIA TensorRT-LLM. Google Cloud customers will be able to deploy Gemma 2 on Vertex AI starting next month. The release positions Gemma 2 as a credible open-weight competitor to Meta’s Llama 3, potentially reshaping the landscape of language model availability and cost.

The record

The room reactsas it happened

Philipp Schmid@philschmid

Published a detailed blog on Hugging Face highlighting Gemma 2's technical advances like sliding window attention, soft-capping, and knowledge distillation, and providing code examples for inference and fine-tuning.

One year later — open only if you can handle spoilers

Gemma 2's claims of best-in-class performance held up well, with the 27B model remaining competitive through mid-2025. Google later expanded the family with a 2.6B model, ShieldGemma for safety, and Gemma Scope for interpretability, solidifying its spot alongside Llama as a go-to open-weight family.

Replay thisPost on X Reddit HN LinkedIn