Meta releases Llama 4 Scout, Maverick, and Behemoth, claims 10-million-token context and top-tier benchmark scores

Meta drops three new MoE models on a Saturday, including a 10M-token-context Scout that fits on a single H100, amid heightened pressure from DeepSeek's open-weight success.

Meta released Llama 4 on a Saturday — a surprise drop that includes Scout (17B active parameters, 16 experts, 10M-token context), Maverick (17B active, 128 experts, 400B total), and a preview of Behemoth (288B active, 2T total), which is still training.

Meta claims Scout fits on a single Nvidia H100 with INT4 quantization, delivers the best multimodal results in its class, and beats Gemma 3, Gemini 2.0 Flash-Lite, and Mistral 3.1. Maverick, the company says, beats GPT-4o and Gemini 2.0 Flash and is comparable to the much larger DeepSeek v3.1 on reasoning and coding with less than half the active parameters. An experimental chat version of Maverick scored an Elo of 1417 on LMArena. Behemoth, which Meta says it used as a teacher to distill the smaller models, is said to outperform GPT-4.5, Claude Sonnet 3.7, and Gemini 2.0 Pro on STEM benchmarks.
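The single-H100 claim can be sanity-checked with rough arithmetic. The sketch below is a back-of-the-envelope estimate, not Meta's own accounting. It assumes Scout has about 109B total parameters (Meta's published figure, treated here as an assumption) and that an H100 carries 80 GB of memory. It counts weights only, so activations and the KV cache for long contexts are left out. The function and constant names are illustrative.

```python
# Rough weights-only memory estimate for a mixture-of-experts model.
# All experts must be resident in memory, so the total parameter count
# matters here, not the 17B parameters active per token.

SCOUT_TOTAL_PARAMS = 109e9  # assumption: Scout's total parameter count
H100_MEMORY_GB = 80         # assumption: 80 GB H100 variant


def weight_footprint_gb(total_params: float, bits_per_param: int) -> float:
    """Return weight storage in GB (1 GB = 1e9 bytes)."""
    return total_params * bits_per_param / 8 / 1e9


def fits_on_device(total_params: float, bits_per_param: int,
                   device_gb: float = H100_MEMORY_GB) -> bool:
    """True if the weights alone fit in device memory (no headroom counted)."""
    return weight_footprint_gb(total_params, bits_per_param) <= device_gb


if __name__ == "__main__":
    for label, bits in [("BF16", 16), ("INT8", 8), ("INT4", 4)]:
        gb = weight_footprint_gb(SCOUT_TOTAL_PARAMS, bits)
        verdict = "fits" if fits_on_device(SCOUT_TOTAL_PARAMS, bits) else "does not fit"
        print(f"{label}: {gb:.1f} GB of weights, {verdict} in {H100_MEMORY_GB} GB")
```

Under these assumptions, only the 4-bit weights come in under 80 GB, which is consistent with Meta tying the single-GPU claim to INT4. The remaining headroom has to cover activations and the KV cache, so this estimate says nothing about how much of the 10M-token context is usable on one card.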

The release comes days after reports that Meta scrambled to respond to DeepSeek’s open-weight models. The Llama 4 license bars use by EU-based companies and requires companies with more than 700 million monthly active users to obtain a special license from Meta.

The record

One year later

Within days, the Maverick LMArena score drew criticism after it emerged that the score came from an experimental chat-tuned variant rather than the publicly released model. The Saturday release was seen as a rushed answer to DeepSeek, and doubts about benchmark transparency contributed to a perception that Llama's momentum had slipped.
