one year on
Meta releases Llama 3 8B and 70B, claims best open models yet
The new models, trained on more than 15 trillion tokens and competitive with much larger proprietary systems, are available now; a model with more than 400 billion parameters is still in training.
Today Meta released the first two models in its Llama 3 family: an 8-billion-parameter model and a 70-billion-parameter model. The company says both are the most capable openly available language models at their respective scales.
The models were trained on over 15 trillion tokens — seven times the data used for Llama 2 — and include four times more code. Meta says the 70B model is competitive with Google’s Gemini 1.5 Pro and beats Claude 3 Sonnet on several benchmarks, including MMLU, HumanEval and GSM-8K. The 8B model outperforms Mistral 7B and Gemma 7B on at least nine benchmarks.
The largest Llama 3 model, with over 400 billion parameters, is still training. Meta teased that future versions will add multimodality, multilingual support and longer context windows. Alongside the models, Meta released Llama Guard 2 for safety filtering, Code Shield for insecure code detection, and CyberSecEval 2 for cybersecurity evaluation.
Meta AI, the company’s assistant, now runs on Llama 3 and is rolling out across Facebook, Instagram, WhatsApp, Messenger and the web. Developers can download the models from Meta’s website, and cloud availability spans AWS, Azure, Google Cloud, Hugging Face and others.
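The immediate debate is whether Meta’s ‘open’ label holds up when the license requires apps with more than 700 million monthly active users to seek a separate license from Meta and bars using the models or their outputs to train other large language models.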
The record
Kyle Wiggers, reporting for TechCrunch, notes that Meta's performance claims rest on benchmarks whose validity is debated, and that the models are not fully open source because of their usage restrictions.
One year later — open only if you can handle spoilers
The largest model teased here was released in July 2024 as Llama 3.1 405B, an open-weight model that became a staple for the open-source community. Its updated license allowed model outputs to be used to improve other models but kept the restriction on very large apps, so it was not open source in the strict sense. The 8B and 70B models saw massive adoption, with over 1.2 million downloads in the first week and continued use in fine-tuning and deployment.