DeepSeek releases V3, a 671B MoE model with benchmark performance comparable to GPT-4o and Claude 3.5 Sonnet

The open-weight model is already drawing comparisons to frontier closed models, with benchmarks and a technical report published alongside the release.

DeepSeek today released DeepSeek-V3, a 671-billion-parameter mixture-of-experts model with 37 billion activated parameters per token. The model, trained on 14.8 trillion tokens, reaches performance comparable to GPT-4o and Claude 3.5 Sonnet across a wide range of benchmarks, including coding, mathematics, and general language tasks.

The model is released under an open-weight license, with the full technical paper and training details publicly available. DeepSeek also announced API pricing: rates stay at V2 levels until February 8, after which they become $0.27 per million input tokens on a cache miss ($0.07 on a cache hit) and $1.10 per million output tokens. The company emphasizes its commitment to ‘open-source spirit + Longtermism to inclusive AGI’ and promises additional features, such as multimodal support, in the future.
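
To make the listed prices concrete, here is a minimal Python sketch that estimates the cost of a workload from the three per-million-token rates quoted in the announcement. The function name `estimate_cost_usd` and the `cache_hit_ratio` parameter are illustrative assumptions, not part of any DeepSeek SDK, and actual billing may differ in rounding and in how cache hits are counted.

```python
# Listed prices in USD per million tokens, as quoted in the announcement.
PRICE_INPUT_CACHE_MISS = 0.27
PRICE_INPUT_CACHE_HIT = 0.07
PRICE_OUTPUT = 1.10


def estimate_cost_usd(input_tokens, output_tokens, cache_hit_ratio=0.0):
    """Estimate the cost in USD of a workload at the listed prices.

    cache_hit_ratio is a hypothetical knob: the fraction of input tokens
    assumed to be served from the cache (0.0 = none, 1.0 = all).
    """
    if not 0.0 <= cache_hit_ratio <= 1.0:
        raise ValueError("cache_hit_ratio must be between 0 and 1")

    # Split input tokens into cached and uncached portions.
    hit_tokens = input_tokens * cache_hit_ratio
    miss_tokens = input_tokens - hit_tokens

    # Prices are per million tokens, so divide the weighted sum by 1,000,000.
    total = (
        hit_tokens * PRICE_INPUT_CACHE_HIT
        + miss_tokens * PRICE_INPUT_CACHE_MISS
        + output_tokens * PRICE_OUTPUT
    )
    return total / 1_000_000


if __name__ == "__main__":
    # One million input tokens (no cache hits) plus one million output tokens.
    print(f"${estimate_cost_usd(1_000_000, 1_000_000):.2f}")
```

Under these assumptions, a job with one million uncached input tokens and one million output tokens comes to roughly $1.37, and the input portion falls from $0.27 to $0.07 if every input token is a cache hit.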

The record

One year later — open only if you can handle spoilers

DeepSeek-V3 proved to be a watershed moment for open-weight AI. In January 2025, it triggered a wave of geopolitical attention as US policymakers debated export controls, while the model's efficiency set a new standard for cost-effective training. By mid-2025, it had become one of the most widely used open-weight models globally, reshaping the competitive landscape.
