Alibaba releases downloadable reasoning model QwQ-32B-Preview, challenging OpenAI's o1

The Qwen team's 32.5-billion-parameter model, available under Apache 2.0, outperforms o1-preview on math benchmarks and shows reasoning steps in demo cases.

Alibaba’s Qwen team today released QwQ-32B-Preview, an openly available reasoning model that, according to TechCrunch, beats OpenAI’s o1-preview on the AIME and MATH-500 benchmarks. The 32.5-billion-parameter model, distributed under an Apache 2.0 license, scores 50.0% on AIME and 90.6% on MATH-500. It also achieves 65.2% on GPQA and 50.0% on LiveCodeBench.

Like o1, QwQ relies on test-time compute: the model spends extra computation at inference time, working through a problem and checking its own steps so it can self-correct before settling on an answer. It exposes that reasoning as visible chain-of-thought steps, a departure from OpenAI’s more opaque approach. However, Alibaba cautions that the preview model can mix languages, fall into recursive reasoning loops, and underperform on common-sense tasks.

QwQ-32B-Preview is available for download on Hugging Face and can be run locally. Only certain components of the model have been released, making it impossible to replicate QwQ-32B-Preview or inspect its inner workings. The model also reflects the political constraints of its Chinese developers: In TechCrunch’s testing, the model said Taiwan was part of China and declined to respond to Tiananmen Square prompts.
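What "run locally" means in hardware terms depends mostly on the precision at which the 32.5 billion weights are stored. The sketch below is a rough, weights-only estimate; it assumes a fixed number of bytes per parameter and ignores the KV cache, activations, and framework overhead. Long chain-of-thought outputs make the KV cache grow, so real requirements will be higher. The precision labels and byte counts are illustrative assumptions, not published requirements for this model.

```python
# Rough weights-only memory estimate for a 32.5B-parameter model.
# Assumption: memory = parameters * bytes per parameter. This ignores the
# KV cache, activations, and runtime overhead, so treat results as a floor.

PARAMS = 32.5e9  # parameter count reported for QwQ-32B-Preview

# Approximate bytes per parameter for common storage precisions.
# Real 4-bit formats add a little extra for quantization scales.
BYTES_PER_PARAM = {
    "fp32": 4.0,
    "bf16": 2.0,
    "int8": 1.0,
    "4-bit": 0.5,
}


def weight_memory_gb(params: float, bytes_per_param: float) -> float:
    """Return the storage needed for the weights in gigabytes (10**9 bytes)."""
    return params * bytes_per_param / 1e9


if __name__ == "__main__":
    for precision, nbytes in BYTES_PER_PARAM.items():
        print(f"{precision:>5}: ~{weight_memory_gb(PARAMS, nbytes):.1f} GB")
```

By this estimate the weights alone take about 65 GB in bf16 and roughly 16 GB at 4 bits. That gap is why quantized builds are the usual route to running a model of this size on a single consumer GPU or a high-memory laptop.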

The release comes amid growing industry interest in reasoning models as alternatives to simply scaling up parameters. Google has reportedly expanded its own reasoning team to about 200 people, signaling that the approach is increasingly seen as a path forward.

The record

One year later

Over the following year, QwQ-32B-Preview sparked a wave of open-weights reasoning models from other labs, though its adoption was tempered by political content restrictions. The model's visible chain-of-thought became a standard feature in later open reasoning models.
