Databricks releases DBRX, a 132B MoE open model claiming SOTA among open LLMs

The $10M Mixture-of-Experts model beats Llama 2 70B and Grok-1 on benchmarks, but requires four H100 GPUs to run and is tightly coupled to Databricks' platform.

SAN FRANCISCO — Databricks today released DBRX, a 132-billion-parameter open large language model built on a mixture-of-experts (MoE) architecture. The model, which the company says took roughly $10 million and two months to train on 3,072 NVIDIA H100 GPUs, claims state-of-the-art results among open models on standard benchmarks including MMLU (73.7%), HumanEval (70.1%), and GSM8K (66.9%). DBRX has 16 experts and routes each input to 4 of them, so only 36 billion of its parameters are active for any given input. The weights are available on Hugging Face under the Databricks Open Model License.
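To make "16 experts, 4 selected per input" concrete, here is a minimal pure-Python sketch of generic top-k expert routing. It is an illustration under stated assumptions, not DBRX's actual implementation: the router scores (`router_logits`) are made up, the weights are a softmax taken over only the selected experts, and the "experts" are hypothetical toy functions standing in for the real feed-forward blocks. The names `route` and `moe_forward` are invented for this example.

```python
import math

NUM_EXPERTS = 16  # DBRX's expert count, per the article
TOP_K = 4         # experts selected per input, per the article


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route(logits, k=TOP_K):
    """Pick the k experts with the highest router scores and return
    (expert_index, weight) pairs, with weights renormalized to sum to 1.
    Assumption: softmax over the selected experts only."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    weights = softmax([logits[i] for i in top])
    return list(zip(top, weights))


def moe_forward(x, logits, experts, k=TOP_K):
    """Weighted sum of the selected experts' outputs.
    The unselected experts are never called for this input."""
    return sum(w * experts[i](x) for i, w in route(logits, k))


# Hypothetical toy experts: expert s simply multiplies its input by s.
experts = [lambda x, s=s: x * s for s in range(NUM_EXPERTS)]
# Made-up router scores for a single input.
router_logits = [0.1 * ((7 * i) % NUM_EXPERTS) for i in range(NUM_EXPERTS)]

print(route(router_logits))
print(moe_forward(1.0, router_logits, experts))
print(math.comb(NUM_EXPERTS, TOP_K))  # → 1820 possible expert combinations
```

Because only the chosen experts run, per-input compute tracks the active parameters rather than the full parameter count, and choosing 4 of 16 allows 1,820 distinct expert combinations.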

Databricks says DBRX surpasses GPT-3.5 on several benchmarks and is competitive with Gemini 1.0 Pro. However, in a sign of the model’s practical limitations, running DBRX requires at least four H100 GPUs, making it inaccessible to most individual developers. Databricks VP Naveen Rao indicated that the company’s managed Mosaic AI platform is the intended deployment path, saying “the benefit to Databricks is more users on our platform.” The model does not beat GPT-4, lacks multimodal capabilities, and Databricks does not at present offer legal indemnification for copyright risks.
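The four-GPU floor follows from simple memory arithmetic. The sketch below is a back-of-the-envelope estimate that assumes 16-bit (bf16/fp16) weights and the 80 GB H100 variant; neither assumption comes from the article. It counts weights only, ignoring KV cache, activations, and framework overhead. The key point is that although only 36B parameters are active per input, every expert must stay resident in GPU memory, so the full 132B count sets the footprint. The helper `min_gpus_for_weights` is hypothetical.

```python
import math

TOTAL_PARAMS = 132e9      # total parameters, per the article (all experts must be loaded)
BYTES_PER_PARAM = 2       # assumption: 16-bit weights
H100_MEMORY_BYTES = 80e9  # assumption: 80 GB H100 variant


def min_gpus_for_weights(params, bytes_per_param, gpu_bytes):
    """Lower bound on GPUs needed just to hold the model weights
    (ignores KV cache, activations, and runtime overhead)."""
    return math.ceil(params * bytes_per_param / gpu_bytes)


weights_gb = TOTAL_PARAMS * BYTES_PER_PARAM / 1e9
print(f"{weights_gb:.0f} GB of weights")  # → 264 GB of weights
print(min_gpus_for_weights(TOTAL_PARAMS, BYTES_PER_PARAM, H100_MEMORY_BYTES))  # → 4
```

At 264 GB, the weights alone exceed three 80 GB cards, which is consistent with the four-H100 minimum before any working memory is counted.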

The record

The room reacts, as it happened

Kyle Wiggers

Reported for TechCrunch that DBRX requires at least four H100 GPUs to run and that Databricks does not at present offer an indemnification policy for copyright issues.

One year later

DBRX faded from prominence within months, overtaken by Llama 3 in April 2024 and later by other open models. Its primary legacy was demonstrating that MoE architectures could be trained efficiently by enterprises, but it never achieved wide adoption outside Databricks' ecosystem.
