OpenAI unveils GPT-4o, a flagship model that handles text, speech and video in real time

OpenAI’s new flagship model brings GPT-4-level intelligence to free ChatGPT users starting today, with a voice mode that lets users interrupt ChatGPT and hear emotive responses, including singing.

OpenAI today announced GPT-4o (the ‘o’ stands for omni), a new flagship model that handles text, speech and video. CTO Mira Murati, who fronted the livestream presentation from OpenAI’s San Francisco offices, said the model delivers GPT-4-level intelligence but with radically improved multimodal capabilities.

‘GPT-4o reasons across voice, text and vision,’ Murati said during the stream. ‘This is incredibly important, because we’re looking at the future of interaction between ourselves and machines.’

The centerpiece demo showed a voice mode that lets users interrupt ChatGPT mid-sentence and receive responses in a range of emotional tones — even singing. The improved voice experience will enter alpha for ChatGPT Plus subscribers in the coming month.

GPT-4o is available in the free tier of ChatGPT starting today, with higher rate limits for Plus and Team users. On the API side, GPT-4o is twice as fast as GPT-4 Turbo and half the price, with higher rate limits. OpenAI also released a refreshed ChatGPT web UI and a macOS desktop app; a Windows version is due later this year.

The record

TechCrunch: OpenAI debuts GPT-4o omni model now powering ChatGPT

One year later — open only if you can handle spoilers

GPT-4o became OpenAI’s most broadly deployed model, though the ‘Sky’ voice was first temporarily paused and later pulled after backlash. The free-tier release permanently raised the baseline for consumer AI.
