OpenAI unveils Sora, a text-to-video model that generates photorealistic clips up to 60 seconds long

The model, which OpenAI frames as a 'world simulator,' produces complex scenes with multiple characters and specific motion, but it is not yet available to the public. OpenAI says access is currently restricted to red teamers and a small group of visual artists, designers, and filmmakers for feedback.

OpenAI today unveiled Sora, a text-to-video model capable of generating photorealistic clips up to 60 seconds long. The company says Sora can create ‘complex scenes with multiple characters, specific types of motion, and accurate details of the subject and background.’ OpenAI says the model can understand how objects exist in the physical world.

Sora is not yet publicly available; it is currently restricted to red teamers and a select group of visual artists, designers, and filmmakers for feedback. The model can also generate video from still images and fill in missing frames in existing clips.

OpenAI’s demos include an aerial scene of California during the gold rush and a video that looks as if it were shot from inside a Tokyo train. While OpenAI acknowledges the model may struggle with physics and cause-and-effect, the demos suggest a leap in quality that rivals existing offerings from Runway, Pika, and Google’s Lumiere.

The record

One year later — open only if you can handle spoilers

Sora remained private for over a year; by mid-2026 it still had not seen wide release. The 'world simulator' framing proved influential, pushing competitors to emphasize physics modeling, but the model's limitations on long-range consistency kept it a demo rather than a product.
