New York Times sues OpenAI and Microsoft for copyright infringement over training data

In a landmark lawsuit filed in Manhattan federal court, the newspaper demands billions in damages and the destruction of models trained on its articles, becoming the largest publisher involved in such a suit to date.

The New York Times today filed a copyright infringement lawsuit against OpenAI and Microsoft in the Federal District Court in Manhattan, alleging that millions of its articles were used without consent to train generative AI models like ChatGPT and Microsoft’s Copilot. The complaint demands that the companies destroy any models and training data containing Times content and seeks billions of dollars in statutory and actual damages.

The Times argues that OpenAI and Microsoft are free-riding on its massive investment in journalism to create products that substitute for the newspaper and steal audiences away. The complaint cites instances where ChatGPT reproduced Times articles near-verbatim and where Bing Chat provided incorrect information falsely attributed to the Times, highlighting potential brand damage from hallucinations. The newspaper also contends that the defendants are effectively building news competitors using its work, harming its subscription business by providing paywalled content without payment.

In response, an OpenAI spokesperson said the company was surprised and disappointed, noting that ongoing conversations with the Times had been productive and that OpenAI is committed to working with content creators. The Times is the largest publisher to bring such a suit to date, following similar suits from authors and programmers. Some outlets, such as the Associated Press and Axel Springer, have chosen licensing deals, but the Times says its own attempts to reach a licensing agreement with Microsoft and OpenAI since April were unsuccessful. The case is expected to shape the legal landscape for AI training data and copyright.

The record

The room reacts as it happened

OpenAI spokesperson

Expressed surprise and disappointment, saying ongoing conversations with the Times had been productive, and hoped to find a mutually beneficial way to work together, as OpenAI has with many other publishers.

Heather Meeker

An adviser on IP matters, Meeker compared the Times' example of ChatGPT regurgitating articles to using a word processor to cut and paste. She argued that teasing a chatbot into reproducing its input is not a sensible basis for a copyright infringement claim and that most such lawsuits will probably fail.

One year later — open only if you can handle spoilers

This lawsuit became a pivotal test case for fair use in AI training. By 2026, the case had not yet gone to trial, but it spurred a wave of licensing deals between publishers and AI companies, and the Times itself signed a separate AI licensing agreement with Amazon in 2025 while its suit against OpenAI and Microsoft continued. The complaint's focus on verbatim regurgitation pushed labs to invest in training data attribution and copyright filters.
