Judge Orders OpenAI to Release 20 Million Anonymized ChatGPT Chats in AI Copyright Dispute

A federal judge in New York has ordered OpenAI to provide 20 million anonymized user logs from ChatGPT to the plaintiffs in a major copyright lawsuit involving AI. The judge made this decision despite OpenAI’s privacy concerns, upholding an earlier ruling on the matter.

District Judge Sidney H. Stein affirmed Magistrate Judge Ona T. Wang’s November order on January 5, 2026, in the Southern District of New York, ruling that the privacy safeguards in place adequately offset the risks, given the logs’ relevance to the infringement claims.

OpenAI objected, arguing that the full dataset representing 0.5% of preserved logs was unduly burdensome and risked exposing user data, and instead proposed searching for conversations referencing plaintiffs’ works. Stein rejected this, noting that no case law mandates the “least burdensome” discovery method.

The saga began in July 2025 when news outlets, including The New York Times Co. and Chicago Tribune Co. LLC, sought 120 million logs to probe whether ChatGPT outputs infringed their copyrights by reproducing trained-on content.

OpenAI offered a sample of 20 million logs, which the plaintiffs accepted, but the company later resisted full production, arguing that 99.99% of the logs were irrelevant to the claims.

Wang sided with the plaintiffs in November, denied OpenAI’s motion for reconsideration in December, and Stein’s affirmation now finalizes production under a protective order with de-identification protocols, Bloomberg reported.

OpenAI invoked a Second Circuit securities case that blocked SEC wiretap disclosures, but Stein sharply distinguished it: ChatGPT logs involve voluntary user inputs and undisputed company ownership, unlike surreptitious recordings. “Users’ privacy would be protected by the company’s exhaustive de-identification,” Wang had ruled earlier.

This ruling advances pretrial discovery in In re OpenAI, Inc. Copyright Infringement Litigation (No. 1:25-md-03143), consolidating 16 suits from news organizations, authors, and others alleging unauthorized use of works to train large language models.

It mirrors dozens of cases against AI firms like Microsoft and Meta, testing copyright’s application to generative tech amid debates over fair use and data scraping.

Plaintiffs argue the logs are vital to rebut OpenAI’s claims that they “hacked” responses for evidence and to assess infringement scope. OpenAI maintains anonymization and protective orders suffice, with no user privacy at risk.

The decision spotlights tensions between discovery proportionality and AI data hoards, potentially setting precedents for similar cases. Critics worry bulk log handovers could chill user trust in chatbots, while supporters see it as essential transparency.

OpenAI, represented by Keker Van Nest, Latham & Watkins, and Morrison & Foerster, faces production deadlines soon.

As AI litigation proliferates, this order underscores courts’ willingness to compel the production of expansive evidence, even anonymized, to probe training data practices. For content creators, it bolsters tools to challenge AI’s copyright encroachments; for tech giants, it signals rising scrutiny of user data vaults.

Dr. Kolochenko, CEO at ImmuniWeb, told Cybersecuritynews that “For OpenAI, this decision is certainly a legal debacle, which will inspire other plaintiffs in similar cases to do the same to prevail in courts or to get much better settlements from AI companies.”

“This case is also a telling reminder that, regardless of your privacy settings, your interactions with AI chatbots and other systems may one day be produced in court. Architecture of modern LLMs and their underlying technology stack is very complex, so even if some user-facing systems are specifically configured to delete chat logs and history, some others may inevitably preserve them in one form or another. In some cases, produced evidence may trigger investigations and even criminal prosecution of AI users.”
