The demand started at 1.4 billion conversations.
That staggering initial request from The New York Times, later negotiated down to 20 million randomly sampled ChatGPT conversations, has thrust OpenAI into a legal fight that security experts warn could fundamentally reshape data retention practices across the AI industry. The copyright infringement lawsuit has evolved beyond intellectual property disputes into a broader battle over user privacy, data governance, and the obligations AI companies face when litigation collides with privacy commitments.
OpenAI received a court preservation order on May 13, directing the company to retain all output log data that would otherwise be deleted, regardless of user deletion requests or privacy regulation requirements. District Judge Sidney Stein affirmed the order on June 26 after OpenAI appealed, rejecting arguments that user privacy interests should override preservation needs identified in the litigation.
Privacy Commitments Clash With Legal Obligations
The preservation order forces OpenAI to maintain consumer ChatGPT and API user data indefinitely, directly conflicting with the company’s standard practice of permanently deleting user-removed conversations within 30 days. The requirement covers data from December 2022 through November 2024, affecting ChatGPT Free, Plus, Pro, and Team subscribers, along with API customers without Zero Data Retention agreements.
ChatGPT Enterprise, ChatGPT Edu, and business customers with Zero Data Retention contracts remain excluded from the preservation requirements. The order does not change OpenAI’s policy of not training models on business data by default.
OpenAI implemented restricted access protocols, limiting preserved data to a small, audited legal and security team. The company maintains this information remains locked down and cannot be used beyond meeting legal obligations. No data will be turned over to The New York Times, the court, or external parties at this time.
Also read: OpenAI Announces Safety and Security Committee Amid New AI Model Development
Copyright Case Drives Data Preservation Demands
The New York Times filed its copyright infringement lawsuit in December 2023, alleging OpenAI illegally used millions of news articles to train large language models including ChatGPT and GPT-4. The lawsuit claims this unauthorized use constitutes copyright infringement and unfair competition, arguing OpenAI profits from intellectual property without permission or compensation.
The Times seeks more than monetary relief. The lawsuit demands destruction of all GPT models and training sets that incorporate its copyrighted works, with potential statutory and actual damages reaching into the billions of dollars.
The newspaper’s legal team argued their preservation request warranted approval partly because another AI company previously agreed to hand over 5 million private user chats in an unrelated case. OpenAI rejected this precedent as irrelevant to its situation.
Technical and Regulatory Complications
Complying with indefinite retention requirements presents significant engineering challenges. OpenAI must build systems capable of storing hundreds of millions of conversations from users worldwide, requiring months of development work and substantial financial investment.
The preservation order creates conflicts with international data protection regulations including GDPR. While OpenAI’s terms of service allow data preservation for legal requirements—a point Judge Stein emphasized—the company argues The Times’s demands exceed reasonable discovery scope and abandon established privacy norms.
OpenAI proposed several privacy-preserving alternatives, including targeted searches over preserved samples to identify conversations potentially containing New York Times article text. These suggestions aimed to provide only data relevant to copyright claims while minimizing broader privacy exposure.
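A targeted search of this kind is often implemented by comparing word n-grams ("shingles") between each conversation and the copyrighted text, flagging only conversations that share long verbatim runs. The sketch below is a minimal illustration of that general technique; the function names, the 8-word shingle size, and the match threshold are assumptions, not details from OpenAI's proposal.

```python
def shingles(text: str, n: int = 8) -> set[str]:
    """Return the set of lowercase word n-grams ('shingles') in the text."""
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

def flag_matches(conversations: dict[str, str], article_text: str,
                 n: int = 8, threshold: int = 1) -> list[str]:
    """Flag IDs of conversations sharing at least `threshold` verbatim
    n-word runs with the article; everything else stays unexamined."""
    article_grams = shingles(article_text, n)
    return [
        conv_id for conv_id, conv_text in conversations.items()
        if len(shingles(conv_text, n) & article_grams) >= threshold
    ]
```

The privacy appeal of this approach is that conversations with no long verbatim overlap never surface at all, so discovery could be limited to records plausibly relevant to the copyright claims.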
Recent court modifications provided limited relief. As of September 26, 2025, OpenAI no longer must preserve all new chat logs going forward. However, the company must retain all data already saved under the previous order and maintain information from ChatGPT accounts flagged by The New York Times, with the newspaper authorized to expand its flagged user list while reviewing preserved records.
“Our long-term roadmap includes advanced security features designed to keep your data private, including client-side encryption for your messages with ChatGPT. We will build fully automated systems to detect safety issues in our products. Only serious misuse and critical risks—such as threats to someone’s life, plans to harm others, or cybersecurity threats—may ever be escalated to a small, highly vetted team of human reviewers.” – Dane Stuckey, Chief Information Security Officer, OpenAI
Implications for AI Governance
The case transforms abstract AI privacy concerns into immediate operational challenges affecting 400 million ChatGPT users. Security practitioners note the preservation order shatters fundamental assumptions about data deletion in AI interactions.
OpenAI CEO Sam Altman characterized the situation as accelerating needs for “AI privilege” concepts, suggesting conversations with AI systems should receive protections similar to attorney-client privilege. The company frames unlimited data preservation as setting dangerous precedents for AI communication privacy.
The litigation raises uncomfortable questions for enterprises integrating ChatGPT into applications that handle sensitive information. Organizations using OpenAI’s technology in healthcare, legal, or financial services must reassess compliance with regulations such as HIPAA and GDPR in light of indefinite retention requirements.
Legal analysts warn this case likely invites third-party discovery attempts, with litigants in unrelated cases seeking access to adversaries’ preserved AI conversation logs. Such developments would further complicate data privacy issues and potentially implicate attorney-client privilege protections.
The outcome will significantly impact how AI companies access and utilize training data, potentially reshaping development and deployment of future AI technologies. Central questions remain unresolved regarding fair use doctrine application to AI model training and the boundaries of discovery in AI copyright litigation.
Also read: OpenAI’s SearchGPT: A Game Changer or Pandora’s Box for Cybersecurity Pros?
