ChatGPT’s Training Methods Challenged in Lawsuit


OpenAI, the prominent artificial intelligence (AI) company behind the popular ChatGPT chatbot, is facing yet another legal challenge. A class action lawsuit was filed against OpenAI in federal court on Wednesday, alleging copyright infringement and privacy violations related to the company’s training methods and data usage.

The lawsuit (PDF) doesn’t come as a surprise: ChatGPT is still a relatively new chatbot, and it may take time for the companies training such systems to address copyright concerns. These problems aren’t limited to ChatGPT alone; other AI systems like Microsoft’s Bing AI and Google’s Bard have faced similar accusations. For instance, just a couple of weeks ago, users managed to trick ChatGPT, Bing AI, and Bard into generating activation keys for Windows 10 and Windows 11.

The lawsuit, brought by science fiction and horror author Paul Tremblay and novelist Mona Awad, accuses OpenAI of training ChatGPT on copyrighted books without obtaining the authors’ consent or providing credit or compensation. The authors argue that because ChatGPT can produce accurate summaries of their works, their books must have been copied and incorporated into OpenAI’s language model without permission.

The complaint points to a 2020 OpenAI paper that mentions the use of “two internet-based book corpora” for training, suggesting that one of these datasets, consisting of over 290,000 titles, may have originated from “shadow libraries” like Library Genesis and Sci-Hub, which are known for distributing copyrighted works illegally via torrents. The authors argue that OpenAI’s use of these datasets infringes their copyrights, and they further allege that ChatGPT strips copyright notices from the books, in violation of the Digital Millennium Copyright Act.

The lawsuit, according to Bloomberg Law, also argues that OpenAI has integrated its systems with apps and services such as Spotify, Snapchat, Stripe, Slack, and Microsoft Teams to collect personal user data, including images, locations, music tastes, financial details, and private communications. Gathering such data violates those platforms’ terms of service, the lawsuit claims.

OpenAI has also been separately sued in a sweeping class action lawsuit, accusing the company of unlawfully collecting personal information from the internet through its AI models, including ChatGPT and the text-to-image generator DALL-E. The lawsuit asserts that such data scraping violates various state and federal privacy laws. This legal battle highlights the growing concerns surrounding AI technology and the need for regulations to address potential privacy breaches.

These lawsuits could have far-reaching consequences for OpenAI and the AI industry as a whole. The outcomes may set important precedents regarding AI, copyright infringement, and privacy, influencing future regulatory frameworks. If the court rules in favour of the plaintiffs, OpenAI may face substantial financial penalties, potentially impacting its financial stability and fundraising efforts. Additionally, the lawsuits could tarnish the company’s reputation, leading to increased scrutiny from regulators and the need for further transparency.

The implications extend beyond OpenAI, as companies utilizing ChatGPT or other OpenAI products may reconsider their partnerships to safeguard their reputations and protect user privacy. Moreover, if the courts determine that using copyrighted material for training AI models constitutes infringement, OpenAI and other companies may need to reconsider their data acquisition practices.

As these legal battles unfold, industry observers will be watching closely. The outcomes may result in new laws and policies, reshape AI development practices, and require companies to adapt their offerings to align with evolving legal and privacy standards.
