Generative AI at watershed moment with spate of legal challenges - Cybernoz

Everywhere you look right now, it’s impossible to avoid the existence of generative artificial intelligence (AI). From ChatGPT to image creators like Stable Diffusion, the industry has ballooned from almost nothing into a global super-industry. But not everyone is happy. In January 2023, image licensing company Getty Images started legal proceedings against the owners of AI image creation app Stable Diffusion over its alleged breaching of copyright laws.

It’s just one of a growing number of cases – including legal challenges against image AI Midjounrey and Microsoft-backed flagship Open AI – that could determine the future of the technology.

But these legal battles carry more than just the future of generative AI on their shoulders, and could affect the entire future of AI art, content creation and the ability to control how our personal data is used.

The reasons for the court case are pretty simple on the surface. Getty Images, as an image licensing platform, charges a fee for users to access or use images. That system poses a major problem for generative AI systems like ChatGPT or Stable Diffusion, which are reliant on mass data scraping to train their systems on how to answer prompts.

“Training these generative AI models involves vast amounts of data,” says Laura Houston, an expert in copyright law and a partner at law firm Slaughter and May. “For example, in text to image models, you’ve got this need to feed it with hundreds of millions of data points to teach the model to find statistical relations between the words and images.”

Simply put – if an AI image creator wants to work out how to create a picture of, say, a chicken wearing a top hat – it needs to study as many images as it can of chickens and top hats. The sheer scale of the data it needs to learn that ability makes it impossible to meaningfully sift the copyrighted from the un-copyrighted images.

“You’ve got the intellectual property [IP] infringement risk that flows from use of that data to teach the AI model,” she says. “But then you’ve also got the question of what the AI model generates as a result, and whether by virtue of the data it’s trained on, the output of the model risks infringing the IP of that input data.”

This is not all just an intellectual exercise. Copyright law is what underpins the ability of all artists and content creators to be able to protect and control, and thus actually make money from, their work. If generative AI is able to cut straight through that, and use their work to train its systems, it could profit while decimating cultural industries worldwide.

But the legal and moral questions don’t stop with copyright laws. Generative AI and large language models have increasingly been falling foul of data protection regulators, too.

Already, the Italian data regulator has banned Open AI-based chatbot Replika from gathering data in the country.

“Publicly available data is still personal data under the GDPR [General Data Protection Regulation] and other data protection and privacy laws, so you still need a legal basis for processing it,” says Robert Bateman, a data protection expert. “The problem is, I don’t know how much these companies have thought about that… I think it’s a bit of a legal time bomb.”

The personal data breaches are often also pretty strange. Last month, FT journalist Dave Lee found out ChatGPT was giving out his Signal number (posted on his Twitter account) as the chat bot’s own number, and was subsequently inundated with random messages. Even that kind of publicly posted data falls under data protection laws, according to Bateman.

“There is such a thing as contextual privacy,” he says. “You might put your number up on Twitter, and not expect it to appear in a database in China. The same goes for you not [necessarily] expecting it to become the output of chatbots. Data accuracy is one of the principles of the GDPR. You are obliged to make sure personal data in your processes is accurate and up to date.

“But large language models hallucinate about 20% of the time, apparently. On that basis, there’s going to be a lot of inaccurate information about people being distributed.”

Search

Determining breaches

Attribution

Latest Posts