OAIC draws lines on data usage for generative AI training


Australia’s privacy watchdog has drawn lines around how it intends to adjudicate mass ingestion of data into AI models, both by developers and end users.



New guidelines issued by the Office of the Australian Information Commissioner (OAIC) clarify interpretations of existing Privacy Act legislation and how it applies to the use of personal – and especially sensitive – information when training or fine-tuning generative AI models or systems.

Although the Privacy Act only applies to businesses with annual turnover of more than $3 million, the OAIC told iTnews the guidelines would cover “all organisations that are using or developing AI products involving personal information in Australia”.

An OAIC spokesperson said it is already in the “initial stages of assessing a number of practices and entities for compliance” relating to generative AI but has yet to formally launch any investigations.

“If non-compliance comes to our attention, we will consider taking regulatory action,” the spokesperson said.

“Whether we proceed to take action will depend on a number of factors, as articulated in our statement of regulatory approach.”

Under the Privacy Act, organisations may “collect or use only the personal information that is reasonably necessary for their purpose”.

They are then only allowed to use it or share it with third parties for that stated purpose unless “consent or an exception applies”.

Organisations that intend to use data for training AI models should “explicitly refer” to it when collecting consents, “rather than relying on broad or vague purposes such as research”, the guidelines state.

Simply informing customers or changing privacy terms is unlikely to count as a “sufficient” means of establishing consent to use previously acquired data for AI training.

Additionally, “developers must consider whether data they intend to use or collect (including publicly available data) contains personal information and comply with their privacy obligations,” the guidance states.

“Even if it was made public by the individual themselves, they may not expect it to be collected and used to train an AI model.”

For sensitive information, such as biometrics and health data, tighter scrutiny will apply. Under the Privacy Act, sensitive information may only be used in limited circumstances and with consent.

As such, organisations that scrape data from the web, particularly photographs and recordings, risk using sensitive information without having established consent.

The guidelines also spell out that creating a dataset through web scraping “may constitute a covert and therefore unfair means of collection”, which could also breach the Privacy Act’s stipulation that data be “collected by lawful and fair means”.

“Where a third-party dataset is being used, developers must consider information about the data sources and compilation process for the dataset. They may need to take additional steps (for example, deleting information) to ensure they are complying with their privacy obligations,” the OAIC added.

As guidance for future data collection, the OAIC suggested organisations “delete or de-identify personal information or provide individuals with control over the use of their personal information”.
