Rio Tinto turns to MLOps to grow machine learning uses


Rio Tinto is working to abstract behind-the-scenes complexity away from teams working to deliver new machine learning models across its operations.



(L-R) Marcus Rosen from Rio Tinto and Romina Sharifpour from AWS.

Machine learning operations principal Marcus Rosen told the AWS Summit Sydney earlier this year that the miner is embracing MLOps to standardise pathways to deliver and deploy new ML models.

“The core mission of my team is to optimise the delivery experience of our data science teams to make their lives easier and not get too bogged down into infrastructure and security issues,” Rosen said.

“We do that by working directly with those teams and with our cyber security team to build a secure but flexible delivery environment.”

Rosen said Rio Tinto made a strategic move seven years ago to set up a centralised and “dedicated machine learning capability … to work across our lines of business and product groups to help them deliver machine learning solutions.”

Data scientists, as well as ‘citizen’ data users, sit predominantly in Brisbane for the miner’s aluminium operations, in Perth for its iron ore business, in Singapore for its commercial business, and in Montreal, Canada.

“We’ve also, over the past couple of years, built a specialist team in India to deal with productionised machine learning solutions,” Rosen said.

Supporting data scientists, engineers and ‘citizen’ users out in the business and product areas are “tool and machine learning environments in Amazon SageMaker”.

The miner uses both SageMaker Studio and SageMaker Canvas, the latter a so-called ‘no-code’ tool that allows ‘citizen data scientists’ and non-technical users to build machine learning models.

“As we scale machine learning more and more across Rio Tinto, we need a more standardised approach for delivering models and deploying them, hence machine learning operations or MLOps,” Rosen said.

MLOps in SageMaker is treated as a way to “automate and standardise processes across the machine learning lifecycle”.

“We are of the view that the data science team should not have to spend any real time on infrastructure issues,” he said.

“It should just be able to go into a [SageMaker] notebook and install what they need to install etc, so we work heavily behind-the-scenes to enable that for our teams.”

Some of the behind-the-scenes work aims to improve access to datasets, both across the organisation and from the internet.

“If you don’t have good data, you can’t build good machine learning models,” he said.

“[But] getting access to data can be difficult. We currently have multiple data lakes … [and] another challenge can be around networking. All of our production accounts are completely air-gapped, they have no internet access by default, and any internet access or any actual access outside of the account needs to be whitelisted through a centralised firewall. 

“Some teams need to access datasets that are internet-based – for example, satellite datasets that are really too big to bring into one of our data lakes, so we need to be able to enable that access in a timely manner.”

Rosen said that work is underway on a “multicloud data lakehouse platform” that will eventually enable teams “to self-service and publish their own data sets in more of a sort of data mesh type approach.”

In a data mesh, each dataset is treated as a ‘product’ that is owned and curated by a single team, which also oversees access to it.
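As a rough illustration of the ownership idea only, a data product can be modelled as a record that names its owning team and lets only that team approve readers. This is a minimal sketch, not Rio Tinto's platform: the class, method, dataset and team names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DataProduct:
    """A dataset treated as a product: one team owns it and controls access."""

    name: str
    owner_team: str
    readers: set = field(default_factory=set)

    def grant_access(self, requested_by: str, team: str) -> None:
        # Only the owning team may approve a new reader (hypothetical rule).
        if requested_by != self.owner_team:
            raise PermissionError(f"{requested_by} does not own {self.name}")
        self.readers.add(team)

    def can_read(self, team: str) -> bool:
        # The owner can always read its own product.
        return team == self.owner_team or team in self.readers


# Hypothetical dataset and team names, for illustration only.
product = DataProduct(name="track-sensor-readings", owner_team="rail-analytics")
product.grant_access(requested_by="rail-analytics", team="planning")
```

The point of the design is that access decisions sit with the team that knows the data, rather than with a central gatekeeper.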

Rosen said that the miner is making heavy use of AWS PrivateLink to access data stored in AWS cloud environments.

It is also working to make aspects of security and data access easier and more automated for teams, whose access to data may require, for example, firewall changes.

“Lastly, we work very heavily behind-the-scenes to preconfigure our machine learning environments on SageMaker Studio and Canvas,” Rosen said.

Rosen called out three operational areas where machine learning had made a difference.

One area is predictive maintenance on the miner’s private rail network, which carries ore from pit to port.

“Any disruptions on the line can cost us a lot of money in financial penalties, so [an ML] model runs on a sliding window basis and is able to predict about seven weeks out if there needs to be maintenance done on a certain part of the track, which can cause delays,” Rosen said.

“With that information at hand, the planning team can then either schedule around the part of the track or fix that problem before it occurs.”
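The sliding-window pattern Rosen describes can be sketched in a few lines. The talk did not detail the model itself, so this example swaps in a simple least-squares trend line as a stand-in; the function names, the weekly readings, the threshold and the window size are all assumptions made for illustration.

```python
from typing import List, Sequence


def linear_forecast(window: Sequence[float], steps_ahead: int) -> float:
    """Fit a least-squares line to the window and extrapolate past its last point."""
    n = len(window)
    if n < 2:
        raise ValueError("window needs at least two readings")
    mean_x = (n - 1) / 2
    mean_y = sum(window) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(window))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept + slope * (n - 1 + steps_ahead)


def sliding_window_alerts(
    readings: Sequence[float], window_size: int, horizon: int, threshold: float
) -> List[int]:
    """Slide a fixed-size window over the readings and return the index of the
    last reading in every window whose forecast reaches the threshold."""
    alerts = []
    for end in range(window_size, len(readings) + 1):
        window = readings[end - window_size:end]
        if linear_forecast(window, horizon) >= threshold:
            alerts.append(end - 1)
    return alerts


# Hypothetical weekly wear readings for one track segment (stand-in data).
weekly_wear = [1.0, 1.0, 1.0, 1.0, 2.0, 3.0, 4.0, 5.0]
flagged_weeks = sliding_window_alerts(weekly_wear, window_size=4, horizon=7, threshold=10.0)
```

Each new reading shifts the window forward by one step, so the forecast is refreshed continuously and planners see a flag as soon as the projected value seven steps ahead crosses the limit.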

Other ML models are used in health and safety. For example, in Canada, a model predicts “the likelihood of a water leak occurring in an industrial smelter.”

“If water drips into an industrial smelter, it can cause a build-up of hydrogen gas which is extremely explosive,” he said, adding that it could lead to potential loss of life, damage to equipment and plant shutdown.

“This model runs on an hourly basis and produces a risk rating which is fed into the risk management process of the plant.”

A third use case for machine learning is in habitat management.

“As a large miner, we take our ecological responsibility of our mine sites extremely seriously,” Rosen said.

“We’ve been using machine learning to help recognise and manage the animal habitats around those sites. 

“[This] information is then fed into the planning process of the mine to help avoid disturbing those habitats.”

Ry Crozier attended AWS Summit Sydney as a guest of AWS.


