Geoscience Australia is set to modernise the storage infrastructure underpinning a six-petabyte pool of petroleum exploration data, moving from tape to cloud.
The agency said in a request for information that the data had been collected by “petroleum exploration and development titleholders and submitted to the government” as required by legislation.
“The collections are used by government, community, research, and industry organisations, including Geoscience Australia’s precompetitive work programs and [to] support the government’s offshore acreage releases,” it said.
The data is accessed via an online system called NOPIMS – the national offshore petroleum information management system – or, when files are too large for online access, via a client services team.
Data is presently stored within an IBM stack that the agency hopes to modernise via what it’s calling the repository data management and delivery (RDMD) project.
The stack includes an on-premises IBM TS3500 Automated Tape Library (ATL); IBM Storage Protect (previously Tivoli Storage Manager) “to manage the archiving/storing and retrieval of geophysical surveys and other data”; and IBM Storage Scale (GPFS) servers, which provide high-performance clustered file systems on network-attached storage (NAS) for caching and validating data before it is archived in the ATL.
“The repository also uses cloud object storage for some open-file data provisioned via NOPIMS,” Geoscience Australia added, noting later that this uses AWS S3.
The agency said it currently stores six petabytes of data in the environment but expects this volume to grow by around two petabytes a year.
“The environment is facing other challenges including infrastructure, processing, and data throughput constraints, technology obsolescence, increasing data volumes for validation, cataloguing, storing and retrieval,” it said.
The request for information seeks to “replace the current Automated Tape Library/Storage Protect environment” with a system in which “final data storage and access should be in and from a cloud-based environment.”
Whatever the agency procures should also meet “storage and throughput needs for the next 10 to 15 years and beyond,” it said.
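For scale, the volumes cited above imply a substantial target capacity. A quick back-of-the-envelope projection – assuming purely linear growth, which the request for information does not actually specify – looks like this:

```python
# Rough capacity projection from the figures in the RFI:
# ~6 PB stored today, growing by ~2 PB per year (both approximate).
def projected_volume_pb(current_pb: float, growth_pb_per_year: float, years: int) -> float:
    """Linear projection of total stored data, in petabytes."""
    return current_pb + growth_pb_per_year * years

# Over the 10-to-15-year horizon the agency cites:
for years in (10, 15):
    print(f"{years} years: ~{projected_volume_pb(6, 2, years):.0f} PB")
# 10 years: ~26 PB
# 15 years: ~36 PB
```

On those assumptions, the replacement system would need to accommodate roughly 26 to 36 petabytes by the end of the stated horizon – and more if data submission rates accelerate.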