NASA’s Marshall Space Flight Center has begun work with IBM to use artificial intelligence (AI) technology to discover new insights in its Earth and geospatial science data. The joint effort will apply AI foundation model technology to NASA’s Earth-observing satellite data for the first time.
These observations are used by scientists to study and monitor Earth. NASA said the data is being collected at unprecedented rates and volumes, so new approaches are needed to extract knowledge from these growing data resources. The pair plan to develop an easier way for researchers to analyse and draw insights from the large datasets, based on IBM’s foundation model technology.
According to IBM, foundation models are types of AI models that are trained on a broad set of unlabelled data, can be used for different tasks, and can apply information about one situation to another. These models have rapidly advanced the field of natural language processing (NLP) technology over the past five years, and IBM is pioneering applications of foundation models beyond language.
“The beauty of foundation models is they can potentially be used for many downstream applications,” said Rahul Ramachandran, senior research scientist at NASA’s Marshall Space Flight Center in Huntsville, Alabama. “Building these foundation models cannot be tackled by small teams. You need teams across different organisations to bring their different perspectives, resources and skillsets.”
IBM and NASA plan to develop several new technologies to extract insights from Earth observations. One project will train an IBM geospatial intelligence foundation model on NASA’s Harmonized Landsat Sentinel-2 (HLS) dataset, a record of land cover and land use changes captured by Earth-orbiting satellites.
By analysing petabytes of satellite data to identify changes in the geographic footprint of phenomena such as natural disasters, cyclical crop yields and wildlife habitats, IBM and NASA hope this foundation model technology will help researchers provide critical analysis of our planet’s environmental systems.
NASA will also work with IBM to make Earth science literature easily searchable. IBM has developed an NLP model trained on nearly 300,000 Earth science journal articles to organise the literature and make it easier to discover new knowledge. According to IBM, this represents the largest AI workload trained on Red Hat’s OpenShift software. The fully trained model uses PrimeQA, IBM’s open source multilingual question-answering system. Beyond providing a resource to researchers, the new language model for Earth science will be built into NASA’s scientific data management and stewardship processes.
Another potential IBM-NASA joint project is the construction of a foundation model for weather and climate prediction using MERRA-2, a dataset of atmospheric observations. This collaboration is part of NASA’s Open Source Science Initiative, a commitment to building an inclusive, transparent and collaborative open science community over the next decade.