The University of Turku in Finland is one of 10 university research labs across Europe collaborating to build new large language models in a variety of European languages. The group chose to train its models on the LUMI supercomputer, the fastest computer in Europe – and the third-fastest in the world.
LUMI, which stands for Large Unified Modern Infrastructure, is powered by AMD central processing units (CPUs) and graphics processing units (GPUs). The University of Turku contacted AMD for help in porting essential software to LUMI. CSC joined in because LUMI is hosted at the CSC datacentre in Kajaani, Finland.
“Now AMD, CSC and the University of Turku are collaborating in using LUMI to train GPT-like language models on a large scale, using large data sets,” said Aleksi Kallio, manager for artificial intelligence (AI) and data analytics at CSC. The project involves Finnish, along with several other European languages.
Large language models are becoming standard components in systems that offer users a dialogue-based interface, letting people interact with software through text and speech. The primary users of a large language model are companies, which adopt the technology and quickly find themselves reliant on organisations such as OpenAI. Governments are also interested in using large language models, and they are even more wary of growing dependent on other organisations – especially foreign ones. But as much as companies and governments would love to develop their own models in their own environments, it is usually too big an undertaking to take on alone.
Developing a large language model takes a lot of computing power. To start with, the models are huge – using tens to hundreds of billions of interdependent parameters. Solving for all the variables requires a lot of tuning and a lot of data. Then there are non-technical issues. As is the case with any emerging fundamental technology, new questions are being raised about the impact it will have on geopolitics and industrial policies. Who controls the models? How are they trained? Who controls the data used to train them?
“Once large language models are deployed, they are black boxes, virtually impossible to figure out,” said Kallio. “That’s why it’s important to have as much visibility as possible while the models are being built. And for that reason, Finland needs its own large language model trained in Finland. To keep things balanced and democratic, it’s important that we do not depend on just a few companies to develop the model. We need it to be a collective effort.
“Currently, the only way to train a language algorithm is to have a lot of data – pretty much the whole internet – and then tremendous computing power to train a large model with all that data,” he said. “How to make these models more data-effective is a hot topic in research. But for now, there is no getting around the fact that you need a lot of training data, which is challenging for small languages like Finnish.”
The need for a large amount of available text in a given language, along with the need for supercomputing resources to train large language models, makes it very difficult for most countries in the world to become self-sufficient in this emerging technology.
The increasing demands for computing power
The powerful supercomputer and the cooperation among different players make Finland a natural starting place for the open development of large language models for more languages.
“LUMI uses AMD MI250X GPUs, which are a good fit for machine learning for AI applications,” said Kallio. “Not only are they powerful, but they also have a lot of memory, which is what’s required. Deep learning of these neural networks involves a lot of fairly simple calculations on very large matrices.”
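As a rough illustration of the arithmetic Kallio describes, the sketch below multiplies a batch of activations by a single large weight matrix – the operation deep learning repeats across many layers and training steps. The matrix sizes are arbitrary assumptions chosen for illustration, not actual LUMI workloads.

```python
import numpy as np

# Hypothetical sizes, chosen only for illustration (not LUMI workloads):
# one 8192 x 8192 weight matrix and a batch of 1024 input rows.
hidden = 8192
batch = 1024

rng = np.random.default_rng(0)
weights = rng.standard_normal((hidden, hidden), dtype=np.float32)     # the "model"
activations = rng.standard_normal((batch, hidden), dtype=np.float32)  # the "data"

# The core operation of deep learning: a plain matrix multiplication,
# repeated across many layers and a vast number of training steps.
outputs = activations @ weights

print(f"weights alone: {weights.nbytes / 1024**3:.2f} GiB")
print(f"output shape: {outputs.shape}")
```

Even this single matrix occupies a quarter of a gigabyte, and a full model stacks thousands of such matrices – which is why the memory on each GPU matters as much as its raw speed.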
But LUMI also uses other types of processing units – CPUs and specialised chips. To pass data and commands among the different components, the system also needs exceptionally fast networks. “The idea is that you have this rich environment of different computing capabilities along with different storage capabilities,” said Kallio. “Then you have the fast interconnect so you can easily move data around and always use the most appropriate units for a given task.”
A few years ago, machine learning research could be done with a single GPU in a personal desktop computer. That was enough to produce credible results. But modern algorithms are so sophisticated that they require thousands of GPUs working together for weeks – even months – to train them. Moreover, training is not the only phase that requires extraordinary computing power. While training a model demands far more computation than using it, current large language models still need large servers for the usage phase.
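The sketch below shows the data-parallel pattern that lets many GPUs train one model together: every worker processes its own slice of the data, and the gradients are averaged before each shared update. It is a conceptual illustration only – the tiny linear model, worker count and learning rate are arbitrary assumptions, not the project's training setup.

```python
import numpy as np

# Conceptual sketch of data-parallel training, the pattern behind multi-GPU
# runs: every "worker" keeps a full copy of the model, processes its own
# slice of the batch, and the gradients are averaged before each update.
# Here the workers are simulated in one process; on a machine like LUMI each
# would be a GPU, and the averaging an all-reduce over the interconnect.

rng = np.random.default_rng(0)
n_workers, features, batch_per_worker = 4, 8, 32
true_w = rng.normal(size=features)        # synthetic "ground truth" to learn
weights = np.zeros(features)              # the shared model replica

for step in range(200):
    grads = []
    for _ in range(n_workers):            # each worker sees its own data slice
        x = rng.normal(size=(batch_per_worker, features))
        y = x @ true_w
        grads.append(2 * x.T @ (x @ weights - y) / batch_per_worker)  # MSE gradient
    avg_grad = np.mean(grads, axis=0)     # stands in for the all-reduce step
    weights -= 0.05 * avg_grad            # identical update on every replica

print("distance from true weights:", np.linalg.norm(weights - true_w))
```

Scaled up, the same pattern is what keeps thousands of GPUs busy for weeks: each update requires every worker to exchange gradients, which is where the fast interconnect becomes essential.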
The current state-of-the-art models are based on hundreds of billions of parameters, which no computer could have handled just a few years ago. There is no end in sight to the escalation – as researchers develop new algorithms, ever more computing power is required to train them. What's needed is progress in the algorithms themselves, so the models can be trained on regular servers and used on mobile devices.
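Some back-of-the-envelope arithmetic helps explain that gap. The figures below are generic assumptions – 16-bit weights and round parameter counts, not numbers from the project – and they ignore the extra memory that training itself requires.

```python
# Back-of-the-envelope memory footprint of the weights alone, ignoring the
# optimiser state and activations that training additionally requires.
BYTES_PER_PARAM = 2  # 16-bit floating point, a common storage format

for params in (10e9, 100e9, 500e9):       # tens to hundreds of billions
    gib = params * BYTES_PER_PARAM / 1024**3
    print(f"{params / 1e9:>5.0f}B parameters -> roughly {gib:,.0f} GiB of weights")
```

Even the weights of a 100-billion-parameter model approach 200 GiB – far beyond what a phone or a single ordinary server provides – which is why smaller, more efficient models remain an active research goal.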
“On the bright side, there are tonnes of startups coming up with new ideas, and it is possible that some of those will fly,” said Kallio. “Don’t forget that today we’re doing scientific computing on graphics processing units that were developed for video games. Fifteen years ago, nobody would have guessed that’s where we’d be today. Looking into the future, who knows what we will be doing with machine learning 15 years from now.”