“While everything in computing gets smaller and smaller, the 21st century computer is something that scales from a smartwatch all the way up to the hyperscale datacentre,” says Michael Kagan, CTO of Nvidia.
“The datacentre is the computer and Nvidia is building the architecture for the datacentre. We are building pretty much everything needed, from silicon and frameworks all the way up to tuning applications for efficient execution on this 21st century machinery.”
Based in the Haifa district of Israel, Kagan joined Nvidia three years ago as the company’s CTO through the acquisition of Mellanox Technologies. Jensen Huang, Nvidia’s founder and CEO, told Kagan that he would oversee the architecture of all systems.
Beyond Moore’s Law
The well-known Moore’s Law comes from a paper Gordon Moore wrote in 1965, called Cramming more components onto integrated circuits. In the paper, Moore, who went on to co-found Intel and become its CEO, predicted that technology and economics would conspire to allow the semiconductor industry to squeeze twice as many transistors into the same amount of space every year. He said this would go on for the next 10 years.
This prophecy, which became known as Moore’s Law, was modified 10 years later. In 1975, Moore said that the doubling would occur approximately every two years, instead of every year. He also said it would continue for the foreseeable future. In fact, chip manufacturers benefitted from this doubling up until around 2005, when they could no longer count on economics and the laws of physics to squeeze twice as many transistors into the same amount of space every two years. There just wasn’t any more room left between transistors.
Since then, chip manufacturers have found other ways to increase computing power. One way was to increase the number of cores. Another was to improve communication between chips, and between processors and memory, by connecting the components more directly to one another over a network rather than a shared bus, which was prone to bottlenecks.
Semiconductor manufacturers also went further up the stack to invent new ways to deliver computing power. They looked at the algorithms, accelerators, and the way data was being processed. Accelerators are specialised components – usually chips – that perform specific tasks very quickly. When a system encounters such a task, it hands it off to the accelerator, thereby achieving gains in overall performance.
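To make the offload idea concrete, here is a minimal Python sketch – not any specific Nvidia interface – that hands a matrix multiplication to a GPU when the optional CuPy library happens to be available, and falls back to the CPU otherwise; the array sizes are arbitrary.

```python
# Illustrative sketch of handing a heavy task off to an accelerator.
# Assumes the optional CuPy library may be installed; otherwise the
# same computation simply runs on the CPU with NumPy.
import numpy as np

try:
    import cupy as cp  # GPU-backed array library (assumption: it is installed)
    xp = cp
except ImportError:
    cp = None
    xp = np            # no accelerator available: fall back to the CPU

def multiply(a, b):
    """Hand the matrix multiply to whichever device backs `xp`."""
    a_dev, b_dev = xp.asarray(a), xp.asarray(b)   # move the data to the device
    c_dev = xp.matmul(a_dev, b_dev)               # the task handed to the accelerator
    return cp.asnumpy(c_dev) if cp is not None else c_dev  # copy the result back

a = np.random.rand(1024, 1024)
b = np.random.rand(1024, 1024)
c = multiply(a, b)
```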
Manufacturers looked specifically at artificial intelligence (AI), where data is processed in a fundamentally new way compared with the von Neumann architecture the computer industry had grown used to.
“AI is based on neural networks,” explains Kagan. “That requires a very different kind of data processing than a von Neumann architecture, which is a serial machine that executes an instruction, looks at the result, and then decides what to do next.
“The neural network model of data processing was inspired by studies of the human brain. You feed the neural network data and it learns. It works similarly to showing a three-year-old kid dogs and cats. Eventually the kid learns to distinguish between the two. Thanks to neural networks, we can now solve problems that we didn’t know how to solve on the von Neumann machine.”
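As a loose illustration of learning from examples rather than following pre-written instructions, the sketch below (assuming the PyTorch library; the data and network sizes are arbitrary placeholders) fits a tiny neural network to labelled points by repeatedly showing it the examples and adjusting its weights.

```python
# Minimal sketch of training a neural network from labelled examples.
# Assumes PyTorch is installed; the data and network sizes are arbitrary.
import torch
from torch import nn

# Toy "examples": 200 random feature vectors, each labelled 0 or 1.
x = torch.randn(200, 8)
y = (x.sum(dim=1) > 0).long()

model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 2))
optimiser = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(100):                 # show the examples repeatedly
    optimiser.zero_grad()
    loss = loss_fn(model(x), y)          # how wrong the network currently is
    loss.backward()                      # work out how to adjust each weight
    optimiser.step()                     # nudge the weights; the network "learns"
```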
AI and other new applications, such as digital twins, have driven up the demand for computing performance and created the need for a new paradigm. In the past, developing software required very little computing power, but running the resulting program required much more. By contrast, AI requires a huge amount of compute to train a neural network, but far less to run it.
A single GPU or CPU is not enough to train a large AI model; ChatGPT, for example, takes about 10,000 GPUs to train. All of those GPUs work in parallel and, of course, they need to communicate. In addition to massive parallel processing, the new paradigm requires a new kind of specialised chip, the data processing unit (DPU).
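That figure reflects data-parallel training, in which each device works on its own slice of the data and the resulting gradients are then combined. The pure-Python sketch below illustrates only that combining step conceptually – an averaging “all-reduce” – and is not real multi-GPU code.

```python
# Conceptual sketch of data-parallel training: each "GPU" computes a
# gradient on its own shard of data, then the gradients are averaged
# (the communication step) so every worker applies the same update.
import numpy as np

num_workers = 4                              # stands in for thousands of GPUs
weights = np.zeros(3)
data_shards = [np.random.randn(100, 3) for _ in range(num_workers)]
targets = [shard @ np.array([1.0, -2.0, 0.5]) for shard in data_shards]

for step in range(50):
    local_grads = []
    for shard, t in zip(data_shards, targets):
        error = shard @ weights - t          # each worker sees only its shard
        local_grads.append(2 * shard.T @ error / len(shard))
    grad = np.mean(local_grads, axis=0)      # "all-reduce": average the gradients
    weights -= 0.01 * grad                   # identical update on every worker
```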
Huang’s Law
“The fastest machine in the world in 2003 was the Earth Simulator, which performed in teraflops,” says Kagan. “The fastest computer today is Frontier, which performs in exaflops, a million times more. In 20 years, we’ve gone from teraflops to exaflops.”
He adds: “During the 20 years between 1983 and 2003, compute performance increased a thousandfold, and in the next 20 years it increased a millionfold. That phenomenon is what some have called ‘Huang’s Law’. Our CEO, Jensen Huang, observed that GPU-accelerated computing doubles its performance every other year.
“As a matter of fact, it is going even faster than that. Now we are talking about AI workloads and a new way of processing data. If you look at how much faster an application runs on Nvidia Hopper versus Ampere – our current-generation GPU versus the previous generation – it’s more than 20 times.”
What makes computing faster now, Kagan says, is mainly the algorithms and the accelerators: “With every new generation of GPUs, more accelerators – and better accelerators – are added, processing data in much more sophisticated ways.
“It’s all about how you partition functions between the different parts. You now have three compute elements – GPU, CPU and DPU – and a network connecting them, which also computes. At Mellanox, the company bought by Nvidia, we introduced in-network computing, where calculations are performed on the data as it flows through the network.”
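As a purely conceptual illustration of that idea – not Mellanox’s actual implementation – the sketch below models a switch that sums partial results as they pass through it, so the endpoint receives a single, already-aggregated value.

```python
# Conceptual sketch of in-network computing: a toy switch aggregates the
# partial sums travelling through it instead of forwarding them all to
# an endpoint that would have to do the reduction itself.
import numpy as np

class ReducingSwitch:
    """A toy network switch that sums vectors as they flow through it."""
    def __init__(self):
        self.accumulator = None

    def receive(self, packet: np.ndarray) -> None:
        self.accumulator = packet if self.accumulator is None else self.accumulator + packet

    def forward(self) -> np.ndarray:
        return self.accumulator      # one aggregated packet leaves the switch

switch = ReducingSwitch()
for worker_result in [np.array([1.0, 2.0]), np.array([3.0, 4.0]), np.array([5.0, 6.0])]:
    switch.receive(worker_result)    # partial results enter the network
print(switch.forward())              # the endpoint receives the combined result
```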
Moore’s Law counted on a doubling of transistor counts to double computing performance every two years; Huang’s Law counts on GPU-accelerated computing to double system performance every other year. But now, even Huang’s Law may not be able to keep up with the growing demand from AI applications, which need 10 times more computing power every year.
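A rough back-of-the-envelope comparison (illustrative arithmetic only) shows how quickly those two growth rates diverge.

```python
# Rough arithmetic comparing the two growth rates over a few years.
years = 4
supply_growth = 2 ** (years / 2)      # doubling every other year (Huang's Law)
demand_growth = 10 ** years           # tenfold every year (AI demand)
print(supply_growth)                  # 4.0
print(demand_growth)                  # 10000
print(demand_growth / supply_growth)  # 2500.0 -- the widening gap
```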