New AI benchmarks test speed of running AI applications – Hardware

Artificial intelligence group MLCommons unveiled two new benchmarks that it said can help determine how quickly top-of-the-line hardware and software can run AI applications.



Since the launch of OpenAI’s ChatGPT over two years ago, chip companies have begun to shift their focus to making hardware that can efficiently run the code that allows millions of people to use AI tools.

As the underlying models must respond to many more queries to power AI applications such as chatbots and search engines, MLCommons developed two new versions of its MLPerf benchmarks to gauge speed.

One of the new benchmarks is based on Meta’s so-called Llama 3.1 405-billion-parameter AI model, and the test targets general question answering, math and code generation.

The new format tests a system’s ability to process large queries and synthesize data from multiple sources.

Nvidia submitted several of its chips for the benchmark, and so did system builders such as Dell Technologies.

There were no Advanced Micro Devices submissions for the large 405-billion-parameter benchmark, according to data provided by MLCommons.

For the new test, Nvidia’s latest generation of artificial intelligence servers, called Grace Blackwell, was 2.8 to 3.4 times faster than the previous generation, the company said at a briefing. Grace Blackwell servers have 72 Nvidia graphics processing units (GPUs) inside, but the result held even when only eight GPUs in the newer server were used to create a direct comparison to the older model.

Nvidia has been working to speed up the connections between chips inside its servers, which is important in AI work where a chatbot runs on multiple chips at once.

The second benchmark is also based on an open-source AI model built by Meta, and the test aims to more closely simulate the performance expectations set by consumer AI applications such as ChatGPT.

The goal is to tighten the response time the benchmark requires so that it comes close to an instant response.


