DDN targets enterprise-shaped hole in its AI storage offer

The artificial intelligence (AI) boom has yet to really take off and, when it does, enterprises will spend billions on it, according to storage array maker DDN, which has long-standing expertise in delivering storage for massive compute clusters in high-performance computing (HPC) and, more recently, AI.

HPC compute clusters already dependent on DDN storage will likely triple or quadruple orders by 2027, said Paul Bloch, co-founder and president of DDN, adding: “200,000 GPUs are deployed in [Elon Musk’s] xAI cluster in Memphis, and that number will reach a million. The growth of performance of GPUs is increasingly driving data storage requirements, because the only way to deal with it is increasingly to ingest data in real time.”

During a recent meeting with sister publication LeMagIT as part of the IT Press Tour in Silicon Valley, DDN CTO Sven Oehme noted that the enterprise market itself is also promising.

“AI hasn’t yet achieved its potential, but enterprises can see the likely evolution and what’s possible,” said Oehme. “The boom will really come when enterprises fully launch into inference and exploit AI in all sectors of activity.”

DDN is valued at $5bn and expects turnover of $1bn in 2025. At the start of this year, the business secured $300m in funding from Blackstone in return for 6% of its capital and a seat on the board.

According to Bloch, Blackstone’s support will help the company find customers outside its traditional client base: “AI investments in the order of $1bn to $2bn will be made by companies whose names we don’t even know yet.

“These decisions will be taken by general management or data management, based on potential savings from AI, in particular in industrial applications. In that context, Blackstone isn’t just an investor, but more like a partner to whom we open our contact book.”

A more robust EXAscaler for enterprise customers

To attain its objectives, DDN has updated its core product line. After launching the EXAscaler AI400X2 Turbo last year, it announced the AI400X3, which comes with 24 NVMe SSDs and support for additional SSD shelves via NVMe-over-TCP.

Performance improved by 55% for reads and more than 70% for writes, with throughput of 140GBps on reads and between 75GBps and 100GBps on writes. That speed gain comes from the use of four Nvidia BlueField cards running the vendor’s Spectrum-X networking protocol.

“We’ve already got a reference architecture for their new Blackwell GPU,” said Oehme. “Elsewhere, all testing and development for Blackwell is taking place with DDN equipment.”

Other improvements came in security and pooling. The latter addresses the needs of managed service providers (MSPs) offering cloud services, which must dynamically provision parts of the array for numerous different clients.

For its new enterprise customer base, the EXAscaler AI400X3 brings data compression, offering usable capacity much greater than raw capacity, as suppliers such as Dell, HPE and NetApp already do.
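The relationship between raw and usable capacity under compression can be sketched as a quick calculation. Note that the 3:1 ratio and the 100TB raw figure below are assumptions for illustration, not DDN-published numbers:

```python
# Hypothetical illustration: usable capacity gained from inline compression.
# The 3:1 ratio and raw capacity are example assumptions, not DDN figures.
def usable_capacity_tb(raw_tb: float, compression_ratio: float) -> float:
    """Usable capacity after compression, in TB."""
    return raw_tb * compression_ratio

print(usable_capacity_tb(100.0, 3.0))  # 300.0 TB usable from 100 TB raw
```

Real-world ratios depend heavily on how compressible the stored data is; already-compressed media, for instance, gains almost nothing.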

“And now we have considerably improved its resilience,” said Oehme. “We can now replace any hardware or software component without stopping the array. The risk of downtime is reduced to a minimum. That’s fundamental because enterprises can’t entertain any disruption. That differs from HPC customers who prefer, in case of incident, to diagnose the problem themselves before coming to us.”

Infinia 2.1: Object storage accelerated for AI

While the EXAscaler AI400 shares its contents in file mode, enterprise AI users are moving towards object storage. Object storage is slower to access, but offers greater capacity and the ability to label or filter content, whether for re-training or for private data used in inference via retrieval-augmented generation (RAG). It is for that need that DDN launched its object storage product Infinia in 2024.
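The label-and-filter pattern described here can be illustrated with a small sketch. The object keys, tag names and filter logic below are hypothetical, invented for the example, and do not represent Infinia's actual API:

```python
# Hypothetical illustration of filtering object-store contents by metadata
# tags before feeding them to a RAG pipeline. Keys and tags are invented.
from typing import Dict, List

# Each object key maps to its metadata tags (as an S3-style tag set might).
catalog: Dict[str, Dict[str, str]] = {
    "docs/hr-policy.pdf":    {"department": "hr",      "sensitivity": "private"},
    "docs/press-release.md": {"department": "comms",   "sensitivity": "public"},
    "docs/q3-forecast.xlsx": {"department": "finance", "sensitivity": "private"},
}

def select_for_rag(tags_wanted: Dict[str, str]) -> List[str]:
    """Return keys whose tags match every requested key/value pair."""
    return [
        key for key, tags in catalog.items()
        if all(tags.get(k) == v for k, v in tags_wanted.items())
    ]

print(select_for_rag({"sensitivity": "private"}))
# ['docs/hr-policy.pdf', 'docs/q3-forecast.xlsx']
```

In a real deployment the filter would run server-side against the object store's metadata index rather than over an in-memory dictionary, which is precisely the capability that distinguishes object storage from plain file shares here.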

Infinia has now reached version 2.1, for which DDN claims a 100x speed increase. Here it competes with other object storage products that have also recently boosted performance to feed GPUs, such as Cloudian’s HyperStore and Scality’s Ring.

DDN contrasts its offer with AWS’s S3 Express, which lets enterprises carry out AI tasks in the AWS cloud. DDN claims Infinia 2.1 is 10x faster on access time and 25x faster on requests.

The DDN product has connectors to most AI software stacks, such as TensorFlow, PyTorch and Apache Spark, plus the NIM modules from Nvidia’s AI Enterprise suite. All of that was present in version 2.0. The 2.1 update adds integrations with the observability platforms Datadog and Chronosphere, and with those based on the open source OpenTelemetry standard. It can also now connect to Hadoop-based data lakes.

Infinia 2.1 comes on DDN hardware or can be run as a virtual instance on Google Cloud Platform.

DDN also offers two other hardware products, xFusionAI and Inferno. xFusionAI is an EXAscaler AI400X3 that runs both the Lustre file system and Infinia 2.1, so the same data can be accessed via S3 object storage, with the advantage that its metadata indexes the contents.

Inferno is a network proxy installed between the Infinia appliance and servers to accelerate communication between object storage and Lustre file storage. The device contains a switch based on Nvidia BlueField, which uses the Spectrum-X protocol and GPUDirect for direct communication with GPUs, plus an NVMe SSD cache.

7,000 customers, 4,000 in AI

DDN said it has deployed storage supporting 700,000 GPUs at 7,000 customers, of which 4,000 are engaged in AI workloads. The company employs 1,100 people, 600 of them engineers, with 200 to 250 more hires planned this year.

At its origins in 1998, DDN supplied storage arrays for the few supercomputer centres then in existence. Its key know-how is feeding numerous compute nodes in parallel, and that is what convinced Nvidia to partner with it, a relationship dating back to 2010.

“Imagine 10,000 GPUs,” said Bloch. “If you wanted to get 1 GBps on reads per GPU, you’d need throughput of 10TBps for the whole array. There are few systems in the world capable of providing 10TBps on reads and on writes. That’s what we have mastered.”
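Bloch's arithmetic can be checked with a one-line calculation, using the figures from his own example:

```python
# Aggregate read throughput needed to feed a GPU cluster from storage.
# Figures from Bloch's example: 10,000 GPUs, 1 GBps of reads per GPU.
def aggregate_throughput_tbps(num_gpus: int, gbps_per_gpu: float) -> float:
    """Total storage throughput required across the array, in TBps."""
    return num_gpus * gbps_per_gpu / 1000  # 1 TBps = 1,000 GBps

print(aggregate_throughput_tbps(10_000, 1.0))  # 10.0 TBps for the whole array
```

The point of the example is that the requirement scales linearly with cluster size, so a move from 200,000 towards a million GPUs, as cited earlier in the article, multiplies the storage throughput demand accordingly.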
