QiStor flashes key-value storage software for hardware acceleration

August 21, 2024

Existing flash storage is power-hungry and wasteful of datacentre performance because its data allocation method is rooted in the distant past of spinning disk hard drives and their fixed block pattern. The solution is to write data in key-value format and cut out the block addressing middleman.

That’s the view of Silicon Valley startup QiStor, which plans to go to market with storage software on a custom chip that accelerates key-value format reads and writes.

Key-value is a widespread way of writing data. In key-value, the key is the name of a variable and the value is – as the name suggests – its value. Keys and values can be of any length within certain limits. They can be different data types, such as numeric, character, even images or other objects. They can also be nested, so a key may have a value that is another key, with related values.

Key-value is found in the JSON format, in the etcd datastore used by Kubernetes and as a data type in programming languages such as JavaScript and Python, among others. It is also the basis of many NoSQL databases.
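As a minimal illustration of the format described above, the Python sketch below builds a nested key-value record and round-trips it through JSON. The record contents and the `lookup` helper are hypothetical, chosen only to show keys, mixed value types and nesting. They are not taken from QiStor.

```python
import json

# Hypothetical record: each key names a variable, each value holds its data.
# Values can be numbers, strings, or further key-value pairs (nesting).
record = {
    "sensor_id": 42,
    "location": "rack-7",
    "reading": {"temperature_c": 21.5, "valid": True},
}


def lookup(store, path):
    """Follow a sequence of keys down through nested key-value pairs."""
    value = store
    for key in path:
        value = value[key]
    return value


# JSON is itself a key-value text format, so the record survives a round trip.
encoded = json.dumps(record)
decoded = json.loads(encoded)

print(lookup(decoded, ["reading", "temperature_c"]))
```

The application only ever names the key it wants. Nothing in the record says where on the media the value lives.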

What QiStor addresses is an emerging trend in which data is written to and read from storage directly in key-value format.

A big driver for this is that the existing way in which file systems and databases talk to storage hardware is often inefficient. In existing systems, data is usually allocated to fixed-size blocks on the media, traditionally 512 bytes each. File systems have to translate between that physical layer and data as seen by the user and application, which brings a processing overhead.
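The translation step can be sketched with a toy model. The Python below is an illustration only, assuming 512-byte blocks and a hypothetical `TinyFileLayer` that maps names to block numbers. It does not represent any real file system or QiStor's design, and it ignores overwrites and freeing of blocks.

```python
BLOCK_SIZE = 512  # bytes per block; an assumed traditional sector size


class BlockDevice:
    """Hypothetical device that only understands numbered fixed-size blocks."""

    def __init__(self, num_blocks):
        self.blocks = [bytes(BLOCK_SIZE)] * num_blocks

    def write_block(self, lba, data):
        assert len(data) == BLOCK_SIZE
        self.blocks[lba] = data

    def read_block(self, lba):
        return self.blocks[lba]


class TinyFileLayer:
    """Toy translation layer: maps a name to a list of blocks plus a length."""

    def __init__(self, device):
        self.device = device
        self.index = {}      # name -> (block numbers, original length)
        self.next_free = 0   # naive allocator: never reuses blocks

    def put(self, name, data):
        lbas = []
        for offset in range(0, len(data), BLOCK_SIZE):
            # Split the value into block-sized chunks and pad the last one.
            chunk = data[offset:offset + BLOCK_SIZE].ljust(BLOCK_SIZE, b"\0")
            self.device.write_block(self.next_free, chunk)
            lbas.append(self.next_free)
            self.next_free += 1
        self.index[name] = (lbas, len(data))

    def get(self, name):
        lbas, length = self.index[name]
        raw = b"".join(self.device.read_block(lba) for lba in lbas)
        return raw[:length]  # strip the padding added in put()


layer = TinyFileLayer(BlockDevice(num_blocks=16))
layer.put("report", b"x" * 1300)
blocks_used = len(layer.index["report"][0])
padding = blocks_used * BLOCK_SIZE - 1300
```

In this sketch a 1,300-byte value occupies three blocks, with 236 bytes of padding, and every read and write passes through the name-to-block index. A key-value interface at the device level would, in principle, let the application hand over the name and value directly and leave allocation to the hardware.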

In addition, with flash storage, space occupied by data that has been written, optimised or moved on the media has to be erased and made ready for re-use. That process, known as garbage collection, creates further inefficiencies as data is erased and re-written.

Andy Tomlin, CEO of QiStor, said: “What we have already works, but with lots of inefficiencies, and these equate to waste in capacity, performance and power. What’s the optimum solution? Allocation and tracking of space should happen in one place, at the lowest level, and on the best-performing hardware.

“So, we think the solution is key-value. It provides an abstraction that’s a user-defined reference. It’s not the device defining it. There are other ways, but key-value is the most simple abstraction for the information we want to store. In most databases the bottom layer is key-value.”

The backstory QiStor is keen to highlight is the growing challenge of powering datacentres, which is exacerbated by the increasing use of AI.

Datacentres use 1% to 2% of global electricity, and data storage consumes 20% to 25% of that. Meanwhile, according to the World Economic Forum, the compute power needed to sustain AI growth doubles roughly every 100 days.
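Taken at face value, those figures can be combined with some simple arithmetic. The Python below is a back-of-envelope sketch using only the percentages quoted above. It assumes the doubling rate holds steady for a full 365-day year, which the source does not claim.

```python
# Figures quoted in the article, expressed as fractions.
datacentre_share = (0.01, 0.02)   # share of global electricity used by datacentres
storage_share = (0.20, 0.25)      # share of datacentre electricity used by storage

# Storage's share of global electricity: multiply the low and high ends.
storage_low = datacentre_share[0] * storage_share[0]
storage_high = datacentre_share[1] * storage_share[1]

# Doubling every 100 days, extrapolated over 365 days (an assumption).
doubling_days = 100
yearly_factor = 2 ** (365 / doubling_days)

print(f"Storage share of global electricity: {storage_low:.1%} to {storage_high:.1%}")
print(f"Compute growth over one year: about {yearly_factor:.1f}x")
```

On these numbers, data storage accounts for roughly 0.2% to 0.5% of global electricity, while a 100-day doubling period implies a more than 12-fold increase in compute demand over a year.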

Against that backdrop, Tomlin said the key demand is increasingly likely to be for datacentre hardware that offloads processing from the CPU. The use of GPUs for hardware acceleration typifies this, but DPUs and network accelerators are also in use.

A second factor is the use of vector databases for AI, for which Tomlin said key-value datastores are a key underpinning.

QiStor claims a 10x to 100x acceleration, which it says it has achieved by optimising reads and writes and by reducing or eliminating garbage collection, among other measures. But QiStor is still a year away from being able to offer a product, said Tomlin, adding: “We’ve built a lot of the core technology, and now will build the storage engine.”

It plans to develop its software and offer it on FPGA acceleration chips, either through third-party cloud-based services or as hardware that customers can spec into their own infrastructure.

“It’ll be customers buying, for example, a cloud database service with key-value hardware acceleration, or they’ll buy their own stack and provide key-value acceleration,” said Tomlin.

What QiStor will offer is distinct from flash drives with key-value instead of block addressing, as in the NVMe KV standard.

Talking of prospective customer use cases, Tomlin said: “It is interesting to customers with large datasets, so not those that would fit in memory. We’re talking 1PB or more, but no less than 100TB at the smallest. In terms of workloads we’re talking web applications, analytics, AI and anything where there’s a requirement for performance and large amounts of storage.”

Tomlin added: “If a customer cares about how many servers it is running key-value on, we’re probably interesting. Some customers run 10s or 1,000s of servers running key-value…whole departments devoted to it. The database market is $100bn and a big chunk is in key-value.”
