Key-value flash targets more efficient data storage


Imagine flash storage that stores data in exactly the format used by applications. That’s what’s promised by key-value flash media, as envisaged by at least one big drive maker, some researchers and startups, and in the NVMe Key-Value command set.

Productised key-value storage is thin on the ground, however. There has been research on the subject, and a key-value command set has been developed for NVMe, as mentioned.

Meanwhile, in 2019, Samsung announced a prototype key-value drive, which was built into a spin-out called Stellus with plans for a key-value storage array, but this seems to have been consigned to the where-are-they-now file. More recently, startup QiStor floated plans to productise storage software and FPGA chips for key-value storage, and claims the market opportunity could be big.

The idea is that by retaining data in key-value format all the way to the media, storage becomes much faster, more energy-efficient and more durable than it is with the existing multi-layered input/output (I/O) process.

Currently, applications and hosts must translate storage I/O into logical block addressing (LBA) at drive level so that data can be located during read/write operations. That’s a data addressing method inherited from spinning disk hard drives. Developers of key-value stores noticed the I/O inefficiencies that resulted.

In other words, right now, if an application interacts with a key-value database, the request goes to that database, and its key-value addressing is then translated via the host file system into LBA-speak to find the physical location of data on the drive. That process contains a heap of steps that could be removed to make it more efficient.
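The chain of lookups described above can be sketched as three tables, one per layer. This is a simplified illustration, not a model of any real database, file system or drive: the table contents, the 4,096-byte block size and the `locate` helper are all assumptions, and the drive-level table stands in for what is usually called a flash translation layer.

```python
BLOCK_SIZE = 4096  # assumed logical block size in bytes

# Layer 1: the database index maps a key to a byte offset in its data file.
db_index = {"user:42": ("data.db", 8192)}

# Layer 2: the file system maps a file to the logical block where it starts
# (assumed contiguous here to keep the sketch short).
fs_extents = {"data.db": 1000}

# Layer 3: the drive maps each logical block address to a physical location.
drive_map = {1002: ("block 7", "page 3")}


def locate(key):
    """Walk all three layers to find where the value for `key` physically lives."""
    filename, offset = db_index[key]
    lba = fs_extents[filename] + offset // BLOCK_SIZE
    physical = drive_map[lba]
    return lba, physical


print(locate("user:42"))
```

Each dictionary lookup here represents a layer that must be consulted, and kept consistent, on every read or write.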

But also, logical block addressing brings other inefficiencies that key-value storage can remove.

Flash storage has limited durability because its cells wear out with repeated erases and rewrites. That activity is amplified because the blocks in which flash is erased are a different size to LBA blocks, so every time data is overwritten, existing data has to be moved and written elsewhere before the old block is erased.

That need becomes more pressing as the device fills up, so a single write to the drive may result in multiple writes to flash (known as write amplification) as garbage collection moves data around. All of which creates wear and shortens drive lifespan.
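The ratio between what is actually written to flash and what the host asked to write is commonly called the write amplification factor. A minimal sketch of the arithmetic follows; the page size and the number of relocated pages are illustrative assumptions, not measurements from any drive.

```python
PAGE = 4096  # assumed flash page size in bytes (illustrative)


def write_amplification(host_bytes, relocated_bytes):
    """Return total bytes written to flash divided by bytes the host asked to write."""
    if host_bytes <= 0:
        raise ValueError("host_bytes must be positive")
    return (host_bytes + relocated_bytes) / host_bytes


# The host writes one page, but three still-valid pages must first be
# moved out of the block that is being reclaimed.
waf = write_amplification(host_bytes=1 * PAGE, relocated_bytes=3 * PAGE)
print(waf)  # 4.0
```

In this example, one page of application data costs four pages of flash writes, which is why a fuller drive wears faster.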

By contrast, key-value storage allows the application to talk directly to media, with no need for translation through the OS and the file system into LBA on the media.

That’s because the application doesn’t need to know which block to look for to find what it wants. Instead, the key-value device manages placement of data and knows where values reside. The host, OS and file system don’t take part in the process. If a value is sought, the device looks up its key in internal mapping tables to find where the value is held.
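A toy model makes the division of labour concrete. The class below is a hypothetical sketch, not the NVMe Key-Value command set or any vendor's interface: the names `ToyKVDevice`, `store`, `retrieve` and `delete` are invented for illustration, and a byte array stands in for the flash media.

```python
class ToyKVDevice:
    """Toy key-value drive: the device, not the host, tracks where values live."""

    def __init__(self):
        self._media = bytearray()  # stands in for the flash media
        self._index = {}           # internal mapping table: key -> (offset, length)

    def store(self, key: bytes, value: bytes) -> None:
        offset = len(self._media)
        self._media += value       # append new data; never overwrite in place
        self._index[key] = (offset, len(value))

    def retrieve(self, key: bytes) -> bytes:
        offset, length = self._index[key]  # raises KeyError if the key is unknown
        return bytes(self._media[offset:offset + length])

    def delete(self, key: bytes) -> None:
        self._index.pop(key, None)  # space is left for later reclamation


device = ToyKVDevice()
device.store(b"user:42", b"alice")
print(device.retrieve(b"user:42"))  # b'alice'
```

The caller only ever supplies a key. Offsets and lengths stay inside the device, which is the point the article makes about the host, OS and file system dropping out of the process.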

Key-value is a widespread and growing way of storing data. It is found, for example, in the JSON format and in the etcd datastore in Kubernetes, exists as a data type in the JavaScript and Python programming languages, and is the basis of many NoSQL databases.

In key-value, the key is the name of a variable and the value is its value, or values. Keys and values can be of any length and of different data types – for example, numeric, character, even images or sound files – and can also be nested, so a key may have a value that is itself a set of keys, each with its own values.
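As a small illustration, here is a nested key-value record expressed as a Python dictionary and round-tripped through JSON. The field names and values are invented for the example.

```python
import json

# Hypothetical record: values can be strings, numbers or further key-value maps.
record = {
    "device": "ssd-01",
    "capacity_gb": 960,
    "labels": {"tier": "hot", "rack": "r12"},  # nested key-value pairs
}

# Reach a nested value by following keys from the outside in.
nested_value = record["labels"]["tier"]

# The same structure serialises to JSON text and back without loss.
encoded = json.dumps(record, sort_keys=True)
decoded = json.loads(encoded)

print(nested_value)  # hot
```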


