Cloud databases: Base jumping for the bigger picture


The way organisations store, manage and analyse data will always be a challenging issue given the constant assault of data on corporate IT systems. It’s as though IT teams are always playing catch-up.

According to Veritas, the average company stores around 10PB (petabytes) of data – equating to around 23 billion files – 52% of which is unclassified (or dark) data and 33% of which is redundant, obsolete and trivial. While this inevitably impacts storage costs and cyber security (that’s a whole other story), analysing and deriving insight from this data is not easy. It demands a different approach to how data is traditionally managed, as more and more organisations work with increasingly complex data relationships. 

Generative artificial intelligence (GenAI) is undoubtedly a growing consideration, especially in corporate thinking around data management. But it’s something of a double-edged sword at the moment. The upsides – often headline-catching benefits – are influencing board members. According to Capgemini research, 96% of executives cite GenAI as a hot topic of discussion in the boardroom. But when it comes to practical realities, there is still some uncertainty.

As Couchbase’s seventh annual survey of global IT leaders reveals, businesses are struggling with data architectures that fail to manage the demands of data. The research claims that this struggle amounts to an average of $4m in wasted spending. Some 42% of respondents blame this on a reliance on legacy technology that cannot meet digital requirements, while 36% cite problems accessing or managing the required data.

What is clear is that relational databases can’t move quickly enough to support the demands of modern, data-intensive applications – and businesses are suffering as a result.

Managing structured and unstructured datasets has led to different approaches. For example, graph databases – a type of NoSQL database – are increasingly seen as essential to the modern mix of databases that organisations need to address their data needs. Interestingly, Couchbase’s survey findings show that 31% of enterprises have consolidated database architectures so applications cannot access multiple versions of data, and that only 25% of enterprises have a high-performance database that can manage unstructured data at high speed. 

NoSQL databases in action

So, who is using graph and other NoSQL databases, and why? Can a multi-database approach help, or does it just mean more complexity to manage? According to Rohan Whitehead, data specialist at the Institute of Analytics (IoA), a professional body for analytics and data science professionals, the primary reasons for adopting graph databases are their efficiency in handling highly interconnected data and their ability to perform complex queries with low latency.

“They provide a natural and intuitive way to model real-world networks, making them ideal for use cases where understanding the relationships between data points is crucial,” he says.

Examples of prominent users include social networks, such as Facebook, which want to analyse relationships through social graphs. Financial services providers also use graph databases for fraud detection, mapping transaction patterns to uncover anomalies that could indicate fraudulent activities. And supply chain companies use graph databases to optimise logistics by analysing the relationships between suppliers, products and routes. 

“NoSQL databases are widely adopted across industries such as e-commerce, IoT [internet of things] and real-time analytics,” says Whitehead. “E-commerce giants like Amazon and eBay use document-oriented databases like MongoDB for managing product catalogues, enabling quick and flexible updates without the need for complex schema implications.” 

He adds that IoT applications, such as those in smart cities or industrial automation, benefit from the “scalability and flexibility of key-value stores like Redis, which can handle the high velocity of data generated by sensors. In real-time analytics, companies use column-family stores like Cassandra to process and analyse large volumes of streaming data, enabling quick decision-making and insights.”

Scalability and flexibility

While graph databases handle interconnected data efficiently and answer complex queries with low latency, NoSQL databases more broadly can scale horizontally, handle unstructured data and work well in distributed environments. The key here is the ability to manage different data models and support various workloads.

“Today, many teams use graphs because they are a flexible and performant option for many modern data systems,” says Jim Webber, chief scientist at Neo4j. “Graphs suit many domains because highly associative (i.e. graph) data is prevalent in many business domains. Graphs are now a general-purpose technology in much the same way as relational databases, and most problems can be easily reasoned out as graphs.” 

As an example, he points to one of Neo4j’s large banking customers that wants to “know its risk profile by transitively querying a complex network of holdings”. According to Webber, the organisation had repeatedly started and abandoned the project, having tried to get it to work using relational tables. In another example, Webber says Transport for London uses graphs to act faster in repairing and maintaining London’s road networks, “saving the city around £600m a year”. 
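To make the banking example concrete: "transitively querying" a network of holdings means following ownership links to whatever depth they run, which in relational tables typically takes a chain of self-joins or a recursive query. The sketch below shows the idea as a plain breadth-first traversal in Python. It is an illustration only, not Neo4j's API or the bank's actual model; the entity names and the HOLDINGS structure are hypothetical.

```python
from collections import deque

# Hypothetical holdings graph: each key holds a stake in the entities it maps to.
# Names and structure are illustrative only, not taken from any real bank.
HOLDINGS = {
    "bank": ["fund_a", "fund_b"],
    "fund_a": ["company_x", "fund_c"],
    "fund_b": ["company_y"],
    "fund_c": ["company_x", "company_z"],
    "company_x": [],
    "company_y": [],
    "company_z": [],
}


def transitive_holdings(graph, start):
    """Return every entity reachable from `start` by following holdings edges."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, []):
            if neighbour not in seen:  # visit each entity once, even if held via several routes
                seen.add(neighbour)
                queue.append(neighbour)
    return seen


exposure = transitive_holdings(HOLDINGS, "bank")
print(sorted(exposure))
```

Here the bank's indirect exposure to company_z only shows up two hops away, via fund_a and fund_c, which is the kind of relationship a fixed-depth join can miss.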

Another Neo4j customer is ExpectAI, a London-based consultancy that is using graph database technology for climate change solutions. According to CEO and founder Anand Verma, graph technology has enabled the company to “navigate a vast ecosystem of public and private data, whilst providing the traceability and context needed to reduce pessimism around perceived greenwashing”.

Verma adds that the flexibility of graph databases has given the business what it needs to effectively capture complex relationships in its data. “This in turn provides the powerful information and insights our customers require to take profitable actions whilst reducing their carbon footprints,” he says.

But it is the AI bit of the company’s name that is really adding value to the offering. Verma suggests AI is helping the technology to organise unstructured data, which in turn is enabling semantic search and vector indexing. 

“This is helping users to interpret their data through an NLP [natural language processing] conversational Q&A [questions and answers] interface,” says Verma. “Our end goal with this technology is to significantly contribute towards 500 megatons in carbon emissions reduction across the world by 2030.” 

It’s a worthy aim and a good example of how graph technology is transforming data relationships and enabling new, complex data business ideas to flourish. The use of AI will inevitably increase as organisations look to reduce manual functions, drive down query times and increase insights.

AI and NoSQL

The IoA’s Whitehead says graph databases are “particularly well-suited for AI applications that require understanding and analysing relationships within data”. He adds that the technology can support advanced algorithms for pattern recognition, community detection and pathfinding, which are crucial for tasks such as recommendation systems, fraud detection and knowledge graphs.  

For Ken LaPorte, manager of Bloomberg’s data infrastructure engineering group, AI has already had a significant impact. On the NoSQL side, the business has seen a lot of interest internally in “making use of Apache AGE, the graph database extension, together with PostgreSQL”.

“It has been in use for everything from data lineage (tracing data as it moves through systems) to intricate deployment dashboards. The analytical power of Apache AGE combined with Bloomberg’s rich datasets has been a natural success story for us.”

AI is therefore proving invaluable as the business wrestles with the ever-increasing volume of structured and unstructured information needed to make informed decisions. 

“As we’re seeing an exponential increase in financial information across all asset classes, Bloomberg is continuing to invest in a number of different technologies to ensure we can execute on our comprehensive AI strategy,” adds LaPorte. “Graph and vector databases are key parts of that effort, in addition to vector search components built into other data technologies. This spans traditional sparse search to more AI-driven dense vector (or semantic) searches.” 
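The distinction LaPorte draws between sparse and dense search can be sketched in a few lines of Python. Sparse search matches on shared keywords, while dense search compares embedding vectors, so it can surface a document that shares no words with the query. This is a toy illustration, not Bloomberg's system: the three-dimensional vectors are made-up numbers standing in for the output of an embedding model, and the document names are hypothetical.

```python
import math

# Toy corpus. The 3-dimensional "embeddings" are invented for illustration;
# a real system would obtain them from an embedding model.
DOCS = {
    "doc1": {"text": "central bank raises interest rates", "vec": [0.9, 0.1, 0.0]},
    "doc2": {"text": "monetary policy tightening announced", "vec": [0.8, 0.2, 0.1]},
    "doc3": {"text": "football club signs new striker", "vec": [0.0, 0.1, 0.9]},
}


def sparse_score(query, text):
    """Sparse (keyword) score: number of query terms that appear in the text."""
    return len(set(query.lower().split()) & set(text.lower().split()))


def cosine(a, b):
    """Dense score: cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


query_text = "interest rates"
query_vec = [0.85, 0.15, 0.05]  # assumed embedding of the query, also invented

# Keyword search only finds documents that literally share a term with the query.
sparse_hits = [d for d, v in DOCS.items() if sparse_score(query_text, v["text"]) > 0]

# Vector search ranks every document by closeness in embedding space.
dense_ranked = sorted(DOCS, key=lambda d: cosine(query_vec, DOCS[d]["vec"]), reverse=True)

print("sparse hits:", sparse_hits)
print("dense ranking:", dense_ranked)
```

The keyword search returns only doc1, whereas the vector ranking also places doc2, the monetary policy story, near the top and pushes the football story to the bottom.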

NoSQL databases, with their ability to handle large volumes of data, are integral to AI applications. They support real-time data ingestion and querying, essential for AI applications requiring immediate data processing and decision-making, such as predictive maintenance and real-time analytics.  

At Bloomberg, for instance, real-time data analysis capabilities of graph databases support AI applications that demand instantaneous insights, such as dynamic pricing and anomaly detection. 

“The flexible data models of NoSQL databases allow for the storage and processing of complex and varied data types, which is advantageous for AI applications that need to handle unstructured data like text, images and sensor data,” says IoA’s Whitehead. As an example, he says: “MongoDB’s document-oriented model facilitates the storage and retrieval of JSON-based data, which is commonly used in AI workflows.” 
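What a flexible, document-oriented model means in practice can be shown with a short Python sketch. Records of different shapes sit side by side in one collection, with no shared schema to migrate when a new field appears. The code uses an in-memory list as a stand-in for a document store; it is not the MongoDB API, and the insert and find helpers, field names and sample values are all hypothetical.

```python
import json

# In-memory stand-in for a document collection; this is not the MongoDB API.
collection = []


def insert(doc_json):
    """Parse a JSON string and store it as a document, with no fixed schema."""
    collection.append(json.loads(doc_json))


def find(field, value):
    """Return documents that have `field` set to `value`; documents lacking the field are skipped."""
    return [doc for doc in collection if field in doc and doc[field] == value]


# Three documents with different fields, stored in the same collection.
insert('{"type": "text", "body": "Quarterly results beat forecasts", "lang": "en"}')
insert('{"type": "image", "uri": "s3://example-bucket/chart.png", "width": 800, "height": 600}')
insert('{"type": "sensor", "device": "pump-7", "readings": [20.1, 20.4, 21.0]}')

sensor_docs = find("type", "sensor")
print(json.dumps(sensor_docs, indent=2))
```

A relational table would need either a column for every possible field or a separate table per record type; here each document simply carries the fields it has.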

Database future direction

Whitehead suggests that the future of graph databases “looks promising”, with expected growth in adoption as more organisations recognise the value of analysing interconnected data. “Industries such as healthcare, telecommunications and finance will increasingly rely on graph databases for their analytical capabilities,” he says, adding that future developments will likely focus on enhancing graph analytics and deeper integration with AI technologies. 

Expect to see cloud providers expanding their database offerings, touting more robust, scalable and integrated solutions. Graph and other NoSQL databases are “poised for significant growth and innovation”, says Whitehead.

He’s not alone in this thinking. The consensus is that the capabilities will match the growing vision of industry, with the integration of AI enabling more intelligent and data-driven applications. 

Bloomberg’s LaPorte has some advice: “Everyone needs to experiment. You need to think of a use case. You can rely on products like DataStax AstraDB, OpenAI, etc, to create a production-ready solution in no time and measure its value immediately. Then, if the direction looks good enough, you can invest more resources to optimise the use case.” 


