There is a healthy relationship between large language models (LLMs) and graph databases, which store and query information as networks of connected data, according to Jim Webber, chief scientist at Neo4j.
Computer Weekly spoke to Webber following the ratification of the GQL ISO standard, which provides a standard way to run searches across graph databases and is analogous to the SQL-86 ISO standard for relational database management systems.
Graph databases take a very different approach to data queries than relational databases. Webber has worked with graph databases for around 16 years, 14 of which have been with Neo4j. While he sees a role for relational databases, Webber’s main point is that their runtime performance deteriorates as queries have to join more and more tables.
The basic idea behind a relational database system is that it organises data in a row-orientated fashion and connects data assets using “joins”, which link rows in one database table with rows in another to form a relationship between the two. A simple example is a row that identifies a unique customer reference in one table linking to the customer’s contact details held in another table.
“Ironically, relational databases are terrible at joins,” says Webber. “This is the one thing you don’t want to do in a relational database because you’re doing it at runtime in the expensive part of the system.”
This, he points out, is because joins are effectively run in memory and occur when an application or a user runs a query that requires interrogating multiple database tables.
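As a rough illustration of the kind of runtime join Webber is describing, the short Python sketch below (using the standard sqlite3 module; the table and column names are invented for the example) stitches a customer row to its contact details only at the moment the query runs:

```python
# Minimal sketch of a runtime join: the customer's identity sits in one table,
# the contact details in another, and the relationship between them only
# materialises when the database engine matches rows at query time.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_ref TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE contacts  (customer_ref TEXT, email TEXT, phone TEXT);
    INSERT INTO customers VALUES ('C001', 'Ada Example');
    INSERT INTO contacts  VALUES ('C001', 'ada@example.com', '01234 567890');
""")

# The join is evaluated here, in memory, every time the query runs.
rows = conn.execute("""
    SELECT c.name, k.email, k.phone
    FROM customers AS c
    JOIN contacts AS k ON k.customer_ref = c.customer_ref
""").fetchall()

print(rows)  # [('Ada Example', 'ada@example.com', '01234 567890')]
```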
However, despite this apparent inefficiency, relational databases are the core data platform for many enterprise applications.
“In the olden days, it mostly made sense to use relational databases, because all the data was identical,” he adds. Webber is referring to the fact that something like a payroll system holds thousands of instances of identically formatted data for thousands of employees.
He says: “The world that we lived in in the 1980s was uniform, and the world that we lived in in the 1990s was mostly uniform, so it made perfect sense to use a relational database.”
But with the explosion of systems that has occurred more recently, Webber says data has become messier. “Graph networks enable you to model that mess in a high-fidelity way without suffering the kind of ‘join bomb’ pain and the complexity of having to build complex tables and schemas and do joins at runtime,” he says.
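By way of contrast, the same customer-to-contact relationship can be stored as an explicit edge in a graph and read back with a pattern match rather than a join. The sketch below is illustrative only: it assumes a local Neo4j instance and the official Python driver, and the labels, relationship type and credentials are made up for the example (the query syntax shown is Cypher, which GQL closely resembles).

```python
# Hedged sketch: the relationship is stored as an edge and traversed by
# pattern matching, rather than being recomputed with a join at query time.
# Assumes a Neo4j server on bolt://localhost:7687 with illustrative credentials.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

with driver.session() as session:
    # Store the customer, the contact details and the edge between them once.
    session.run(
        "MERGE (c:Customer {ref: 'C001', name: 'Ada Example'}) "
        "MERGE (d:ContactDetails {email: 'ada@example.com'}) "
        "MERGE (c)-[:HAS_CONTACT]->(d)"
    )

    # Reading the relationship back is a walk along the stored edge.
    for record in session.run(
        "MATCH (c:Customer {ref: 'C001'})-[:HAS_CONTACT]->(d:ContactDetails) "
        "RETURN c.name AS name, d.email AS email"
    ):
        print(record["name"], record["email"])

driver.close()
```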
Confidence in GQL
Webber believes the newly ratified ISO standard for graph query language (GQL) represents a significant inflection point for the technology. The ISO standard for structured query language (SQL), called SQL-86, was published in 1986.
Recalling the significance of the standard, Webber says he was “programming ZX Spectrum at that point” so “SQL meant nothing to me” back then. But the SQL-86 standard settled a debate that began in the 1970s on how to manage databases. Edgar Codd, while working at IBM, developed the relational database model. The SQL standard – adopted by the American National Standards Institute (ANSI) in 1986 and the International Organization for Standardization (ISO) in 1987 – gave application developers and enterprise software buyers the confidence to use relational databases.
An alternative proposal, the network database model developed by Charles Bachman, lost out. But Bachman’s approach, according to Webber, is an early precursor to graph databases.
He believes standards are important when IT decision-makers have to make technology bets. “CIOs are nervous because if you make a significant investment in a system, you don’t want to be locked in or find that the system has no future and you’ve backed the wrong horse. It’s the VHS versus Betamax analogy. I think SQL gave a significant injection into the application software market because it told everyone that relational database technology is mature and safe.”
According to Webber, the ISO GQL standard, like SQL-86, protects IT buyers from making poor commercial decisions. Even though there are different dialects of SQL, the basic syntax remains the same. The same should hold true for GQL.
“In principle, you can always switch vendors because your language is going to stay the same,” he says. “The additional learning I have to do to specialise in a relational database management system like SQL Server or Oracle is marginal.”
AI common sense
Analyst Gartner recently put knowledge graphs at the centre of its impact radar for generative artificial intelligence (GenAI).
“Graph databases are knowledge graphs, a network of facts, which offer a most appropriate counterbalance for GenAI,” says Webber. If GenAI is like the creative right side of the brain, he feels that graphs are like the left side, which is more focused on reasoning.
“You’ve got a probabilistic engine in generative AI. I love it. I know it’s a robot, but it still feels just so dangerously, wonderfully close to having that spark of imagination,” he says. “But this spark needs to be tempered, and it turns out that knowledge graphs are particularly good at this, using an approach called graph RAG.” This is graph-based retrieval-augmented generation, where the graph database provides contextual information for LLMs.
“This is probably the best way we know to get the best out of generative AI, while stopping falsehoods and misleading things from leaking to the end user. It’s the left brain working with the right brain.”
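In outline, the graph RAG pattern Webber describes might look something like the following Python sketch. The retrieval query, node labels, prompt wording and helper function names are all assumptions made for illustration, and the call to the language model itself is left out:

```python
# Illustrative-only sketch of graph RAG: fetch facts connected to an entity
# from a knowledge graph, then fold them into the prompt so the model
# answers from grounded context rather than from its own guesswork.
from neo4j import GraphDatabase

def retrieve_facts(driver, entity_name: str, limit: int = 10) -> list[str]:
    """Pull a handful of subject-predicate-object facts around the entity."""
    query = (
        "MATCH (e:Entity {name: $name})-[r]->(other) "
        "RETURN e.name AS subject, type(r) AS predicate, other.name AS object "
        "LIMIT $limit"
    )
    with driver.session() as session:
        return [
            f"{rec['subject']} {rec['predicate']} {rec['object']}"
            for rec in session.run(query, name=entity_name, limit=limit)
        ]

def build_grounded_prompt(question: str, facts: list[str]) -> str:
    """Assemble a prompt that asks the model to answer only from the facts."""
    context = "\n".join(f"- {fact}" for fact in facts)
    return (
        "Answer the question using only the facts below.\n"
        f"Facts:\n{context}\n\n"
        f"Question: {question}\n"
    )

# Usage (assumes a populated local graph; the LLM call itself is omitted):
# driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))
# facts = retrieve_facts(driver, "SomeEntity")
# prompt = build_grounded_prompt("What does this entity make?", facts)
```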
During the discussion, Webber talks about an example referred to during a National Public Radio (NPR) broadcast where an AI researcher asked an LLM how long it would take to dry two shirts on her washing line, if it takes three hours to dry one. The answer is obviously three hours, because the shirts dry at the same time, but an LLM may reason that two shirts would take twice as long.
“You can stop those falsehoods from leaking through by putting that vector’s map into knowledge graphs, which can be traversed,” says Webber.
For example, he says that when presented with the word “apple”, the user may want the AI system to understand that the apple in question is the company in Cupertino that makes iPods and iPhones. With the graphs, he says: “You can traverse a rich network of facts around Apple, the tech company. It is not apple, the fruit, nor Apple, the Beatles record label.” In effect, the graph database applies a level of common sense to the LLM, using context to steer its responses towards answers that make more sense.
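A traversal of the sort Webber describes might look like the hypothetical query below, again wrapped in the Python driver; the Company label, MAKES relationship and node properties are invented for illustration:

```python
# Hypothetical traversal that starts from the Apple node representing the
# technology company and walks its surrounding facts; the fruit and the
# record label sit in different parts of the graph and are never touched.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

with driver.session() as session:
    result = session.run(
        "MATCH (a:Company {name: 'Apple', headquarters: 'Cupertino'})"
        "-[:MAKES]->(p:Product) "
        "RETURN p.name AS product"
    )
    print([record["product"] for record in result])

driver.close()
```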
“Where you’ve got a network of facts to exploit, no other data model gives you that network of facts,” Webber claims. “Nowadays, the way you would exploit that network of facts is by writing the query code in GQL.”
Given the inefficiencies Webber speaks about in using SQL to perform joins across multiple data sources, running GQL on a knowledge graph may well be how AI learns common sense going forward.