Image by Ajay kumar Singh from Pixabay

Vector-based (purely probabilistic) RAG is dead.

Deterministic-based RAG is very much alive and well. Ditch the vector DBs for query and retrieval.

— Michael Iantosca, Sr. Director of Content Platforms and Knowledge Engineering at Avalara, in a comment on LinkedIn

Benchmarking is hard, and there haven’t been many good benchmarks when it comes to foundational, transformative data layer technology.

One of the problems is that most benchmarkers aren’t asking the right questions. A really good question that isn’t asked often enough is, “Do solutions designers need to add ontologies to their graphs?”

This post considers the status of the main graphRAG solutions and reports on some of the research Graphwise has done to validate the assertion that ontologies are important to graphRAG.

Some background on multi-hop reasoning benchmarking

Effective tests of multi-hop reasoning capabilities have been elusive. If one of the primary value propositions of your solution is better integration across disparate datasets, the ability to reason across those datasets (with each additional dataset requiring an additional reasoning “hop”) is critical. 

The MuSiQue dataset, which has been available on Hugging Face for two years now, is a rare benchmark that sidesteps a known weakness of multi-hop reasoning tests: testers can game the system and answer with single-hop reasoning on a benchmark that claims to test multi-hop reasoning.

Graphwise has recently been using MuSiQue to assess the value of an LLM-generated ontology in a multi-hop reasoning scenario, and claims 95 percent accuracy, provided the exact context is available and the GraphDB RDF graph database management system and SPARQL are used.

Source: Graphwise, 2026

Notable findings from the benchmarking research

In preparation for this post, I’ve been doing some research to educate myself on graph retrieval augmented generation (RAG) benchmarking, beginning with desk research on multi-hop reasoning benchmarking using MuSiQue. I also took a look at HippoRAG, a graphRAG system that can extract entities and relations to create a knowledge graph of sorts.

But by far the most illuminating observations were those I heard from Aleksis Datseris, an AI Researcher at Graphwise. 

Datseris was asked to evaluate the performance of public graphRAG systems with and without ontologies. The research question: Would the addition of an ontology improve performance? (Answer: Yes, it should. But proving this definitively is a work in progress.)

During the course of his research, Datseris has discovered many related things worth noting:

Finding #1: GraphRAG system builders don’t often use ontologies, if they use them at all.

Most public graphRAG systems don’t use ontologies. This finding surprised me, but perhaps it shouldn’t have. Serious neurosymbolic AI, which blends deterministic knowledge representation with probabilistic machine learning, constitutes a fraction of overall AI activity.

Datseris did find a public graphRAG system he liked: HippoRAG. HippoRAG offers an associative indexing/memory capability of sorts, which makes it possible to extract entities and relationships.

In turn, that node-and-edge extraction capability helps the system connect the dots in multi-hop reasoning scenarios, so that, for instance, it can more reliably find the name of William the Conqueror’s grandmother, if that’s what you’re looking for.

Datseris and team figured out a way to add an ontology to HippoRAG. And after some evaluation, they discovered that adding an ontology did indeed improve performance.

Finding #2: Public graphRAG systems aren’t using a query language.

Instead of using a query language, these systems run an algorithm akin to Personalized PageRank or do a brute-force search, going through every node and edge to find and retrieve a result.
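To make the contrast concrete, here is a minimal pure-Python sketch of Personalized PageRank over a toy graph, the kind of traversal-based scoring many public graphRAG systems use in place of a query language. The graph, node names, and parameters are all invented for illustration.

```python
# A minimal sketch of Personalized PageRank (PPR): restart mass flows
# back to the "seed" nodes (the entities found in the query), so nodes
# closer to the seeds score higher. Toy graph, illustrative names only.

def personalized_pagerank(edges, seeds, damping=0.85, iters=50):
    """Power iteration for PPR over a directed edge list."""
    nodes = {n for e in edges for n in e}
    out = {n: [] for n in nodes}
    for src, dst in edges:
        out[src].append(dst)
    restart = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    rank = dict(restart)
    for _ in range(iters):
        nxt = {n: (1 - damping) * restart[n] for n in nodes}
        for n in nodes:
            if out[n]:
                share = damping * rank[n] / len(out[n])
                for dst in out[n]:
                    nxt[dst] += share
            else:  # dangling node: return its mass to the seeds
                for s in seeds:
                    nxt[s] += damping * rank[n] / len(seeds)
        rank = nxt
    return rank

edges = [("query_entity", "doc1"), ("doc1", "doc2"),
         ("doc2", "doc3"), ("doc3", "query_entity")]
scores = personalized_pagerank(edges, seeds={"query_entity"})
# Retrieval then favors passages attached to the highest-scoring nodes.
print(sorted(scores, key=scores.get, reverse=True))
```

Note the trade-off Datseris points at: this ranks every reachable node by proximity to the seeds, but nothing in it expresses a precise query shape the way a query language would.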

When using vector search, these systems sometimes merge entities on the basis of vector similarity. I’m not dismissing vector search by pointing this out, just observing that vector search is by definition similarity search: close, but not necessarily precise, when a question has one exact answer.
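A toy sketch shows why similarity-based entity merging is close-but-not-precise: the decision hinges entirely on a cosine threshold. The three-dimensional “embeddings” and the threshold below are made up purely for illustration.

```python
# Sketch of entity merging by vector similarity, as some graphRAG
# systems do. The tiny 3-d "embeddings" are fabricated for the example;
# real systems use learned embeddings with hundreds of dimensions.
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

embeddings = {
    "New York City":    (0.90, 0.10, 0.20),
    "NYC":              (0.88, 0.12, 0.21),
    "New York (state)": (0.70, 0.40, 0.10),
}

THRESHOLD = 0.995  # arbitrary: the whole merge decision rides on this
pairs = [("New York City", "NYC"),
         ("New York City", "New York (state)")]
merged = {(a, b): cosine(embeddings[a], embeddings[b]) >= THRESHOLD
          for a, b in pairs}
# "NYC" merges with the city; the state, though similar, stays distinct --
# move the threshold and either a correct merge is missed or distinct
# entities get wrongly fused. There is no exact match, only "close".
```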

I think of data models, query languages, and database management systems, whether relational or non-relational, as tried-and-true ways to ensure accurate query and retrieval. It’s odd that so many graphRAG systems seem to ignore these methods. But I suppose this circumstance shouldn’t surprise me. The bias among graphRAG system designers clearly runs toward probabilistic machine learning and neural-net methods on their own, rather than a blended neurosymbolic AI approach that provides a deterministic alternative when users demand certainty, such as in transactional situations.

It’s as if most of the AI engineering community has forgotten the symbolic methods that dominated IT systems for over 60 years.

Finding #3: Designers can make LLMs behave deterministically, but operating purely deterministically creates problems for them.

Datseris: “If you’re calling an LLM from an app, it’s not going to be deterministic. That is what our internal evaluations have found. My explanation for this is that even bigger models can fall into infinite loops. Even if you request a temperature of zero, I’m pretty sure the provider raises it internally at least a little bit to decrease the chance of the LLM ending up in an infinite loop.”

Finding #4: Adding an ontology to HippoRAG (for example) makes it possible to retrieve the same result every time.

Datseris: “With an LLM and an ontology, even if you add as much data as you want, unless the data is too much for the LLM’s context, the LLM needs to write the same query based on the ontology, so you can retrieve the answer every time. So no matter how much your data grows, the LLM can find the answer.”
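A minimal stand-in illustrates the point: a tiny in-memory set of triples plays the role of GraphDB, and a fixed two-hop query shape plays the role of an LLM-written SPARQL query. All triples, and the grandmother’s name in particular, are hypothetical placeholders, not asserted facts.

```python
# Toy stand-in for deterministic, ontology-shaped retrieval. An
# ontology that declares a hasMother property lets the LLM write one
# fixed two-hop query; the data can grow without changing the answer.

triples = {
    ("William_I", "hasMother", "Herleva"),
    ("Herleva", "hasMother", "Duxia"),      # hypothetical name
    ("William_I", "hasFather", "Robert_I"),
}

def query(store, subject, predicate):
    # Deterministic pattern match: all objects for (subject, predicate, ?o).
    return sorted(o for s, p, o in store if s == subject and p == predicate)

def grandmother(store, person):
    # Two deterministic hops: mother of the mother.
    return [gm for m in query(store, person, "hasMother")
            for gm in query(store, m, "hasMother")]

first = grandmother(triples, "William_I")
# Adding unrelated data does not change the answer: the same query
# retrieves the same rows every time.
triples |= {("Napoleon", "hasMother", "Letizia")}
second = grandmother(triples, "William_I")
```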

Finding #5: A natural language query of a semantic knowledge graph gives you a precise count, unlike a regular RAG.

Say you want to know how many genes affect the human body’s predisposition to diseases like Alzheimer’s. Datseris says LLMs will struggle to provide a good answer: “A regular RAG will return a fixed amount of results. So let’s say it returns 100 documents. A regular RAG will try to count how many of them are related to that disease. And it will probably fail, because they’re not very good at counting.

“But with semantic graphRAG, a natural language question will be translated into a formal database query, and the query will return one row, and it will have the exact amount. So the LLM will not have to count it – it will be much faster and you get the exact number. And if somebody’s wondering if LLMs are still bad at counting, yes, they’re still not very good at it.”
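Here is the same idea as a sketch: an aggregate over structured triples returns one exact number, where a document-counting RAG would have to tally retrieved passages. The SPARQL in the comment and the gene/disease triples are illustrative placeholders, not real genomic data.

```python
# Sketch of the exact-count point: a structured aggregate returns one
# precise number, so no LLM has to count. Fabricated example data only.

triples = [
    ("GENE_A", "associatedWith", "Alzheimers"),
    ("GENE_B", "associatedWith", "Alzheimers"),
    ("GENE_C", "associatedWith", "Parkinsons"),
    ("GENE_D", "associatedWith", "Alzheimers"),
]

# In spirit, the translated SPARQL would look like:
#   SELECT (COUNT(DISTINCT ?g) AS ?n)
#   WHERE { ?g :associatedWith :Alzheimers }
n = len({s for s, p, o in triples
         if p == "associatedWith" and o == "Alzheimers"})
print(n)  # -> 3
```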

Finding #6: LLMs are now learning how to fetch answers, instead of trying to remember them.

Datseris: “Instead of trying to remember its knowledge, we should try to give it the tools to reason and to find the relevant information. And this we are seeing in newer LLMs. They’re trained to be agentic agents. They’re trained to try to fail, to try again, to find what they need and then to give you the answer.

“We have started moving from ‘Let’s have them know and think’ to ‘Let’s try to make them able to use tools that we give them to find the correct answer, to find what the solution is.’”

Finding #7: Companies can give a toolbox to tool-using LLMs.

Datseris: “Now the LLM is an agentic tool that can query, it can find stuff and it can learn how to provide the answers based on the toolbox you give it. Which is obviously very exciting for everybody, because the average person says, ‘I just want to find what the answer is.’”

“But for a lot of companies, for a lot of private use cases, it’s, ‘I want to give you my toolbox, I want to give you the tools that I have developed, and I want you to use them in a way that allows you to solve a task that is useful to me, despite the fact that this has not been in your training data.’ And for you to do that, you don’t need most of the things that you have learned during pre-training.”

Conclusion: Give RAGs the context they need to use the whole toolbox.

Tried-and-true symbolic methods — data models, query languages, and database management systems — ensure accurate query and retrieval. Semantic graph DBMSes, ontologies and related metadata expand the reach of these methods, allowing desiloing and disambiguation at scale, even across supply chains.

By providing the context along with access to useful tools, a contextual GraphRAG can shift from probabilistic guessing to deterministic certainty.

For more information:

Aleksis Datseris, Andrey Tagarev, and Atanas Kiryakov, “From Retrieval to Reasoning: Enhancing HippoRAG with Graph-Based Semantics,” Graphwise (blog), February 11, 2026, https://graphwise.ai/blog/from-retrieval-to-reasoning-enhancing-hipporag-with-graph-based-semantics/.

Bernal Jiménez Gutiérrez et al., “HippoRAG: Neurobiologically Inspired Long-Term Memory for Large Language Models,” arXiv preprint arXiv:2405.14831, May 23, 2024, https://doi.org/10.48550/arXiv.2405.14831.

Harsh Trivedi et al., “MuSiQue: Multihop Questions via Single-hop Question Composition,” arXiv preprint arXiv:2108.00573, August 2, 2021, https://doi.org/10.48550/arXiv.2108.00573.

6 responses to “Contextual GraphRAG and Its Evolution”

  1. […] MuSiQue dataset is a clear step forward toward better GraphRAG benchmarking,” said Alan Morrison, Independent Graph Technology Analyst and author of The […]

