Inspired by science: Graph RAG for principled computing

One of the main reasons I like Graphwise’s approach to solutions is that it has been principled and inclusive from the start, going back to its predecessor companies Ontotext and Semantic Web Company in the 2000s. Those two companies shared the same values articulated by open web pioneer and scientist Tim Berners-Lee and the standards body he founded, the World Wide Web Consortium (W3C).

With its Graph Retrieval Augmented Generation (Graph RAG), Graphwise continues to take full advantage of the progress that’s been made in the open source community, particularly when it comes to semantic web and related hybrid AI standards in knowledge graphs. That’s not to mention the principles embodied in those standards.

Graph RAG marries the symbolic AI of knowledge representation with the probabilistic methods that power language models, whether large (LLMs like ChatGPT) or small (SLMs such as Gemma 3, a mobile-friendly SLM released by Google in March 2025).

A good knowledge graph in Graph RAG grounds language model retrieval in facts and rules specifically relevant to a given business context. By design, such a graph also clears a path for a scalable, transparent architecture that de-silos all kinds of heterogeneous data.
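As a rough illustration of what grounding retrieval in facts and rules can look like, here is a minimal Python sketch. It is not Graphwise's implementation. The triples, the names (FACTS, ancestors, grounded_facts, build_prompt) and the single transitive is_a rule are all assumptions made up for this example.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Hypothetical business facts; every name here is invented for illustration.
FACTS = [
    Triple("InvoiceBot", "is_a", "AgenticWorkflow"),
    Triple("AgenticWorkflow", "is_a", "AISystem"),
    Triple("AISystem", "requires", "HumanReview"),
    Triple("InvoiceBot", "owned_by", "FinanceOps"),
]


def ancestors(entity: str, facts: list[Triple]) -> list[str]:
    """Rule: is_a is transitive, so walk up the class hierarchy."""
    found: list[str] = []
    frontier = [entity]
    while frontier:
        current = frontier.pop()
        for t in facts:
            if t.subject == current and t.predicate == "is_a" and t.obj not in found:
                found.append(t.obj)
                frontier.append(t.obj)
    return found


def grounded_facts(entity: str, facts: list[Triple]) -> list[Triple]:
    """Collect facts stated about the entity plus facts inherited via is_a."""
    scope = [entity] + ancestors(entity, facts)
    return [t for t in facts if t.subject in scope]


def build_prompt(question: str, entity: str, facts: list[Triple]) -> str:
    """Place the retrieved facts ahead of the question sent to a language model."""
    lines = [f"- {t.subject} {t.predicate} {t.obj}" for t in grounded_facts(entity, facts)]
    return "Answer using only these facts:\n" + "\n".join(lines) + f"\n\nQuestion: {question}"


print(build_prompt("Does InvoiceBot need human review?", "InvoiceBot", FACTS))
```

The point of the sketch is that the model is handed explicit, traceable statements, including one it only reaches through a rule, rather than being left to guess from its training data.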

By default, the built-in transparency of Graphwise Graph RAG enables a new level of governance visibility. A standards-based semantic layer in a knowledge graph platform, together with scalable, articulated metadata management, makes that built-in capability possible.

You’ve heard the saying, “You can’t manage what you can’t measure.” It follows that you can’t measure what you can’t even find, access, use, or connect to and interoperate with as a part of a larger system.

Most enterprises don’t have visibility into their structured or less structured data assets. Graphwise Graph RAG radically changes how AI-ready data management happens by taking a hybrid AI approach that adheres to key management principles.

With a Graphwise Graph RAG approach, companies can substantially lower their risk of AI project failure and keep overall AI risk, including agentic AI risk, lower over time. This is because bad data and the absence of good data are key elements of overall AI risk. Semantic Graph RAG systems, as I’ll discuss later, make data and knowledge management much more feasible and scalable.

As a result, organizations can get their arms around their governance goals and keep an eye out for riskier third-party content that may include synthetic data. Publishers, for example, may be licensing content that needs additional scrutiny. In June 2025, CIO Magazine reported on wholly generated authors and author portraits that ended up in a sports publication because licensed content hadn’t been vetted.

Digitization without duplication

All organizations need to keep in mind that digitization isn’t just conversion to bits, plus an operating system, networking, bolt-on security, and some siloed applications, each with its own database.

Instead, true digitization implies the end-to-end, multidimensional, dynamic visibility of the full virtual environment. Such visibility is a precondition for effective governance and for the risk management approach mandated for financial institutions in the Digital Operational Resilience Act (DORA), for instance. (See “The Advantages of GraphRAG for Enhanced Regulatory Compliance and Understanding,” a Graphwise publication, for more information.)

Now that knowledge graphs have become essential for trusted AI, shared visibility and the emerging ability to manage the full system transparently with agents, while enabling, empowering and prioritizing humans in the loop, need to become the default, not the exception.

Digitization without duplication in the 2020s can clear a path to building safe, interoperable digital twins from the ground up. Partners throughout supply chains can tap and seamlessly share those same twins, as long as they adhere to the same principles.

FAIR principles and measurement

At the heart of this modern, built-in semantic graph data governance capability, therefore, is data infrastructure support for the FAIR principles. The essence of FAIR is unified data visibility enabled by a semantic graph: data that are

Findable
Accessible
Interoperable
Reusable

Once organizations embrace the FAIR principles and adopt the semantic knowledge graph-based architecture that lets them contextualize their digital environments, they can measure more accurately throughout the digitized environment. The result is far less guesswork, accelerated discovery and informed business decision making.

In 2016, scientists in pharmaceuticals and the life sciences named and first articulated the FAIR principles. In 2024, they established the FAIR²Alliance and released a draft specification for AI-ready data. (See https://www.fair2.ai/ for more information.)

Scientific exploration and hybrid AI open data management

The scientific community is at the forefront of large-scale, resilient semantic graph data management because the community thrives on data collection, observation, analysis and sharing. Consequently, the community intuitively understands the need for the precise and scalable management techniques inherent in semantic knowledge graphs.

Scientists must carefully and continually build on each other’s efforts to be successful in their projects. In the process, they have to create, manage and update a shared, global understanding of the objects of their study at the molecular level and above.

Because precision, persistence and scaling are all essential, machines and humans in the loop together must be integral components of that continual process. In that sense, scientific progress is a cybernetic, global, collaborative effort.

I had the chance recently to listen to Slava Tykhonov’s presentation to the Dataworthy Collective on the Dataverse, Knowledge Graphs and Open Weights Large Language Models.

The Dataverse Project, an open source project that can be traced back to 2006 on GitHub, began at Harvard’s Institute for Quantitative Social Science (IQSS). Its roots go back even earlier, to 1997, the initial year of a predecessor project called the Virtual Data Center (VDC).

Vyacheslav (Slava) Tykhonov has been a hands-on engineer and researcher at DANS, the Dutch national centre for expertise and research data, since 2016. He has worked with Harvard’s IQSS, the host of the Project, for a number of years. He is Harvard’s Dataverse Ambassador.

The Dataverse Project’s GraphRAG

Consider for a moment how academic research data sharing has evolved since the 1990s. The original VDC at Harvard was a relatively passive, centralized repository, just a way to collect and store data and offer keyword-based search and retrieval. By contrast, the Dataverse Project today uses its decentralized, federated GraphRAG architecture to marry LLMs and semantic knowledge graphs.

The current system is much more capable, providing many ways to query, interact with, add to, monitor, use and manage the information. Today, scientists from many disciplines use the Project’s federated graph, collaborating across 118 nodes on six continents.

Tykhonov and others on the Project have harnessed many open standards in the Project’s GraphRAG infrastructure. For example, the W3C’s Decentralized Identifiers (DIDs) specification makes it possible to time-bound concepts in snapshots, so that if a given concept’s definition drifts over time, the evolving definition is accurately captured in a sequence of individually identified concept snapshots.
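To make the idea of time-bound concept snapshots more concrete, here is a minimal Python sketch. It is only an illustration, not the Dataverse Project's code. The ConceptSnapshot class, the snapshot_as_of function, the "did:example:" identifiers, and the sample definitions and dates are all hypothetical, and nothing here resolves real DIDs.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class ConceptSnapshot:
    snapshot_id: str   # hypothetical identifier, one per snapshot
    concept: str
    definition: str
    valid_from: date


# Invented sample data; "did:example:" is a placeholder, not a real registry.
SNAPSHOTS = [
    ConceptSnapshot(
        "did:example:household-v1", "household",
        "People living at the same address.", date(2000, 1, 1),
    ),
    ConceptSnapshot(
        "did:example:household-v2", "household",
        "People living at the same address who share expenses.", date(2015, 1, 1),
    ),
]


def snapshot_as_of(
    concept: str, when: date, snapshots: list[ConceptSnapshot]
) -> Optional[ConceptSnapshot]:
    """Return the latest snapshot of a concept that was already valid on `when`."""
    candidates = [s for s in snapshots if s.concept == concept and s.valid_from <= when]
    return max(candidates, key=lambda s: s.valid_from, default=None)


# A dataset collected in 2010 should be read against the older definition.
print(snapshot_as_of("household", date(2010, 6, 1), SNAPSHOTS))
```

Because each snapshot carries its own identifier, a dataset can cite the exact definition that was in force when it was collected, even after the concept has drifted.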

The infrastructure also uses Croissant ML, an open specification that gives the LLM side of the system a means of dataset discovery via the embeddings in the system’s vector store. The system therefore uses both a vector store and an RDF triple store.
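The division of labor between the two stores can be sketched in a few lines of Python. This is a toy under stated assumptions, not the Project's architecture: the "embedding" is a plain bag-of-words count rather than a learned model, the dictionary and list stand in for a real vector store and RDF triple store, Croissant metadata is not parsed at all, and every dataset ID, description and triple is invented.

```python
import math
from collections import Counter

# Hypothetical dataset descriptions standing in for embedded dataset metadata.
DATASETS = {
    "ds:air-quality": "hourly air quality measurements from urban sensors",
    "ds:census-income": "household income survey responses by region",
}

# Hypothetical triples standing in for an RDF triple store.
TRIPLES = [
    ("ds:air-quality", "license", "CC-BY-4.0"),
    ("ds:air-quality", "publisher", "Example City Lab"),
    ("ds:census-income", "license", "CC0-1.0"),
]


def embed(text: str) -> Counter:
    """Toy embedding: a bag-of-words count vector, not a learned model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def discover(query: str) -> str:
    """Vector step: pick the dataset whose description is most similar to the query."""
    q = embed(query)
    return max(DATASETS, key=lambda ds: cosine(q, embed(DATASETS[ds])))


def facts_for(dataset_id: str) -> list[tuple[str, str, str]]:
    """Graph step: look up explicit statements about the chosen dataset."""
    return [t for t in TRIPLES if t[0] == dataset_id]


best = discover("air quality sensors")
print(best, facts_for(best))
```

The similarity step handles fuzzy, natural language discovery, while the triple lookup returns exact statements that can be cited and audited.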

Sidestepping monolithic AI: Graph Retrieval Augmented Generation’s potential

Whether it’s the Dataverse Project’s GraphRAG for public scientific exploration or Graphwise’s own business-oriented Graph RAG, a semantic standards-based Graph RAG of the type I’ve discussed in this post makes it possible for enterprises, public or private, to contextualize data using a knowledge graph approach. Both are inspired by scientific data exploration and open standards developed and refined over decades.

The resulting graphs then become a foundation for hybrid AI systems that use an LLM’s natural language interface. But these systems are designed to retrieve factual information nurtured with the help of semantic graph databases and systematic, machine-assisted graph metadata management.

Those methods together provide a path to principled, visible, desiloed, governed computing of a kind that most enterprises aren’t even aware of yet.
