Malcolm Sparks: Graph-Centric Apps Can Rid Enterprises of Software Bloat

I recently caught up with Malcolm Sparks to discuss his vision for the future of application development. As someone who’s spent decades in the trenches of enterprise tech, Malcolm offers a perspective that’s radical and refreshing. He’s tired of the over-engineered complexity accumulated at the application layer since the early 2000s. Instead, he’s pushing for a return to simplicity, but with a data-centric twist.

Malcolm’s thesis is simple: we don’t need the bloated application code that currently plagues enterprise systems. By embracing Tim Berners-Lee’s original vision, he treats every resource as a unique, referenceable HTTP ID. As Malcolm puts it, “You really don’t need the traditional application code anymore. If you handle access control directly in the data and use declarative SPARQL updates and queries, there’s not much code left to write! I’m determined to prove that we can build sophisticated applications using nothing but HTML, CSS, and SPARQL.”

He’s building tools for sophisticated use cases, in investment banking for example, that store logic right inside a semantic graph database like Apache Jena. When you treat SPARQL as a file format and combine it with reactive technologies like DataStar, you effectively merge your database and your web server. Most of the application code actually lives inside the semantic graph database, where it becomes reusable and transparent. It turns the entire application into a live, reactive dashboard without the need for massive JavaScript frameworks.

The semantic technologies in use allow for more magic to happen when you introduce AI agents. Today’s agents need reliable ground truth, not just LLM hallucinations. Malcolm’s architecture gives them that. Because the system tracks every change through a SPARQL update pipeline, you get perfect provenance.

As Malcolm noted, “With a SPARQL update pipeline, you have a perfect audit trail showing exactly who made a change and when. We attach provenance to everything, which creates a safe system where you can discover exactly what your AI agents have done.”

By making data the center of the architecture, he simplifies access control—treating data like classified files—and enables agents to act as trusted, audit-ready coworkers.

Malcolm, in fact, is presenting on this access control innovation at Semantic Arts’ Data-Centric Architecture Forum (DCAF) this week. I’m happy to share this preview of his thinking with the following embedded YouTube recording and edited transcript.

YouTube Video

Edited Transcript

[00:06:50]

Alan: This is another issue of the GraphRAG Curator podcast, and I’m really pleased to have Malcolm Sparks with us. So, Malcolm, could you tell us a bit about your current affiliations?

Malcolm: Well, I’m a software developer, and I’ve been sitting at the intersection of two different communities for a while now. The first is the Clojure community, which I’ve been heavily involved with for about 15 years [Clojure is a functional, general-purpose programming language and a dialect of LISP that runs on the Java Virtual Machine, emphasizing immutability and concurrency]. But I’m also quite active in the W3C camp.

[00:08:17]

Malcolm: I guess you could call me a spectator in the W3C community, though I’m still an avid implementer of web protocols. Back in 2010, I built a knowledge graph application for a bank, and I really enjoyed the experience. I felt RDF [RDF (Resource Description Framework): A standard model for representing data on the web, designed to facilitate data exchange and integration.] had a great resonance with many ideas in the Clojure community, which is why I believe Clojure is such a strong language for implementing web technology.

[00:09:34]

Alan: A lot of us in the semantics community have really latched onto functional programming, haven’t we? I’d love to establish how you got your start and what’s informed you the most along the way.

Malcolm: It’s been a long road, for sure. Seeing the early version of Netscape Navigator was truly informative; watching it connect PCs and printers across the world so seamlessly really blew my mind.

[00:11:15]

Malcolm: In the 90s, I was building CORBA [CORBA (Common Object Request Broker Architecture): A standard that allows software components written in different languages and running on different computers to communicate.] applications, and I was even envisioning this “intergalactic object web.” Then Java arrived in 1995, and I got pretty involved, especially with Java RMI [Java RMI (Remote Method Invocation): A Java API that allows an object to invoke a method on an object running in another Java Virtual Machine.] applications. Back then, I was a total object-oriented fanboy, building everything in C++ and Java.

[00:12:30]

Malcolm: I actually started a Java users group in Manchester, and a speaker there told me all about Enterprise JavaBeans (EJB), which was a huge thing for Sun. But the problem with objects is mutable state—you have to reconcile that state in the object with the state in the database, which is tricky. [In programming, ensuring that memory and the database always show the exact same information is complex and can be error-prone – a reconciliation challenge.] EJB was an early attempt to synchronize object state to a database, and I ended up writing one of the first implementations before moving on to write a server engine for J2EE.

[00:14:01]

Malcolm: That led me to JavaServer Pages and the whole XML/SOAP era, but eventually, it all became just too complex. So, I turned my attention to REST [REST (Representational State Transfer): An architectural style for providing standards between computer systems on the web, making it easier for systems to communicate.] and simple servlets instead. [Instead of a heavy container managing an object’s lifecycle, the servlet acts as a direct, simple handler that receives an HTTP request and returns a response.]

[00:15:10]

Malcolm: I came across Webmachine, which Justin Sheehy built on Erlang [another functional, declarative language] to use a state machine for the request-response cycle. I built a Clojure version called Plugboard, and later, I discovered compojure-rest, which did a much better job, so I renamed it Liberator. After that, I built a router called bidi, followed by the web framework yada, and a whole set of technologies I called Site.

[00:16:31]

Alan: That’s quite a journey. How did we get from there to your current graph-centric work?

Malcolm: Well, I remember hearing DHH—the creator of Ruby on Rails—speak about the “zenith of developer experience” in the early 2000s. Back then, all you really needed was HTML, images, CSS, and an Apache web server; it was easy, fun, and the web never seemed to go down.

[00:18:11]

Malcolm: We’ve definitely made things increasingly complex since then, but I’m trying to return to that simplicity. I want to marry the Apache web server with a database, and graph-centric work extends that model by using SPARQL [SPARQL: A semantic query language for databases that allows users to retrieve and manipulate data stored in RDF format.] as a file format.

[00:20:34]

Malcolm: Nathan Marz effectively reinvented the database by arguing that it’s just a place to store records and query them efficiently. That marries perfectly with SPARQL, which already has a very mature query and update language.

[00:22:55]

[By the way, Clojure (created in 2005) has been popular with Malcolm and other backend, large-scale system innovators like Nathan Marz (who leads Red Planet Labs, behind the rebuilding of Mastodon so it could scale in the same way Twitter does) because it balances functional programming with the power of the JVM ecosystem.

Clojure runs on the Java Virtual Machine, allowing Malcolm to easily integrate with established Java-based technologies like Apache Jena, a graph triple store that’s been supported for decades now. This gives him access to enterprise-grade tools while using a modern, functional language.

A defining feature for Malcolm is the REPL (Read-Eval-Print Loop). He notes that it allows him to “play around” with technology by typing into a prompt and seeing immediate results. This is crucial for his work with RDF, where he needs to develop data models on the fly.

Influenced by Rich Hickey (the creator of Clojure) and Datomic, Malcolm treats data as immutable. For a knowledge engineer, this is vital because it allows for creating audit trails by treating every change as a new transaction in a queue.]

Alan: You mentioned Rich Hickey (creator of Clojure) and Datomic [a database Hickey created in 2010] earlier. Is that an immutable data store?

Malcolm: It is. I’m currently using Apache Jena [a free and open-source Java framework for building Semantic Web and Linked Data applications], a popular RDF triple store that’s pretty easy to work with if you’re a Java or Clojure developer.

[00:24:19]

Malcolm: My inspiration for immutability definitely comes from Datomic. You send transactions into a single write path, which essentially creates a queue. Read replicas can be added as needed, provided you don’t mind being a few seconds out of date. Every change acts as a new transaction, and that allows you to reconstruct the state of the database at any point in history.
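
[To illustrate the single write path Malcolm describes, here is a minimal sketch in Python. It is not Datomic’s or Jena’s API. The names `Transaction`, `TransactionLog`, `transact`, and `as_of` are hypothetical, and the log lives in memory purely for illustration. Each change is appended as a new transaction, and any historical state is rebuilt by replaying the log up to a chosen transaction.]

```python
from dataclasses import dataclass, field

# A triple is (subject, predicate, object); plain strings for illustration.
Triple = tuple[str, str, str]


@dataclass(frozen=True)
class Transaction:
    """One immutable entry in the log (hypothetical structure)."""
    tx_id: int
    who: str
    added: frozenset
    removed: frozenset


@dataclass
class TransactionLog:
    """Append-only log: the only way to change data is to add a transaction."""
    _log: list = field(default_factory=list)

    def transact(self, who, added=(), removed=()):
        """Append a change and return its transaction id."""
        tx = Transaction(len(self._log) + 1, who,
                         frozenset(added), frozenset(removed))
        self._log.append(tx)
        return tx.tx_id

    def as_of(self, tx_id):
        """Rebuild the set of triples as it stood after transaction tx_id."""
        state = set()
        for tx in self._log:
            if tx.tx_id > tx_id:
                break
            state -= tx.removed
            state |= tx.added
        return state


log = TransactionLog()
t1 = log.transact("alice", added=[("trade:1", "status", "open")])
t2 = log.transact("bob",
                  removed=[("trade:1", "status", "open")],
                  added=[("trade:1", "status", "settled")])
```

[Calling `log.as_of(t1)` replays only the first transaction, so the trade still appears as open, while `log.as_of(t2)` shows it as settled. Nothing is overwritten, which is what makes the history recoverable.]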

[00:26:55]

Malcolm: I’ve been wanting to solve this since 2008. While I was working at an investment bank, I struggled with thousands of different systems, and I just wanted trade IDs to be URLs. That way, I could put them in a browser, see the authorization, and just follow links to the counterparties.

Tim Berners-Lee actually wrote about this 20 years ago; he said you should give everything an ID (specifically HTTP IDs) so they’re referenceable. When you put them in a browser, you can deliver content and provide links to discover even more content. It really aligns with what AI agents need today: ground truth, research capabilities, and the ability to use tools effectively.

[00:32:19]

Alan: Does that structure allow for granular access control?

Malcolm: It’s a key problem, and one I worked on solving five years ago. The issue in most enterprises is that everyone has different applications—HR, accounting, you name it—and access control is always siloed within those specific apps.

If you want to merge data into a single core, you’re going to need general access control. In graph-centric development, we treat data a bit like the Secret Service treats files: classified, unclassified, and top secret. We just use named graphs to group the data.

By generating SPARQL queries that filter for authorized named graphs, we create a kind of “jail.” You can query anything within that authorized subset, but nothing outside of it.

It’s a simple approach, and frankly, it requires way less code than trying to retrofit access control onto a bunch of existing applications.
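
[A minimal sketch of the named-graph “jail” in Python, under stated assumptions: the graph IRIs, the `GRANTS` table, and the `jailed_query` helper are hypothetical, and the graph pattern is assumed to be generated by the system rather than typed by the user. The generated SPARQL binds `?g` to the authorized graphs only, so the pattern can match nothing outside them.]

```python
# Hypothetical mapping from user to the named graphs they may read.
GRANTS = {
    "alice": [
        "https://example.org/graph/unclassified",
        "https://example.org/graph/classified",
    ],
    "bob": ["https://example.org/graph/unclassified"],
}


def jailed_query(select_vars, pattern, graphs):
    """Wrap a trusted graph pattern so it only matches inside `graphs`.

    `pattern` must not contain its own GRAPH clause; this sketch assumes
    patterns come from the application, not from end users.
    """
    if not graphs:
        raise PermissionError("no authorized named graphs")
    values = " ".join(f"<{g}>" for g in graphs)
    return (
        f"SELECT {' '.join(select_vars)}\n"
        "WHERE {\n"
        f"  VALUES ?g {{ {values} }}\n"
        f"  GRAPH ?g {{ {pattern} }}\n"
        "}"
    )


query = jailed_query(
    ["?trade", "?counterparty"],
    "?trade <https://example.org/counterparty> ?counterparty .",
    GRANTS["bob"],
)
```

[Here `query` restricts `?g` to the single graph Bob is granted. Running the same pattern with `GRANTS["alice"]` widens the subset without changing any application logic.]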

[00:41:37]

Alan: So, how is an application actually simpler to develop when you take this graph-centric approach?

Malcolm: You really don’t need the traditional application code anymore. If you handle access control directly in the data and use declarative SPARQL updates and queries, there’s not much code left to write! I’m determined to prove that we can build sophisticated applications using nothing but HTML, CSS, and SPARQL.

[00:42:45]

Malcolm: We also use a technology called DataStar to run continuous queries, so if the data changes, the HTML generation is automatically re-triggered.

In a graph-centric world, everything becomes a dashboard because of DataStar. It’s incredibly reactive, which allows us to build dynamic interfaces without relying on those complex JavaScript frameworks.
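
[The continuous-query idea can be sketched in a few lines of Python. This does not use the DataStar API; `Store`, `live_view`, and the `push` callback are hypothetical stand-ins, with `push` representing whatever channel delivers HTML to the browser. Whenever the data changes, the query is re-run, and the HTML is regenerated only if the result differs.]

```python
from html import escape


class Store:
    """Tiny in-memory stand-in for the database (an assumption of this sketch)."""

    def __init__(self):
        self.rows = []
        self._listeners = []

    def subscribe(self, listener):
        self._listeners.append(listener)

    def insert(self, row):
        self.rows.append(row)
        for listener in self._listeners:
            listener()  # notify every live view that data changed


def live_view(store, query, render, push):
    """Re-run `query` on every change; push fresh HTML when the result differs."""
    last = None

    def refresh():
        nonlocal last
        result = query(store.rows)
        if result != last:
            last = result
            push(render(result))

    store.subscribe(refresh)
    refresh()  # initial render


store = Store()
sent = []  # collects the HTML fragments that would go to the browser
live_view(
    store,
    query=lambda rows: [r for r in rows if r["status"] == "open"],
    render=lambda items: "<ul>"
    + "".join(f"<li>{escape(r['id'])}</li>" for r in items)
    + "</ul>",
    push=sent.append,
)
store.insert({"id": "T-1", "status": "open"})     # result changes: re-render
store.insert({"id": "T-2", "status": "settled"})  # result unchanged: no push
```

[After these two inserts, `sent` holds the initial empty list and one update containing `T-1`. The settled trade never triggers a render because the query result did not change.]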

[00:48:05]

Alan: Can you walk us through an example of such an application?

Malcolm: We’re building a solution for governance, regulatory, and compliance. Companies often have to provide massive amounts of evidence for audits like SOC 2 or ISO 27001, which creates a huge paper trail. So, we map policy statements to security frameworks, define the assets, and then generate action items and dashboards automatically.

AI agents can even act as digital coworkers to help meet those requirements and provide the evidence. Even though it’s a large enterprise application, the entire codebase is just HTML, CSS, and SPARQL.

Malcolm: We’re using the same concepts you’d find in DevOps tools like Terraform to detect configuration drift and redeploy. It feels like a static site generator, but you’re working with a live database.
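
[A rough Python sketch of that Terraform-style loop, assuming the desired and actual states can each be expressed as a set of triples already written in SPARQL term syntax. The helpers `plan` and `to_sparql_update` are hypothetical names, not part of any tool mentioned here. The difference between the two sets is the drift, and it translates directly into a SPARQL update.]

```python
def plan(desired, actual):
    """Compute drift: what must be inserted and deleted to reach `desired`."""
    return {"insert": desired - actual, "delete": actual - desired}


def to_sparql_update(changes):
    """Render a plan as a SPARQL update; returns '' when there is no drift."""

    def block(keyword, triples):
        body = "\n".join(f"  {s} {p} {o} ." for s, p, o in sorted(triples))
        return f"{keyword} DATA {{\n{body}\n}}"

    parts = []
    if changes["delete"]:
        parts.append(block("DELETE", changes["delete"]))
    if changes["insert"]:
        parts.append(block("INSERT", changes["insert"]))
    # SPARQL Update separates multiple operations with ';'
    return " ;\n".join(parts)


desired = {("<urn:policy:1>", "<urn:status>", '"approved"')}
actual = {("<urn:policy:1>", "<urn:status>", '"draft"')}
update = to_sparql_update(plan(desired, actual))
```

[Here `update` deletes the stale `"draft"` triple and inserts the `"approved"` one. When the live database already matches the desired state, the plan is empty and there is nothing to redeploy.]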

[00:54:13]

Alan: After working at an audit firm for 20 years, I can tell you that 80% automation feels impossible without this kind of simplicity. The visibility into provenance and audit trails for AI agents seems absolutely essential.

Malcolm: Exactly. With a SPARQL update pipeline, you have a perfect audit trail showing exactly who made a change and when. We attach provenance to everything, which creates a safe system where you can discover exactly what your AI agents have done.
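
[One way to picture “provenance on everything” is a helper that wraps each change in its own named graph and records who made it and when, using terms from the W3C PROV-O vocabulary. This is a sketch under assumptions, not Malcolm’s pipeline: the function `update_with_provenance`, the one-graph-per-change layout, and the example IRIs are all hypothetical.]

```python
from datetime import datetime, timezone

PREFIXES = (
    "PREFIX prov: <http://www.w3.org/ns/prov#>\n"
    "PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>\n"
)


def update_with_provenance(triples, agent_iri, change_iri, when):
    """Build a SPARQL update that stores `triples` in a graph named
    `change_iri` and attributes that graph to `agent_iri` at time `when`.

    `triples` are (s, p, o) strings already in SPARQL term syntax;
    `when` should be a timezone-aware datetime.
    """
    body = "\n".join(f"    {s} {p} {o} ." for s, p, o in triples)
    stamp = when.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return (
        PREFIXES
        + "INSERT DATA {\n"
        + f"  GRAPH <{change_iri}> {{\n{body}\n  }}\n"
        + f"  <{change_iri}> a prov:Entity ;\n"
        + f"    prov:wasAttributedTo <{agent_iri}> ;\n"
        + f'    prov:generatedAtTime "{stamp}"^^xsd:dateTime .\n'
        + "}"
    )


update = update_with_provenance(
    [("<https://example.org/control/7>", "<https://example.org/status>", '"met"')],
    agent_iri="https://example.org/agent/evidence-bot",
    change_iri="https://example.org/change/42",
    when=datetime(2025, 6, 9, 12, 0, tzinfo=timezone.utc),
)
```

[Because every change carries its agent and timestamp, a later query over `prov:wasAttributedTo` can list exactly which changes a given AI agent made.]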

We’re currently looking for partners who share our vision of data-centricity. The linked data community has been lacking killer demos, but if we can show people ERP systems that work this way, they’ll finally understand the value.

[01:00:20]

Alan: Are you presenting at the Data-Centric Architecture Forum (DCAF) this year?

Malcolm: I am. I’ll be speaking about access control and comparing different industry approaches. It’s happening June 9-11.

Alan: We’ll be sure to get this interview out by then. Thank you, Malcolm.

Malcolm: Thank you, Alan.
