I’m pleased today to post a recent interview with Timothy Cook, co-founder of Axius SDC and one of the main minds behind the Semantic Data Charter.

I’ve known Tim since 2017, when we were both members of the Estes Park Group, a brainstorming committee of sorts that Dave McComb of Semantic Arts put together. We named the group for the place–Estes Park, Colorado–where the group first convened.

Today in 2026, Tim and his colleagues at Axius are tackling one of the most expensive headaches in enterprise data: the billions spent on data migration and the constant struggle with brittle APIs. Their solution is a multi-level modeling approach, a concept Tim’s been building on since 1999, that lets your data live for decades without ever being migrated. 

During the interview, we dig into how this approach boosts data management efficiency and consistency, especially in sectors like healthcare and in boundary-crossing supply chain activities like logistics.

Axius SDC is a new Graphwise partner. You can check out the work of Tim and his colleagues at the Axius SDC and Semantic Data Charter websites, and connect with Tim on LinkedIn. You can find the embedded YouTube video of our conversation and an edited transcript directly below.

Interview Recording

Edited Transcript

Alan Morrison: 

Hey everybody, it’s Alan Morrison with another episode of the GraphRAG Curator podcast, and I am so pleased to have Timothy Cook of Axius SDC with us today.

Tim and I first got to know each other at the Estes Park Group retreat back in 2017. That was something that Dave McComb of Semantic Arts set up, and the Estes Park Group was kind of a brainstorming committee. We had a loose relationship to the Semantic Arts in-person events. It was wonderful to get acquainted with Tim and get his perspective. 

Tim, do you want to talk about your background a little bit for our audience and just tell us where you’re coming from with Axius?

Tim Cook: 

Well, the impetus for what is now the Semantic Data Charter started in 1999, when I began trying to tackle semantic interoperability in healthcare. That led down a long, winding path. You can read the story on the website about how we got to where we are today, but we ended up with a multi-level modeling approach: a reference model, which you then constrain to build data models.

Alan Morrison: 

Yeah. And when you say multi-level modeling, I’m envisioning different layers of abstraction that you’re building, right? And, yeah, so maybe a healthcare example would help sort of tease this out a little bit.

Tim Cook: 

Sure. Well, something that a lot of people in healthcare may be more familiar with would be the openEHR approach where you have a core reference model and you restrict it with archetypes there.

What we did is we wanted to move away from the archetype definition language, which is a domain-specific language. And so we chose XML Schema because 1.1 was in the draft stages at that point and it had a lot of features that we really needed to be able to build composable models. And so that’s kind of how we got to the multi-level part. Then we added, of course, an ontology underneath for the Semantic Data Charter that actually describes the reference model itself.

Alan Morrison: 

And I’m assuming over time you’ve been exploring all sorts of newer technologies, and your website talks about agents for example, and immutability and that kind of thing. I mean, there’s a lot of powerful technologies you seem to have been harnessing, right?

Tim Cook: 

Well, one of the things that we’ve seen with most standards—and it shows up in FHIR and NIEM, which are two of the most notable, I guess, XML type standards—FHIR being an API and basically NIEM (a controlled vocabulary) is an API as well. 

But the way they define things is that they use the actual semantics to define the structure of their XML schemas. In fact, openEHR does the same thing in their data models. They’ll use the actual names from the reference model of components, say like an observation, they actually call it an observation.

Instead, our reference model is based on what we call extended data types. We take basic data types like string, but for all the data types we add an access control tag that comes from a controlled vocabulary. So every data element can carry its own access control, say to protect PII or whatever. 

We also carry temporal and spatial information for every data element. So you have very, very fine-grain control in your data models over who has access to certain data, and that is carried with it. 

The other thing we wanted to be able to do was be able to share these models easily. So the XML schema namespacing approach really helps with that. And because you can embed RDF/XML into an XML schema, then you can have your full semantics and structural integrity all in one file basically. So we used CUIDs to do the structural naming and then we use a label inside those to actually add the semantic name. So the CUID gives us a component that is reusable and they’re composable into more complex schemas. Does that make sense?
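To make the idea concrete, here is a minimal Python sketch of an extended data type along the lines Tim describes: an opaque structural ID, a separate semantic label, and optional access, temporal, and spatial context on every element. The class name, field names, and the use of `uuid4` as a stand-in for a CUID are illustrative assumptions, not the Semantic Data Charter's actual reference model.

```python
import uuid
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Tuple


def new_structural_id() -> str:
    """Placeholder for a CUID generator (assumption: uuid4 stands in for a real CUID)."""
    return uuid.uuid4().hex


@dataclass(frozen=True)
class XdString:
    """Hypothetical extended string: a plain value plus per-element context."""
    structural_id: str                              # opaque, reusable component name
    label: str                                      # human-readable semantic name
    value: str
    access_tag: Optional[str] = None                # term from a controlled vocabulary
    valid_from: Optional[datetime] = None           # temporal context
    location: Optional[Tuple[float, float]] = None  # spatial context as (lat, lon)


# One component definition, reused for two data instances.
component_id = new_structural_id()
home = XdString(component_id, "Home address", "12 Main St", access_tag="PII")
other = XdString(component_id, "Home address", "99 Elm Ave", access_tag="PII")
```

Both instances share the same structural ID, which is what would let a component be reused across schemas while the label carries the human-readable meaning.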

Alan Morrison: 

Yeah, it sure does. And it leads us to the heart of the discussion that we’re going to have, which is where things are in terms of healthcare. And when you talk about electronic health records, EHRs, think of companies like Epic for example, and how incumbents like that have pretty much dominated things for decades. You’re trying to make use of EHRs when they’re oriented toward a particular lens on things, a fee-for-service lens as it started out for example. And so you’re starting from a difficult space with some business pain that is just acutely felt by providers and others in the healthcare system, right?

Tim Cook: 

We didn’t actually plan this, but one of the really unique benefits of the Semantic Data Charter approach is you don’t have to rip and replace anything, whether you’re using NoSQL databases, SQL databases, or you’re just keeping stuff in a spreadsheet. You can design data models to exchange with your information trading partners and exchange that model with them. 

That model validates against all the specifications (the reference model and ontologies are all open source). So each data model has to validate against the reference model, and all of your data knows which data model schema it is attached to.

So within the ecosystem, you can validate. If you’re only exchanging one type of data with a trading partner, they only need that one data model. And you can have as many data models as you need and they’re all composable. 

So if you have the same set of data items in one schema that you need in another, you can just reuse them and they all carry the same IDs. So then when you put them into something like a graph database (we’re partners with Graphwise now), if you put them into GraphDB as RDF, then all of those can be combined as you need to analyze across them.
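The following Python sketch illustrates the two points in this answer: every data instance names the model it belongs to, and components reused across models keep the same ID. It is a simplified stand-in under stated assumptions. Real SDC validation is done with XML Schema, and the registry, model IDs, and component IDs below are all hypothetical.

```python
from typing import Dict, Set

# Hypothetical registry: data model ID -> IDs of the components the model contains.
MODELS: Dict[str, Set[str]] = {
    "dm-lab-results": {"c-patient-id", "c-test-code", "c-result-value"},
    "dm-billing": {"c-patient-id", "c-invoice-total"},  # reuses c-patient-id
}


def validate(instance: dict) -> bool:
    """Check an instance against the model it says it belongs to."""
    model_id = instance["model_id"]
    if model_id not in MODELS:
        return False
    # Simplification: valid means exactly the model's components are present.
    return set(instance["data"]) == MODELS[model_id]


def shared_components(model_a: str, model_b: str) -> Set[str]:
    """Components with identical IDs in both models, usable as join points."""
    return MODELS[model_a] & MODELS[model_b]


lab = {
    "model_id": "dm-lab-results",
    "data": {"c-patient-id": "p-17", "c-test-code": "HbA1c", "c-result-value": "5.4"},
}
bad = {"model_id": "dm-billing", "data": {"c-patient-id": "p-17"}}
```

Because `c-patient-id` has the same ID in both models, data from the two exchanges can be lined up on it once both are loaded into a common store.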

Alan Morrison: 

Let’s paint the picture of the migration issue you mentioned before we started recording here. For those who aren’t well-versed in the problems of the industry, how much budget is spent on migration? What’s going on there? And is that a major problem you’re tackling with this approach?

Tim Cook: 

Yeah. And again, it wasn’t the core reason we designed it this way. It just works out because of XML schema namespacing. Data that you store today can live for decades into the future in the same data store, whether it’s a graph database or a NoSQL or XML database like MarkLogic or something. So you never have to migrate. 

But because everything is tracked (every instance of data is tracked with its schema, which also has a unique ID forever; these are collision-resistant UUIDs, basically), you can always go back and prune that data store or move it off to an archive at any point because the data is tracked throughout. The data model, the reference model, also has complete governance and provenance tracking built into it. You can design your own governance and provenance models, but there are slots in the reference model for you to plug those into.

Alan Morrison: 

And so it sounds like there’s a lot of capability here that can be tapped into. But more of what I was getting at is…. Let’s state the problem from the provider perspective for example, and the migration budget that they have. What are they spending it on? And what’s the size of the budget, you know?

Tim Cook: 

Well, oh gosh, it’s billions. I forget the Gartner. There was a recent Gartner report, well, recent in the last year or so, I believe it was like $3 trillion spent. And maybe that includes some of the… some of the worst part of it is the lost opportunities because you lose the continuity of that data. Every time it’s migrated, you lose some of the context of where it was originally, or you just can’t reuse it at all, which corrupts some of your other data down the line. 

You lose history for audit trail purposes. Longitudinal health records, any kind of thing like that where you need to keep years or decades worth of data available for audit is lost. 

And even if we have to change the reference model—we’re really pretty sure of the reference model at this point, we’ve been working on it since 2009, it went through a number of academic labs, so it’s pretty robust at this point—but even if we do and have to publish Semantic Data Charter V, then all that data can still live alongside each other.

Alan Morrison: 

It just seems to me as a patient that the whole healthcare delivery mechanism is full of third parties, full of partnerships, and so the records get balkanized, and there’s no consistency to how records are managed because of all the different parties that are involved in the whole record keeping and record collection process. So, does this method require the whole ecosystem to adopt a certain set of techniques?

Tim Cook: 

Well, I mean eventually you would want everyone to adopt the Semantic Data Charter. But let’s say everybody is doing this anytime, and it’s not just in healthcare. This is domain agnostic, right?

Alan Morrison: 

Okay.

Tim Cook: 

Right. And we have examples across other domains. But if you need to exchange data with another organization or even between departments in an organization, you define what that data exchange needs to look like and you build a data model for it. And now you can exchange that data back and forth and validate it each direction. So if there’s one data model…

Alan Morrison: 

Yeah, if there’s a sharing situation that emerges, then it seems like the best practice from your point of view would be to use the Charter to begin with, and there are certain steps that you would take to allow this sharing, right?

Tim Cook: 

So, the other thing that you do is you reduce API brittleness. Large APIs like FHIR and NIEM, they’re very brittle. So every time there’s a new version, then everybody has to upgrade or you’re not compliant, or you have duplicate APIs. With a Semantic Data Charter API, you need to be able to exchange one file, the data model, and then your instance data. So basically, you have two points on an API instead of hundreds.
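As a rough illustration of the "two points on an API" idea, the Python sketch below models a receiver with only two operations: accept a data model, and accept instance data for a known model. The class, method names, and field-set check are assumptions made for illustration. An actual exchange would pass an XML Schema file and XML instance data.

```python
class Exchange:
    """Hypothetical receiver exposing just two operations."""

    def __init__(self) -> None:
        self.models = {}      # model_id -> set of allowed field names
        self.instances = []   # accepted (model_id, data) pairs

    def put_model(self, model_id: str, fields: set) -> None:
        # Published models are never overwritten; a revision gets a new ID.
        if model_id in self.models:
            raise ValueError("model IDs are permanent")
        self.models[model_id] = set(fields)

    def put_instance(self, model_id: str, data: dict) -> bool:
        fields = self.models.get(model_id)
        if fields is None or set(data) != fields:
            return False
        self.instances.append((model_id, data))
        return True


ex = Exchange()
ex.put_model("bol-a", {"shipper", "consignee"})
ex.put_model("bol-b", {"shipper", "consignee", "weight_kg"})  # revised model, new ID

old_ok = ex.put_instance("bol-a", {"shipper": "Acme", "consignee": "RetailCo"})
new_ok = ex.put_instance(
    "bol-b", {"shipper": "Acme", "consignee": "RetailCo", "weight_kg": 420}
)
```

In this sketch a revised model does not invalidate the old one: senders still using `bol-a` keep working while others move to `bol-b`, which is the contrast with a versioned API that every party must upgrade in step.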

Alan Morrison: 

Right. So you’re simplifying the process of the API scenario. There’s a lot of insightful value propositions that you’re talking about here. Can we, is there an example you can give of how this works in practice? Don’t necessarily have to name companies, but just a hypothetical company.

Tim Cook: 

Sure. Let’s say a logistics company needs to exchange bills of lading with a large retailer instead of using X12 EDI. They have data models, and they can exchange those. We have on the Semantic Data Charter repo there are some examples in there that really go into the details a lot. I mean it does get, they’re not simple XML schemas. There’s no magic here. 

They are pretty complex and that’s why we needed to be able to build the tools for domain experts to be able to use these. SDC Studio is a tool that a domain expert with a little bit of data modeling knowledge and knowledge of their domain can build their own semantically rich data models. 

And you can generate two applications. One is completely open source, uses Jena Fuseki as the RDF store. And the other one is what we classify as an enterprise-ready application and it uses GraphDB and Keycloak for login management and SirixDB as a time travel XML database.

Alan Morrison: 

And that all sounds like it’s all cohesively combined in your platform. Yeah. Those capabilities.

Tim Cook: 

Yeah.

Alan Morrison: 

That’s impressive. Can we talk about the domain experts? Why don’t we try to stand in the shoes of a domain expert, a particular one that comes to mind, and what their challenge is, and how they’re solving their problem with the platform?

Tim Cook: 

Okay. Well, first you have two ways of getting started. I guess another key thing that we should mention is that we started with the idea that we’re going to do this for research scientists, which tons and tons of them still track their data in spreadsheets. 

And so we allow the upload of a CSV file into SDC Studio, and we use AI-enhanced analysis to figure out a good draft impression of what the CSV file means. 
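For a sense of what a first-pass draft of a CSV might involve, here is a small Python sketch that guesses a data type for each column. It uses plain parsing rules rather than the AI-enhanced analysis Tim mentions, and the function names and sample data are hypothetical. It is not how SDC Studio works internally.

```python
import csv
import io
from datetime import date


def guess_type(values):
    """Guess a column type from its non-empty values using simple parsing rules."""
    values = [v for v in values if v and v.strip()]
    if not values:
        return "string"
    # Try the strictest type first and fall back to string.
    for name, parse in (("integer", int), ("decimal", float), ("date", date.fromisoformat)):
        try:
            for v in values:
                parse(v)
            return name
        except ValueError:
            continue
    return "string"


def draft_columns(csv_text):
    """Return a draft mapping of column name to guessed type."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if not rows:
        return {}
    return {col: guess_type([row[col] for row in rows]) for col in rows[0]}


sample = (
    "subject_id,weight_kg,visit_date,site\n"
    "1,70.5,2024-01-05,Rio\n"
    "2,82,2024-02-11,Niteroi\n"
)
draft = draft_columns(sample)
```

A draft like this would only be a starting point. The semantics, constraints, and access tags discussed in the interview still have to be reviewed by someone who knows the domain.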

[Demo begins here]

Then, if you’d like to see just the UI briefly about what a draft component would look like. Let me see if I can bring this up and then share it…. we’re right now populating SDC Studio with NIEM and the NIH Common Data Elements that are published. So let’s…

Alan Morrison: 

I’m seeing your screen here.

Tim Cook: 

Okay. So this is from NIEM, and it’s just a string definition of a rest location, and it’s in draft mode. You notice the AI has found RDFS is defined by schema.org location. 

So that’s pretty generic. But if you needed to add more information to it, like constraints on the strings, where it should appear in sequence within a cluster on the data entry form, whether the UI hard validates it or not. 

As I said, the access control tag, if this is personal data, sensitive data of any kind, you can enter a tag here. In the data model definition itself, it allows you to link to a specific controlled vocabulary. So like if you have an internal one in your organization, you can link to your own vocabulary or you can link to the Data Privacy Vocabulary. That’s kind of a combination recently of several others.

Alan Morrison: 

Can we unpack the access control capability because I’ve been in client sites where access control is such a headache. The issue seems to be that a lot of the access control is just at the application layer and you’ve got access control at the data layer. Can you make that distinction and the value that a data layer approach takes?

Tim Cook: 

Sure. It lives with the data based on the data model that you create. So if the data modeler comes in here and says we’re going to require an access control tag and it’s just going to, let’s say it’s just PII, then that tag lives with that data element in every instance. 

So any software that knows, “okay, I got to filter those out,” it’s right there in that data instance. And you can also do it at the cluster level. Cluster is an aggregation level that is reiterative… you can use multiple clusters inside each other. So you can build arbitrarily large trees for your data.
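The sketch below shows, in Python, how software might filter on tags that travel with the data, at both the element level and the cluster level, with clusters nested inside clusters. The `Element` and `Cluster` classes and the `visible` function are hypothetical names for illustration and do not come from the SDC reference model.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Union


@dataclass
class Element:
    label: str
    value: str
    access_tag: Optional[str] = None   # e.g. "PII", from a controlled vocabulary


@dataclass
class Cluster:
    label: str
    items: List[Union["Element", "Cluster"]] = field(default_factory=list)
    access_tag: Optional[str] = None   # a tag here covers the whole subtree


def visible(node, allowed: Set[str]):
    """Return a copy of node with everything the caller may not see removed."""
    if node.access_tag is not None and node.access_tag not in allowed:
        return None
    if isinstance(node, Element):
        return node
    kept = [v for v in (visible(item, allowed) for item in node.items) if v is not None]
    return Cluster(node.label, kept, node.access_tag)


record = Cluster("Patient", [
    Element("Name", "Ana Souza", access_tag="PII"),
    Element("Blood type", "O+"),
    Cluster("Contact", [Element("Phone", "555-0100")], access_tag="PII"),
])

public_view = visible(record, allowed=set())
full_view = visible(record, allowed={"PII"})
```

Here the filter needs no application-specific rules: it reads the tags from the data instance itself, which is the distinction Tim draws between data-layer and application-layer access control.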

Alan Morrison: 

Yeah. And is there a benefit that emerges from this approach that could help people inside the organization? I’ve been a part of consulting firms over the years and trying to get access to the right things to do your work, what you’ve been hired to do, is often a byzantine process in a large organization regardless of what vertical you’re in. Does this approach address that problem?

Tim Cook: 

No, that’s an organizational problem. I mean, this may help because the organization would know what data is tagged with what.

Alan Morrison: 

Yeah.

Tim Cook: 

So whether they give you access to it or don’t, if they’re using a controlled vocabulary like good practice would dictate, right? Like the data privacy controls….

Alan Morrison: 

Well, there’s a whole architecture suggested here that you’re thinking of. And what are some of the components of the architecture that are most important to think about and how that architecture differs from what’s in place in most locations?

Tim Cook: 

Well, the most important thing is every data element is well defined. And again, your temporal and spatial options are there. You know, like for this string, then say if we wanted to say not just location but we wanted to add address. 

And these are just ones that we’ve defined, these are predicate objects that we’ve defined in the system. But if you need to add a new one, you go look up some name in your vocabulary, say UMLS or bioontology.org, or, you know, anywhere else. Then you choose—we’ve got most of the standard predicates from SKOS and RDFS and OWL in here you choose from—and then you add your URI in there.

We look at this as a draft data model. Like I said, we’ve just uploaded these templates. We haven’t gone through and published everything yet. There’s a two-step process where you create the draft objects. Then once you’re certain that you have all your constraints and everything for your data elements, then you publish them and that publication freezes it with that CUID forever. 

So if you need a new one, you can go and make a copy of it and make your changes and now you actually create a new one, and it can still have the same semantic label, that’s a title or label. So like you can see that this basic demographics model has these clusters. And if we go, we go back to a cluster… [Tim goes back to the shared screen demo here] ….
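The draft-then-publish lifecycle described here can be sketched in a few lines of Python. The `Component` class and its methods are hypothetical, and `uuid4` again stands in for a CUID. The point is only the behavior: a published component is frozen under its ID, and a change means copying it to a new draft with a new ID and the same label.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Component:
    label: str
    constraints: dict = field(default_factory=dict)
    # Assumption: uuid4 is a placeholder for a real CUID.
    structural_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    published: bool = False

    def set_constraint(self, name: str, value) -> None:
        if self.published:
            raise PermissionError("published components are frozen")
        self.constraints[name] = value

    def publish(self) -> None:
        self.published = True

    def copy_as_draft(self) -> "Component":
        # New structural ID, same semantic label, editable again.
        return Component(self.label, dict(self.constraints))


ssn = Component("Social security number")
ssn.set_constraint("pattern", r"\d{3}-\d{2}-\d{4}")
ssn.publish()

ssn_v2 = ssn.copy_as_draft()
ssn_v2.set_constraint("max_length", 11)
```

Data recorded against the original ID keeps its original definition, while new data can use the revised component, so the two can sit side by side in the same store.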

This is one of the clusters. You can see here all the types that are included in that cluster: this one temporal, charge filing date, charge status, charge description, and this charge information cluster that came out of NIEM. And this was all automatically generated, including the semantic link here. But just like on each data element, you can go and edit it and make your changes to it. Does that explain the process?

Alan Morrison: 

It does. And I’m wondering, okay, when you got started with this, and maybe it’s where you’re ending up, was a research scientist a persona that you started with? Because you’ve got all this spatial temporal capability, you’ve got all this consistency, all this immutability. I mean it’s like those who are doing the research need to protect their research. They need to have a system where they can just have it consistent over many years longitudinally for all sorts of analysis that they’re doing.

Tim Cook: 

Right. Yeah. And those elements that are there capability-wise, they’re not required in every data model. If you don’t need those for your organization, that’s fine. Just don’t use them. I think that having access control and spatial temporal data is useful, but if you don’t need them, you don’t have to include them in the data model. They’re not required.

Alan Morrison: 

Well, you can’t do the really serious important things without it. 

That’s what I really like about what you’re doing: You’re tackling these huge problems and you’re giving people a means to start fresh with something that addresses these big problems in critical ways.

Tim Cook: 

Yeah. And you can address it just in pieces. You don’t have to rip out everything you’re doing. You just add on these data models where you’re exchanging data. Yeah, you know, it started as a persistence model. That was the idea, but it just happens that it also simplifies APIs and helps with a lot of other things too.

Alan Morrison: 

You’re addressing all sorts of things in the flow that get in the way usually.

Tim Cook: 

We’ve worked on it for a number of years.

Alan Morrison: 

That is very impressive. And so when you’re talking to business people and trying to tell them about the value of this approach, and I look at your website and it uses “agentic” as a term. The business people are thinking they have to adopt AI, they have to convert to an agent-based development model, etc. When they open the conversation that way, what are you telling them? What does Axius do for them in that circumstance?

Tim Cook: 

Yeah, it’s a difficult story to tell. I hope the website tells the story. First, we assume that they already know the pain of brittle APIs and data migrations and those kinds of things. So, that’s where we start. 

I don’t usually get into actually building the models for most people right away. The idea is that you can upload a markdown template that describes your data. And there are several on the Semantic Data Charter on our repo. 

On our repo on GitHub, there are several repos here that have examples. In fact, all the NIEM and NIH CDE templates are up there. And those were created with Claude Code. Even the markdown templates were created that way: I specified what I wanted to build using example templates that I had originally hand-coded, and then told Claude, “Okay, now take these example XML schemas from NIEM and NIH and build me markdown templates.” And then I upload them into SDC Studio and we get those components. So the CSV part is really a simplified thing, you know, but that is where we started.

Alan Morrison: 

People are just so used to working with spreadsheets, and so you’re accommodating that tendency.

Tim Cook: 

Yeah. And SQL extracts, you know, most of the time those end up as a CSV extract.

Alan Morrison: 

Exactly. So, you’re working with what people have and what they’re used to, right?

What is the agent context? You and I know that agents have been around for decades. And it’s not a new thing. That would be one of the first things I talk to an executive about. You know, don’t get excited about this. We’ve wanted to do this for decades, but there are some big challenges. And what would you say in that context to an executive?

Tim Cook: 

Well, from the point of using SDC Studio, you don’t even need to know that that’s going on. If you upload a CSV file, that’s exactly what happens. There’s an agent dispatch for every column to determine its data type, possible semantic information for it. So there’s an agent dispatch for all of those. Then it’s compressed into one cluster and one data model for all these components within your project. And then as those components are built, another set of agents are dispatched again to find specific semantic links based on your preferences. 

Like user preferences. You can go down and select, and you can also upload ontologies. If you’re using a local ontology for just your business, you can upload that into SDC Studio and use that as your selected ontologies. Of course, we have, including Gist, and we have several of the standard ones with access to them. And then you can weight the preferred ontology, standard ontology, or external weights like from the LLM. So you can weight how the agents will choose which semantic links to build.
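One simple way to picture the weighting Tim describes is a score per candidate link: the weight of the source it came from times how well it matched. The Python sketch below does exactly that. The scoring rule, source names, weights, and the two `example.org` URIs are assumptions for illustration, since the interview does not say how SDC Studio combines the weights.

```python
from typing import Dict, List, Tuple

# Hypothetical candidate for one column: (URI, source, raw match score in 0..1).
Candidate = Tuple[str, str, float]


def rank_links(candidates: List[Candidate], weights: Dict[str, float]) -> List[Tuple[str, float]]:
    """Score each candidate as source weight times match score, best first."""
    scored = [(uri, weights.get(source, 0.0) * score) for uri, source, score in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


candidates = [
    ("https://example.org/local#ShipTo", "preferred", 0.70),
    ("https://schema.org/location", "standard", 0.90),
    ("https://example.org/llm#Place", "llm", 0.95),
]
weights = {"preferred": 1.0, "standard": 0.6, "llm": 0.3}

ranked = rank_links(candidates, weights)
best_uri = ranked[0][0]
```

With these weights the organization's own ontology wins even though the other candidates matched more strongly, which is the kind of control over agent choices that the preference settings are meant to give.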

Alan Morrison: 

Oh, that’s so cool. So a lot of this agent-based approach is under the covers, along with RDF being under the covers, OWL being under the covers, controlled vocabularies. It’s not like the research scientists are being exposed to this when they’re working with the tool.

Tim Cook: 

Right. Like I said, the original goal once we got to this point was to allow domain experts, whatever domain they’re in, to take their data and build high quality data models.

Alan Morrison: 

Without having to understand data modeling, right? Exactly. And there’s such a built-in fear about the data layer and modeling and metadata. 

Tim Cook: 

Well, it’s complex if you’re soaked in it every day.

A lot of work over the years has gone into this. Most of these people were grad students, either masters or PhD students, and they were working on their research projects. And so through working with them, modeling their data, whether they were physicians or computer science or whatever they were doing, we worked hands-on with them and helped them model their data. So that’s how we learned what needed to be done to hide all that complexity away from them.

Alan Morrison: 

Can you name some of the universities that you’ve worked with?

Tim Cook: 

Mostly Brazilian universities. I lived in Brazil for most of 11 years.

Alan Morrison: 

Yeah. So you still got a place in Brazil. Is that true?

Tim Cook: 

No. I was going to retire. I moved back here and bought a place back home. You know, and was going to retire. And then when the whole agentic coding thing started to blow up, I said, you know, I could really build SDC Studio now.

Alan Morrison: 

Tell us about the people you’re working with, the people who are core to your team in this project. What kind of background do they have? What role do they play?

Tim Cook: 

My longest partner is Dr. Nikki Shaw. She is a health informatics researcher, and she works mostly with the social aspect of healthcare settings and technology. That’s where most of her research has been: The introduction of computers 25 years ago, 30 years ago, how that affected the client, the patient-doctor relationship, stuff like that. So, she’s very much a qualitative expert and social sciences, but with health informatics expertise. 

Dr. Luciana Cavalini in Brazil is a public health physician. She ran a research institute in public health. She’s an epidemiologist, and now she’s a practicing physician and psychiatrist. A lot of the students that we put through were in Brazil.

Alan Morrison: 

Wow. So, do you have customers in Brazil as well?

Tim Cook: 

No. We’ve just decided to stand this up as a company.

So, this is early days, sitting there on GitHub for years. So, we’ve just recently incorporated right before the holidays.

Alan Morrison: 

So a lot of capability here that’s untapped. You need some attention paid to what you guys have done and where it could be applicable, right? So how do people get started with this from a corporate perspective?

Tim Cook: 

Well, they can contact us on our website at axius-sdc.com.

Alan Morrison: 

And you’ve got a button here that says “generate your first blueprint.” What’s a blueprint in that sense?

Tim Cook: 

Well, the data blueprint is a data model. 

Alan Morrison: 

So, you’ve got a visualization. Do you have some visualizations you might be able to show of a blueprint?

Tim Cook: 

No, I don’t, other than what you just saw with the clusters. And if you go to a cluster itself—of course, again, that’ll take a couple minutes to load—but the cluster itself has that same kind of visualization that shows you what’s inside the cluster. I kind of showed you that a little bit. But I don’t really have any visualizations. We have that, and if you go to edit the cluster, you have to publish the cluster and then it’ll display all of. So, you have all the available components, either in your project or ones that are public for each type of component, and then you just add it to the cluster. So, that’s the composability of it. You know, you can reuse and compose new clusters out of individual data elements. So like you build one social security number string with formatting in it and it’s reused everywhere.

Alan Morrison: 

Cool. You mentioned logistics, and I’m curious about a use case scenario for logistics. You sort of touched on it lightly. Can you go a little deeper on that one?

Tim Cook: 

Sure. Well, are you familiar with X12 EDI stuff?

Alan Morrison: 

It’s been years.

Tim Cook: 

It’s been years since I’ve really touched it, too. But it piqued my interest and Claude helped me explore some of the, like, purchase order and billing and stuff. And I was really shocked that the industry really hasn’t moved beyond that. So, I think that’s an industry ripe for the taking here. So that’s something we’ll be exploring over the next few months. Yeah.

Alan Morrison: 

And so, just to paint the picture of a logistics provider, anything…

Tim Cook: 

Anything in supply chain.

Alan Morrison: 

So physical materials, anything hospitals would be using for example, all the different equipment, all the different…

Tim Cook: 

If you’re a manufacturer, you get your components from different suppliers and put them together. And so it’s very complex right now. What they have to do is they have to keep catalogs of the X12 definitions for each one of their component suppliers, and for each one of their customers, so they know how to exchange the data. It’s a very manual process still. You know, they have to go in and manually build mappings for every one of these, which to be fair, you have to build a data model here, too, but you can exchange the data model and not a PDF. Yeah.

Alan Morrison: 

There’s so much to unpack here, Tim, and I appreciate you going through this. I know we’re coming up on the top of the hour and I’m going to say let’s close it down for now, but I’ll probably be back in touch with you to get some more insights. And sure, you know, it just triggers a lot of questions that I have in my mind now.

Tim Cook: 

Love to have your questions. 

Alan Morrison: 

And I’ve got other people that are going to have questions, too.

Tim Cook: 

Sure. Just use the contact on our website and I’ll be happy to answer any questions.

Alan Morrison: 

So, you’ve got the Axius site and then you’ve got the Semantic Data Charter site. We’ll share the URL to both of those. And thanks so much for unpacking this a little bit today. It’s great to know what you’re working on.

Tim Cook: 

Thanks so much for the opportunity. I appreciate it.

Alan Morrison: 

Okay. We will say goodbye for now. Bye-bye.

Tim Cook: 

Bye now.

