Mike Dillinger of Hypergraf.ai

Mike Dillinger is a veteran Silicon Valley technologist with a background as a linguist and cognitive scientist, giving him a unique perspective on knowledge graphs (KGs). Mike’s extensive career includes key roles at LinkedIn, Intel Labs, and eBay. At LinkedIn, Mike helped build the company’s core Economic Graph and the Interest Graph, a knowledge graph focused on connecting content topics to user interests. 

During this interview, Mike emphasized that professionals with training in taxonomy and ontology building are critically underutilized and possess direct, relevant skills—especially in classification and conceptual analysis—that are essential to knowledge graph success. 

Mike shared a success story from LinkedIn where one small team built a knowledge graph to identify “green skills” and “sustainable jobs,” using classification to power search, ads, insights, and recommendations. He attributed their success to moving beyond simple keyword matching (like “sustainability” in a job title) and using the KG to define and classify what actually constitutes a “green” job, leading to much better retrieval. He also suggests that benchmarking knowledge graph platforms should focus on relation-by-relation assessment rather than abstract, generic accuracy metrics.

When it comes to tackling the interoperability challenge, Mike favors “soft standardization” over “hard standardization.” He explained the difference between the two concepts this way:  “Most people think of standards in terms of what I call hard standardization. They think of it as we’ll do things this way, and whatever you had before, you have to set that aside and translate it all into this new way. You can contrast that with what I’m calling soft standardization. In this view, we think: do we have (or can we create) rules or algorithms that will allow us to translate between your version and another version we’re going to use for interoperability?”

Mike Dillinger Interview: Edited Transcript

Alan Morrison: Hi everybody. It’s Alan Morrison with another issue of the GraphRAG Curator podcast, and I’m pleased to have Mike Dillinger with us. 

Mike is somebody I’ve known for a while now. We met in San Jose. We both are local to Silicon Valley and Mike’s been in Silicon Valley as long as I have. He’s worked at a lot of these big companies, LinkedIn, Intel Labs, eBay over the years and he’s got a fantastic educational background as a linguist and a cognitive scientist. So he’s very knowledgeable about knowledge graphs. 

Today we’re going to definitely be talking about the knowledge graph biz as it were and how it’s evolving. But Mike, I just wanted to start with how you got into this field. What was your interest when you started?

Mike Dillinger: Oh, that’s really interesting. I drank the Kool-Aid a long time ago when I was an undergrad and I had a crazy (in the best sense) mentor of a professor who was building the foundations for what today we call knowledge graphs. So I got into it very early as a linguist applying the skills that we learned for analyzing language and in particular semantics. But this professor (David G. Hays, the second president of the Association for Computational Linguistics) was cross-appointed in computer science and library science. He was looking at using knowledge graphs as the foundation for the library of the future.

Tapping the Existing Semantics Workforce


Alan Morrison: I’ve worked with librarians at PwC, for example, and know a number of librarians in the semantic technologies area. I have the impression that they’re underutilized, that their capabilities are really directly relevant to what people are trying to do with knowledge graphs. Is that your impression too?


Mike Dillinger: Yes, they seem to be the only people who actually have training in doing things like classification. And that right there is a good start. When I built my team at LinkedIn, I had to explain to HR people what taxonomists and ontologists were and what they look like. We couldn’t just reach out to the nearest university program and say, “Okay, give me all your graduates from the knowledge graph development program.” We had nothing like that. In many cases I’ve seen people go to the library science department to get someone who at least has some training.

Alan Morrison: In general, there are so many different roles and skills. When I was at PwC, we did a project where we were trying to link 6,600 skills of the firm to various opportunities and do that in an information system so you wouldn’t have to manually match skills to project engagements. You have all these different roles in an organization. It seems very clear that in general this whole AI challenge could benefit from better use of people in various parts of the organization.

Mike Dillinger: I agree. That kind of interdisciplinary thinking isn’t really very common. People seem to prefer cubbyholes. So, that’s a key challenge for us. Keep talking about it. More people have to hear this point of view.

Alan Morrison: You’ve worked with data science teams over the years, and you’ve worked on programs that have particular goals. Can you talk a bit about one of those programs and what the objective was and who was involved?

Mike Dillinger: One of my favorites is a little project we ran at LinkedIn, a side project actually. I bumped into someone, a director, on the bus to the train station. She said it would be really great if we could get LinkedIn to surface sustainable jobs, green skills, green jobs, these kinds of things. So I said, “Okay, sure. You want us to look at it? All I need is an intern, a contractor or two.” And once we got into this it turned out to be something that was really useful.  Another green jobs report just came out this week and I think they still use the knowledge graph that we built.

That was developed with the Economic Graph team. For them, we went in and started defining things like: What does it mean to be a green skill? What are the parts of green skills? What are the greener kinds of skills that people have? Take a job title that doesn’t say anything about sustainability but looks like it might be a green job title, such as hydrographer. Based on these definitions and criteria for deciding on what’s “green”, we pulled together a nice knowledge graph about what sustainability means and what kind of skills are involved. That’s what powers all the green skills and green jobs reports from LinkedIn.

Alan Morrison: That’s cool, and so a success story. What do you attribute the success of that effort to? What made it successful? You were allowed to really do what you thought you should do, for one thing. Did you have the autonomy?
 
Mike Dillinger: Well, they were stuck.

Alan Morrison: What’s that?

Mike Dillinger:  The autonomy came from other people not knowing how the hell to even start doing this. They did a couple of first attempts using job titles that had the word sustainability in them. And they were really not very pleased with the results. And it turns out that, well, guess what? There are a lot more sustainability jobs that don’t have the word sustainability. 

So a big chunk of the success was “We don’t know how to do this. We need different skills. Fix it for us, okay?” Then the surprise was that based on the graph we built they came up with really solid statistics that drew a lot of attention. Even the CEO said, “Hey, wow, this is interesting.” This was a side gig for us as a little tiny group within my team, but it was relevant, and it was needed. There was a pain point to solve and it worked out well.
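
To make the contrast concrete, here is a minimal Python sketch of keyword matching versus a graph-style lookup. The titles, skills, and links are invented for illustration and are not LinkedIn data; the function names are hypothetical.

```python
# Hypothetical toy data: none of these titles, skills, or links come from LinkedIn.
GREEN_SKILLS = {"water resource mapping", "solar installation", "emissions auditing"}

# title -> skills assumed to be associated with it (illustrative only)
TITLE_SKILLS = {
    "sustainability manager": {"emissions auditing", "reporting"},
    "hydrographer": {"water resource mapping", "surveying"},
    "account executive": {"sales", "negotiation"},
}


def keyword_match(title: str) -> bool:
    """First-attempt approach: look for the word 'sustainability' in the title."""
    return "sustainability" in title.lower()


def graph_match(title: str) -> bool:
    """Graph approach: a title counts as green if it links to any green skill."""
    return bool(TITLE_SKILLS.get(title.lower(), set()) & GREEN_SKILLS)


titles = list(TITLE_SKILLS)
by_keyword = [t for t in titles if keyword_match(t)]
by_graph = [t for t in titles if graph_match(t)]
```

In this toy setup the keyword approach finds only the title that contains the word, while the graph lookup also surfaces "hydrographer" and can say why: it is linked to a skill defined as green.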

Alan Morrison: So in one of your blogs, you were talking about benchmarking and metrics. You mentioned statistics in this example. Can you share what kinds of statistics in particular were relevant and helpful?

Mike Dillinger: Well, for the kinds of things that they were doing for the Economic Graph, it was counts. They simply wanted reliable counts of things and we needed to be able to explain why we were calling them “green” jobs.

Once we could get the counts reliable and say, well, this is why we’re counting this as green, then all the other derivative measures like how this changes over time, how this changes regionally, how these jobs show up in different titles, all of this started to make more sense.

How Good Graphs Improve Usability Metrics

Alan Morrison: It just seems like the power of the classification ability was really key there.

Mike Dillinger: Exactly. It was in many cases.

Alan Morrison: I’ve been looking at benchmarking for knowledge graph platforms and there are assertions of particular levels of accuracy for example. It’s hard to know what that metric means.

Mike Dillinger: I totally agree.

Alan Morrison: Classification is one aspect of this. Are there other aspects that come to mind if you were to benchmark a  knowledge graph platform?

Mike Dillinger: Classification, at least the way I look at it, revolves around a single relationship like parent_of or subcategory_of or something like that.

For knowledge graphs, we have a wide variety of other relationships that we want to track and benchmark and pay attention to. So the way I usually think about it is in terms of relation-by-relation assessment. How well are we doing with this relation? Oh, subcategory_of or parent_of are working fine. What about used_for or produced_by? Oh, they’re not working so well so we have to focus on them next. That’s how we ended up approaching things. If you think of it as a spreadsheet where each column represents a different relation, then we would go column by column.
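
The column-by-column idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the judged sample below is invented, and "share of reviewed assertions judged correct" is one possible per-relation measure, not necessarily the one used at LinkedIn.

```python
from collections import defaultdict

# (subject, relation, object, judged_correct): a hypothetical review sample
judged = [
    ("hydrographer", "subcategory_of", "surveyor", True),
    ("solar installer", "subcategory_of", "electrician", True),
    ("wind turbine", "used_for", "power generation", True),
    ("wind turbine", "used_for", "irrigation", False),
    ("biodiesel", "produced_by", "refinery", False),
    ("biodiesel", "produced_by", "transesterification", True),
]


def precision_by_relation(rows):
    """Return the share of judged-correct assertions for each relation."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for _subject, relation, _object, ok in rows:
        totals[relation] += 1
        correct[relation] += int(ok)
    return {relation: correct[relation] / totals[relation] for relation in totals}


report = precision_by_relation(judged)
# Sorting relations worst-first shows where to focus next.
worst_first = sorted(report, key=report.get)
```

A single blended accuracy figure would hide the fact that, in this sample, the hierarchy relation is fine while used_for and produced_by need attention.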

Alan Morrison: It sounds like there are familiar relations that you work repeatedly, and there’s this whole notion of a shared ecosystem where you’ve got shared ontologies, you’ve got shared relationships and systems. Is that something you favor in your work?

Mike Dillinger: This is definitely a key area of interest—something I’m fascinated by. Our community actually has very many taxonomies, ontologies, and knowledge graphs, but they’re not interoperable. It’s very difficult to leverage someone else’s work. We spent a lot of time trying to do this at LinkedIn and were by and large not very successful. That’s a problem I spent quite a bit of time thinking about, and I think figuring out how to make these resources reliably interoperable has to be one of our next goals.

Hard Versus Soft Standardization

Alan Morrison: It’s a big struggle. I’ve talked with people working on the standard business report model—making spreadsheets dynamic, shareable, and ideally interoperable. The conversations behind the scenes to make that possible are challenging. These are people who’ve been using standard XML approaches for business reporting. They’re accountants who do public reporting for companies. They know the XML version of things, but there’s not a lot of semantics involved.

Mike Dillinger: Part of the problem in my experience—I’ve worked on standards projects as well—is that most people think of standards in terms of what I call “hard standardization” – as I mentioned earlier. We’ll do things this way, and whatever you had before, you have to set that aside and translate it all into this new way. That’s a very hard option to sell! 

You may remember that long ago the people at Stanford came up with something called KIF—Knowledge Interchange Format—and the whole idea was exactly that. Let’s translate whatever you have, or at least map whatever you have into this interchange format so that we can make all of them more interoperable. I think that’s probably a more robust and more plausible way of trying to do this –  which I call “soft standardization”. You keep doing things the way you’re doing them, but we’ll develop an algorithm that will map them to this standard that everybody else is going to use for interchange or interoperability. That’s a very important concept.
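
As a rough illustration of soft standardization, the sketch below leaves two teams' vocabularies untouched and translates both into a shared interchange relation. The team names, relation names, and mapping table are all hypothetical, and this is not KIF itself, just the mapping idea in miniature.

```python
# Hypothetical mapping tables: local relation -> (interchange relation, inverted?)
# "inverted" means subject and object swap places when translating.
TO_INTERCHANGE = {
    "team_a": {"parent_of": ("narrower_than", True)},
    "team_b": {"subcategory_of": ("narrower_than", False)},
}


def to_interchange(source: str, triple: tuple) -> tuple:
    """Translate one local triple into the shared interchange vocabulary."""
    subject, relation, obj = triple
    target_relation, inverted = TO_INTERCHANGE[source][relation]
    if inverted:
        return (obj, target_relation, subject)
    return (subject, target_relation, obj)


# Each team keeps its own representation of the same fact...
fact_a = ("vehicle", "parent_of", "car")
fact_b = ("car", "subcategory_of", "vehicle")

# ...and both map to the same interchange triple.
shared_a = to_interchange("team_a", fact_a)
shared_b = to_interchange("team_b", fact_b)
```

Neither team has to rewrite its data; only the mapping table needs to be maintained.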

The Importance of Relations in a Machine Learning Context

Alan Morrison: There are those who work on this kind of thing specifically, and then there are data science teams and data engineers. Companies have to think about AI, and executives often are very anxious to adopt AI. You’ll see a lot of people hired who have particular skills in neural networks or statistical machine learning. What’s your experience been working with those teams, and how do they react to knowledge representation in general?

Mike Dillinger: Mixed, but generally positive. I spent lots of years learning how to speak geek, which helps. I’m actually quite bilingual at this point. I’ve had to study on my own enough to keep up with the engineers to be able to communicate and say, “Look, this is what your model is doing. We can add to that.” 

At LinkedIn, the tech stack was based on machine learning models. When you look at the equations, though, you see they’re assuming that this feature is independent of (totally unrelated to) that feature and this other feature. There are no relations baked into the equations for standard machine learning models. What do we do on the knowledge team? We inject the relations into the data so the engineers can build on those. They use the same tech stack, do the same things they were doing before but get much better results. This is how we built collaboration with engineers at LinkedIn. We explained to them: “Your models aren’t taking these kinds of things into consideration, so we’ll put them into the data and you’ll have much more to work with.”
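
One way to picture "injecting relations into the data" is as an enrichment step that runs before the model sees the rows. The sketch below is an assumption about what such a step could look like, not a description of LinkedIn's pipeline; the PARENT table and field names are invented.

```python
# Hypothetical graph fragment: skill -> parent category (illustrative only).
PARENT = {
    "python": "programming",
    "java": "programming",
    "welding": "fabrication",
}


def enrich(row: dict) -> dict:
    """Add the parent categories of a row's skills as an extra feature."""
    categories = sorted({PARENT[s] for s in row["skills"] if s in PARENT})
    return {**row, "skill_categories": categories}


rows = [
    {"member": 1, "skills": ["python"]},
    {"member": 2, "skills": ["java"]},
]
enriched = [enrich(r) for r in rows]
```

To a model that treats features as unrelated, "python" and "java" have nothing in common. After enrichment both rows carry the same "programming" category, so the relationship is available without changing the model.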

This, by the way, is one of the things that motivated the move to deep learning, which goes to the opposite extreme and says, “Everything is related to everything and let’s see what shakes out.”

Alan Morrison: That’s true. You’re confronted with so many irrelevant relationships that you have to sift through just to get to the meaning.

Mike Dillinger: Engineers don’t have any training at all about the qualitative side of things. They have really wonderful training in math, but when it comes to anything qualitative like linguistics, for example, they’ve never heard of it.

Qualitative Versus Quantitative Analysis

Alan Morrison: The way you phrased it triggers a thought. You’ve got the qualitative versus the quantitative. As a researcher, I always started with the qualitative: establish what you can establish with the qualitative, and then do whatever quantification you can. Without that qualitative phase….

Mike Dillinger: On the other hand, there’s huge power in the training that computer scientists get because it’s maximally domain independent. You can use it anywhere. They’re doing math that you can use anywhere, techniques that you can use anywhere, data structures that you can use anywhere. They’re forced into collaborating with others to understand the qualitative aspects.

Alan Morrison: In terms of knowledge representation and linguistics, I’ve come across approaches such as Role and Reference Grammar. It’s supposed to be multilingual. Are those people successful in what they try to do?

Mike Dillinger: I haven’t seen Van Valin’s stuff in a long time. One thing approaches like his have in common is trying to reach a higher level of abstraction. In that sense, they can be applied to different languages. The Role and Reference stuff involves roles that are very similar to the relationships or relations that we see in knowledge graphs. It’s kind of a step in that direction.

The things we use in knowledge graphs come very much from Charles Fillmore’s work on case grammar. That was the foundation in the sixties for the first generation knowledge representation of the seventies. The way I look at it now, I’m very much less a linguist and much more a cognitive scientist. I look at these roles as concepts rather than as aggregations of linguistic relations. There’s a whole continuum of ways of looking at these things.

A Hybrid AI Roadmap

Alan Morrison: Just getting back to the relations and getting started with relations—I think you’ve built a good common understanding here when we’ve been talking about how to get started simply. Let’s say you’re at a company that has gotten started and they want to put together a long-term plan. You’ve got the champion, you’ve got the leadership on board with a hybrid approach to AI. If you had to put together a roadmap today, what would it consist of?

Mike Dillinger: Not coincidentally, we have two kinds of clients. On the one hand, we have clients who are in just that kind of situation. They understand that this is going to be important, but they don’t know how to start. The other kind of client is someone who doesn’t know how to do it but does it anyway and then gets stuck. The work we do is very much a mix of design and training, helping people understand why things are the way they are in the knowledge graph. For example, you don’t want to take this kind of information and put some of it in categories and some of it in relations between categories. 

The roadmap—once we get some of these foundational issues settled and we’re in agreement about why we’re doing this and how we can move forward—many times is just progressing from one relation to the next. We need these relations, the ones most often used. Then, depending on the use case, we’re going to need new ones next. This is the pattern I see across clients very systematically. We have a core set of relations that get us started, and then we see what your use case requires.

Alan Morrison: Here in the Valley, one of the things I’ve encountered that seems relevant is that you’ll be working with a certain group of people, and you’ll be able to establish some things like you talked about and get this common understanding going, but then some of the people leave.

Mike Dillinger: Then someone new comes in and says, “Oh, you’re the cleanup crew.” This is what we used to get all the time at LinkedIn. The first thing I had to say was: we’re not janitors. The second thing I kept saying was we don’t clean up the data, we enrich it. Because if we clean up the data, then you’ll lose the robustness of the algorithms, and that’s deadly. We keep the junk there but we annotate it so that you can treat it as if it was good data. As you said, this has to be an ongoing conversation with lots of patience.
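
The "enrich, don't clean" idea can be sketched as annotation that never overwrites the raw value. The lookup table and field names below are hypothetical and only illustrate the principle.

```python
# Hypothetical lookup: messy raw title -> concept it most likely refers to.
CANONICAL = {
    "sr. sw eng": "senior software engineer",
    "software enginer": "software engineer",
}


def annotate(record: dict) -> dict:
    """Keep the raw title untouched and add a concept annotation beside it."""
    raw = record["title"]
    concept = CANONICAL.get(raw.strip().lower())  # None when nothing is known
    return {**record, "title_concept": concept}


records = [{"title": "Sr. SW Eng"}, {"title": "Chief Vibes Officer"}]
annotated = [annotate(r) for r in records]
```

Downstream algorithms still see the original, messy input they have to cope with in production, and they also get an annotation that lets them treat it as good data.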

Alan Morrison: It’s hard to think about the fact that you’re working with a company and you get to a certain level of maturity. You’re excited, things are firing on all cylinders, and then all of a sudden the champion leaves or something, and you’re starting from scratch in some ways. If the artifacts you’re working with, the things that have the knowledge power in them, aren’t utilized, they’re not delivering the benefits they were designed for.

Mike Dillinger: That’s an ongoing issue everywhere. Somebody has to feel the pain so we can bring the band-aids.

Knowledge Graphs as Essential

Alan Morrison: Are you talking with more people who are feeling the pain? What are you seeing out there in the opportunity space?

Mike Dillinger: I don’t feel there are more opportunities, but I’m seeing better opportunities where people are not just curious—they’re convinced that this is something they have to do. Maybe I have this impression just because these are the people who reach out, but I’m seeing more of them. People understand better why they need to have some sort of knowledge graph.

Alan Morrison: Are they making clear distinctions between somebody with your background, for example, and somebody else who’s got some kind of knowledge graph that they might have cobbled together and it’s not systematic in the way it was developed? Are you able to really make those distinctions clear for people who are considering different approaches?

Mike Dillinger: One group of clients I have is the ones who started it and all of a sudden they say, “This isn’t quite working. We know it’s supposed to work, but it’s not working the way we expected or we don’t see how we’re ever going to scale it up.” Then they reach out and say, “Can you check out our system? Maybe you can tell us what’s going on here.” 

Of course, there are things they didn’t think of yet, or they made assumptions that the system would work well this particular way. We know that those assumptions won’t fly. These are people who have a vested interest in making knowledge graphs (or ontologies, as they also call them) work. I’m seeing more of that than people who are just kind of like, “What’s this knowledge graph stuff?”

Alan Morrison: Are you still focused primarily on the tech sector? Companies that would be local to our geography, for example? Are you working in other industries?

Mike Dillinger: I have mostly clients in tech-adjacent fields both in our geography and others. For example, one in video generation or entertainment tech and one in HR tech. Those are just two of them, but you can see it starts to get diverse. 

Alan Morrison: Video gets into an entirely different realm in some ways. The methodologies are certainly the same at some level, but what are you encountering with the video?

Mike Dillinger: The video part itself is not what I’m so much involved in, but it’s the front end where you want to control the system to do what you want it to do. What we usually do with knowledge graphs is we use them to be able to communicate with the system more clearly. When users communicate in a truncated or incomplete way, we count on the knowledge graph to fill in the blanks for whatever system it is: search, video generation, ads. Our graphs fill in the blanks so the algorithms can work better.
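
A simple way to picture "filling in the blanks" is query expansion: a terse user request is supplemented with concepts the graph links to its terms. The sketch below assumes a tiny invented lookup table and is not a description of any client system.

```python
# Hypothetical graph fragment: term -> related concepts (illustrative only).
RELATED = {
    "ev": ["electric vehicle"],
    "charger": ["charging station"],
}


def expand_query(query: str) -> list[str]:
    """Return the user's terms followed by related concepts from the graph."""
    terms = query.lower().split()
    expanded = list(terms)
    for term in terms:
        for concept in RELATED.get(term, []):
            if concept not in expanded:
                expanded.append(concept)
    return expanded


expanded = expand_query("EV charger jobs")
```

The downstream system (search, ads, or a generation front end) receives the expanded list and so has more to work with than the three words the user typed.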

How Agentic AI Fits In

Alan Morrison: We’ve got the communications network that has all these multiple uses. You think about software developers and paradigms shifting with agents. I’ve said before that it seems like most of these AI startups are putting the cart before the horse with agents—that the horse really needs to be in front and you have to have the power to pull things forward. Is that your impression as well? How do you have a discussion about agents with those who are really interested in agents?

Mike Dillinger: With agents, we actually have twice as many issues. One issue is how do we communicate with the agent so the agent actually understands what we need it to do? And then, even worse, how is the agent going to communicate with another agent that might have been developed elsewhere?

Alan Morrison: You have the interoperability problem sort of magnified.

Mike Dillinger: Magnified exponentially, actually. If you’re generating armies or swarms of agents, then this is Babel all over again. The robotics people know about this. They have some communication protocols, but the protocols are still very low level. Now the question is how can we make those communication protocols semantically richer and more reliable? Using knowledge graphs, of course.

Hypergraphs and Semantic Richness

Alan Morrison: Your company is called hypergraf.ai. I was thinking in terms of robotics—I came across what Hanson Robotics had funded years ago, which was a hypergraph system. Just for purposes of definitions and your interests, I always think about hypergraphs as relations on relations. Maybe you could have a more suitable definition and describe what you think about in terms of hypergraphs and relations.

Mike Dillinger: Technically, hypergraphs are defined as relations that have more than two arguments—more than two nodes. One edge with more than two nodes – a collection of nodes. I’m fascinated by them because this is what we’re going to need as we move towards describing processes and actions. These sorts of things are very difficult to describe with these little triple thingies that we use today. We’ll need hypergraphs. My interest is creating the foundations to make this possible and easy as we move ahead.

Alan Morrison: Are you seeing a path forward where this is feasible and ready for prime time?

Mike Dillinger: Definitely. This was actually a big focus when the work on knowledge graphs started under good old first generation AI. They worked out a lot of details about how we would define and deploy hypergraphs. A lot of my work for years was: how can we transform free text, technical text, into knowledge graph format and move on from there? By definition, if you’re talking about positron emission tomography, then you need to talk about events and processes and things like that. There’s already a foundation of well thought-out conceptual tools that I think we can leverage to make more progress. 

For the longest time, people have been focusing only on objects—objects in a taxonomy, objects in an ontology. And when you look at the relations they use, they’re just a plain list. What about the relations between those relations – like you mentioned a minute ago? We need those. What about more complex relations that have more nodes? How are you going to talk about a sales event with a triple when you have someone buying from someone else a product for a price? We can’t do that with triples in a way that makes a lot of sense at this point.
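
The sales example can be written down as a single edge with named roles. The sketch below is an assumption about one possible shape for such a structure (the HyperEdge class and role names are invented), shown next to the common triple workaround of introducing an extra event node.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HyperEdge:
    """One relation connecting any number of nodes, each in a named role."""
    relation: str
    roles: tuple  # tuple of (role, node) pairs


sale = HyperEdge(
    relation="sale",
    roles=(
        ("buyer", "alice"),
        ("seller", "bob"),
        ("product", "bicycle"),
        ("price", "200 USD"),
    ),
)


def as_triples(edge: HyperEdge, event_id: str) -> list:
    """Triple workaround: invent an event node and hang each role off it."""
    return [(event_id, role, node) for role, node in edge.roles]


triples = as_triples(sale, "sale_1")
```

With the hyperedge, the sale is one statement. With triples, it becomes four statements about an artificial "sale_1" node, and nothing in the triples themselves says they belong together as a single event.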

Alan Morrison: It seems like the whole dynamic of event-driven systems—you mentioned events—seems like that sort of hypergraph approach lends itself to an event-oriented, event-capable description.

Mike Dillinger: Yes, it’s essential for representing things like digital twins. It’s not an area that many people are talking about yet. Part of our community is kind of focusing on getting others used to simple triples, but I think it’s more than feasible to move on from there to tuples, you might call them. If we move on to tuples, then what do they have to look like? That’s something I’ve been working on.

Alan Morrison: As part of client work or just on your own?

Mike Dillinger: Both in fact. On my own, getting ready so that I’m prepared to meet the needs of clients who have to do this. I can see this happening very soon. I’m already seeing with clients that if you’re generating video, you’re going to want to describe actions and events, and triples won’t do that in a way that’s going to be scalable.

Formal Logic and Its Limitations

Alan Morrison: How does reasoning factor into all of this? You’ve got the method down for keeping this simple, it sounds like to me. You’ve got an approach for people who are just getting started—here’s the classification approach that you take, which all makes total sense. Then you have the formalists in the semantics community who are very passionate about reasoning ability, inferencing, but that gets very complicated in some ways. You want that reasoning ability and you want to be able to implement it when there’s a real need for it.

Mike Dillinger: The people in the formal community focus a lot, in my understanding, on hierarchical relations like subcategory, the ones that are amenable to formal logic. Then they start having problems when we branch out into other kinds of relations. I think we have to step back and think about reasoning more broadly. Not as broadly as the LLM community thinks of it, which is basically that anything counts as reasoning. But if we start from a knowledge graph, then we can identify and classify the different kinds of paths we can take within the knowledge graph. That’s a key kind of reasoning.

The kind of information that you want to gather along the way as you traverse a path—that’s a kind of reasoning. How we can make hops or go around gaps in the knowledge graph—that’s a kind of inference. I think all that’s a much richer enterprise than what formal logic would have us do with deduction and induction.
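
Path-based reasoning of this kind can be sketched as a graph search that records which relations it crosses. The edges below are invented, and breadth-first search is just one plausible choice of traversal, not a claim about how Mike's systems work.

```python
from collections import deque

# Hypothetical graph fragment as (subject, relation, object) edges.
EDGES = [
    ("hydrographer", "has_skill", "water resource mapping"),
    ("water resource mapping", "subcategory_of", "green skill"),
    ("hydrographer", "subcategory_of", "surveyor"),
]


def find_path(start: str, goal: str):
    """Breadth-first search returning the edges on a shortest path, or None."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for subject, relation, obj in EDGES:
            if subject == node and obj not in seen:
                seen.add(obj)
                queue.append((obj, path + [(subject, relation, obj)]))
    return None


path = find_path("hydrographer", "green skill")
# The sequence of relations along the path is the "kind of path" taken.
pattern = [relation for _s, relation, _o in path]
```

Here the pattern has_skill followed by subcategory_of is what justifies treating "hydrographer" as connected to green skills, and different patterns can be classified as different kinds of inference.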

Alan Morrison: We’re sort of going down a rabbit hole with the formal logic—we’re sort of missing the bigger picture here?

Mike Dillinger: Golden handcuffs. That’s my take. Like with any tool and any system, you buy a tool, whatever tool you want, but you’re also getting for free, as it were, all the assumptions that they made when they built it. Then you have a problem because sometimes those assumptions don’t match your use case. We have that a lot with formal taxonomy work. For example, there’s only one kind of relationship. You can only have one parent node, and there are a bunch of constraints like the MECE [Mutually Exclusive, Collectively Exhaustive] assumptions that really tie our hands in many use cases. I’m skeptical about their long-term usefulness. I can see the uses for a lot of these formal systems, but it’s really clear that we have to make them richer to meet our needs – even now.

Alan Morrison: It goes to the notion of what constitutes a viable architecture. You’ve got all these constraints you have to deal with given your methodology. You’ve talked about this in other ways before. You’ve blogged about: let’s just dispense with Boolean logic, for example, because it’s overused and we just trip over it all the time.

Mike Dillinger: Golden handcuffs. It’s comfortable. It’s traditional. We know about it. We can do things with it. But there are a whole lot of things that we can’t do with it.

Alan Morrison: How do we get to that broader mentality that you’re pointing to? A lot of people have been to school, they’ve studied philosophy, they’ve had courses in logic. You see a lot of these people in knowledge representation today. How do we free them from the golden handcuffs?

Mike Dillinger: That’s really difficult. I work hard at talking about it and trying to show things that we could do if we gave up on this assumption or that assumption. I’ve done a bunch of blog posts about that. Other than that, we have to try to work out systems to show, “Hey, look, now it’s working and this is why it’s working—because we dispensed with this assumption that other people have been making.” That’s how fuzzy logic became very well known. They just changed a simple assumption: instead of binary truth values, we would have continuous truth values. This is actually what underlies most of machine learning—fuzzy logic assumptions.
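
The change of assumption behind fuzzy logic fits in a few lines. The sketch below uses the classic min/max/complement operators, which are one standard choice among several; the function names are just illustrative.

```python
def fuzzy_and(a: float, b: float) -> float:
    """Conjunction over truth values in [0, 1], using min."""
    return min(a, b)


def fuzzy_or(a: float, b: float) -> float:
    """Disjunction over truth values in [0, 1], using max."""
    return max(a, b)


def fuzzy_not(a: float) -> float:
    """Negation over truth values in [0, 1], using the complement."""
    return 1.0 - a


# With only 0.0 and 1.0 as inputs these behave exactly like Boolean logic.
boolean_case = (fuzzy_and(1.0, 0.0), fuzzy_or(1.0, 0.0), fuzzy_not(1.0))

# With intermediate values they express degrees of truth.
graded_case = (fuzzy_and(0.7, 0.4), fuzzy_or(0.7, 0.4), fuzzy_not(0.25))
```

Nothing else about the logic has to change: restricting the inputs to 0 and 1 recovers the binary case, which is why dropping that one assumption is such a small step.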

Machine Learning and Its Own Limitations

Alan Morrison: To your mind, the machine learning approach has a lot of flexibility, but sometimes you get into trouble because perhaps there is too much flexibility?

Mike Dillinger: I don’t usually see that. I usually see too many constraints. This standard equation that a machine learning system will look for is one where all the factors are independent of each other. There are no interactions. That’s fine. It works really well in many contexts. But it’s also the reason why we need such huge amounts of data—because the equation is too simple.

Alan Morrison: It seems if you build the context first, then you’re just dealing with one context at a time perhaps. That seems inherently more efficient. If you’ve described the subgraph or the piece of the mirror world sufficiently well, then it would seem you don’t necessarily have to do this brute force processing all the time.

Mike Dillinger: Totally agree. This is the principle that with structure we reduce entropy. If we have a lot of entropy or error variance, then we have to have much more data to make up for it and find reliable conclusions or patterns. That’s a well-known trade-off.

Alan Morrison: How does that relate to what’s going on in data warehousing, for example, or transactional environments where they’re working with tabular data and the relationships are foreign keys or column headers? What’s your perspective on that kind of structure and how it relates and how to use it?

Mike Dillinger: I think it’s the same situation. The column headers are assumed—in the bowels of the database system, in the original plans for them—to be independent of each other. Can you store in a database that column A and column C are subtypes of something else? You can’t. It just wasn’t conceived that way. The foreign key came up as a way of starting to do that. But we don’t have typed foreign keys, do we? We just have a bridge and say, “This row is related to this row.” We have no type system for that. This is what we get with knowledge graphs—they’re typing the foreign keys, as it were.
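
The "typed foreign key" comparison can be illustrated by putting a relation name on each link between rows. The row identifiers and relation names below are hypothetical.

```python
# Untyped link, as with a plain foreign key: it only says two rows are related.
untyped_links = [("order_17", "customer_3"), ("order_17", "customer_9")]

# Typed links, as in a knowledge graph: each link says how the rows are related.
typed_links = [
    ("order_17", "placed_by", "customer_3"),
    ("order_17", "shipped_to", "customer_9"),
]


def related(row: str, relation: str) -> list:
    """Return the rows linked to `row` by the given relation type."""
    return [obj for subj, rel, obj in typed_links if subj == row and rel == relation]


buyer = related("order_17", "placed_by")
recipient = related("order_17", "shipped_to")
```

With the untyped links there is no way to tell the buyer from the recipient without outside knowledge of the schema; with typed links the distinction is stored in the data itself.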

A Nuanced Take on Labeled Property Graphs (LPGs)

Alan Morrison: It seems like the labeled property graphs have that same dilemma as well with undescribed relationships.

Mike Dillinger: My take on labeled property graphs is that we can kind of separate out the property graph as a language for describing facts from the algorithm that uses it. I was puzzled for a long time about this controversy over LPGs and RDF. When I started to look into it more carefully, I said, “Wait a minute, they’re mixing up characteristics of the metalanguage we have for describing facts with what the algorithm does.” 

If we think of those separately, then we see that LPGs are actually pretty good as knowledge representation languages. They have some interesting advantages. They’re more flexible for things like hypergraphs and relations between relations. Then we have to look at the algorithm and what is the algorithm doing with that stuff we represented. The algorithms associated with LPGs don’t require that you have an ontology in there to define structures. They permit it, but they don’t require it. I think there’s more subtlety in the controversy than I read about.

Alan Morrison: At the same time, it just seems like you’ve got all these different methods that have some level of power, and people are investing all this time in using them. You want to be able to use them, but then you try to bring them together and that seems like a big challenge.

Mike Dillinger: That’s a perfect summary. This is why I’m so interested in interoperability. We have all of this effort, and it’s not going to be useful until we can gather our forces together, or at least our results together, in a way that makes sense.

Findable, Accessible, Interoperable and Reusable (FAIR) and Pharma

Alan Morrison: I’m guessing you have a perspective on FAIR principles and what the pharmaceutical industry has been trying to do with FAIR and FAIR squared. What is your perspective on that? Do you think the pharma industry is headed in a good direction? Is it helpful? Should other industries learn from it?

Mike Dillinger: I think FAIR is really an excellent way of thinking about things. We want the things we produce to be useful to other people. That summarizes it really rather quickly. I’m not sure about FAIR squared yet, but FAIR, from everything I’ve seen, is a goal we can aspire to and we should aspire to. 

The one story I like to tell about the pharma industry was a hallway conversation I had with someone from one of the really big pharma companies. I’m making conversation: “How’s your knowledge graph? How are you using it? What do you use it most for?” This director kind of looks at me and says, “I’ll tell you the truth, it’s not very useful.” I poked and prodded and squeezed mercilessly until I could wrap my head around what was going on. 

It turns out what happened was they overinvested in these vague, uninformative, bidirectional relations like “associated with” or “related to” or “similar to.” They have a million of those, and every time they type in a query they get everything back. Definitely not useful. That was an interesting conversation.

I know the CEO of a company that serves big pharma with this kind of data. When we got to talking, he said, “There were really significant performance issues. The knowledge graphs are too slow.”

I said, “It’s probably because there’s all this stuff in the knowledge graph that’s not related to how a researcher thinks about proteins or genes or whatever.” That’s my best guess at the moment. If we filter it, then we’ll be able to improve performance enormously. Big pharma has invested so much—they have really substantial knowledge graphs—but if they’re storing everything together, the good, the bad, and the ugly, then it’s going to be hard to use them effectively.

That’s led me to start thinking about how we could profile a knowledge graph and see which parts of it are the core and which are extra. We can call it an obese knowledge graph when it has all this extra stuff. This is something I’m seeing happen in the biomed domain. They have lots of other things in the knowledge graph, and they need to say, “What are the parts of this protein and what are the reactions it participates in?” But you have to wade through the author of the article, the last edit date, all these other things. Synonyms, thousands of synonyms. “Oh, this corresponds to this in this other group in this other database.”

I’m thinking we need a decent method to profile a knowledge graph and say, “This is administrative information. This is synonym and mapping information to other systems. And here’s the core, the meat of the system. And how much is there?” Knowledge graphs look like they’re too slow because companies mix in too much of too many kinds of information.

Alan Morrison: So you’d have different levels of abstraction, so that you’re not overwhelmed with all sorts of things that don’t seem to relate terribly well from your perspective.

Mike Dillinger: I’m thinking of it as a way of tracking progress better because somebody can say, “Oh, we have a million triples in here.” You look and there’s only a thousand of them that are really about proteins. The rest are about other things: authors, publications, IDs in other systems, edit dates, etc.
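
The profiling idea can be sketched as a simple count of triples by kind of predicate. This is a minimal Python sketch under stated assumptions, not a method described in the interview: the predicate names, the category labels, and the `PREDICATE_CATEGORY` map are hypothetical, and a real graph would need its own mapping.

```python
from collections import Counter

# Hypothetical mapping from predicate to kind of information.
PREDICATE_CATEGORY = {
    "has_part": "core",
    "participates_in": "core",
    "associated_with": "vague",
    "synonym": "mapping",
    "same_as": "mapping",
    "author": "administrative",
    "last_edited": "administrative",
}


def profile(triples):
    """Return {category: (count, share of all triples)}.

    Predicates missing from the map are reported as "unclassified"
    so they stay visible instead of being silently dropped.
    """
    counts = Counter(
        PREDICATE_CATEGORY.get(predicate, "unclassified")
        for _subject, predicate, _object in triples
    )
    total = sum(counts.values())
    return {category: (n, n / total) for category, n in counts.items()}


# Toy data, invented for illustration only.
TRIPLES = [
    ("protein:P1", "has_part", "domain:D1"),
    ("protein:P1", "participates_in", "reaction:R1"),
    ("protein:P1", "associated_with", "disease:X"),
    ("protein:P1", "same_as", "otherdb:123"),
    ("protein:P1", "synonym", "P-one"),
    ("article:A1", "author", "J. Doe"),
    ("article:A1", "last_edited", "2024-01-01"),
    ("protein:P1", "color", "blue"),
]
```

Run on the toy data, `profile(TRIPLES)` reports that only a quarter of the triples are core. A report like this is one way to track progress on the part of the graph that matters to researchers, instead of quoting the raw triple count.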

Interest Graphs

Alan Morrison: One of the things that came to mind was that we’re working on the corpus of information that enterprises have available to them and trying to make it more reusable. But then there’s the whole panoply of individual users, individual humans, and how they differ and what their needs are. LinkedIn is probably a decent example of articulating what the interest graph is, so-called. It seems like the interest graph is just as important as the content and the structured data knowledge graph.

Mike Dillinger: It’s just an additional knowledge graph. At LinkedIn, they have something called the interest graph. It’s a subpart of your knowledge graph that focuses on topics rather than titles, skills, industries, etc. Here’s a piece of content—which topics are represented in the content? Then you can accumulate them for each user and say, “Ninety-nine percent of the time this person likes this and this and this.”
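
The accumulation step Mike describes can be sketched as follows. This is a minimal Python illustration and not LinkedIn’s implementation: the content IDs, topic labels, and function name are hypothetical, and it assumes the topics for each piece of content have already been identified.

```python
from collections import Counter

# Hypothetical output of a topic tagger: content ID -> topics it represents.
CONTENT_TOPICS = {
    "post-1": ["knowledge graphs", "ontologies"],
    "post-2": ["knowledge graphs", "language models"],
    "post-3": ["knowledge graphs"],
}


def interest_shares(engagements, user):
    """For one user, return {topic: share of engaged content carrying that topic}.

    `engagements` is a list of (user, content_id) pairs, for example likes.
    """
    content_ids = [cid for u, cid in engagements if u == user]
    if not content_ids:
        return {}
    counts = Counter()
    for cid in content_ids:
        # Use a set so a topic counts at most once per piece of content.
        counts.update(set(CONTENT_TOPICS.get(cid, [])))
    return {topic: n / len(content_ids) for topic, n in counts.items()}


# Toy engagement log, invented for illustration.
ENGAGEMENTS = [
    ("alice", "post-1"),
    ("alice", "post-2"),
    ("alice", "post-3"),
    ("bob", "post-2"),
]
```

A share close to 1.0 for a topic corresponds to the “ninety-nine percent of the time this person likes this” statement in the answer above.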

Alan Morrison: The system that we envision, and that’s materializing now, is a hybrid approach, a hybrid architecture. How would you envision it? What should be the proper hybrid architecture that brings together the capabilities of machine learning with knowledge representation and allows for, for example, database retrieval after an LLM query? What components should be in mind, and what’s the approach that you would take to building such an architecture or designing one?

Mike Dillinger: The key distinction here is between the LLMs and the rest. The way I think things are moving is to use the LLM for what it does absolutely by far the best, and that’s as an interface with humans for natural language. I call LLMs the ultimate user interface for humans. And not use it as a reasoning engine. That’s the way I think we’re going. The reasoning capabilities are very controversial, up in the air, and not very reliable. 

Once we have that distinction, we have a really good place to start. We have the LLMs do a really great job of accepting and producing natural language. They can also translate into any query language that we want. Then we need some sort of federated system where we have databases and knowledge graphs and ontologies and different things. We’ll probably have to build a map of them as an additional higher-level layer for routing queries. But for the moment, it seems like something cobbled together like that is what’s going to be best.
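
The shape of that cobbled-together architecture can be sketched as a small routing layer. This is a minimal Python sketch under stated assumptions, not a reference design: the source names, topics, and keyword-matching rule are hypothetical, and `translate` is a stand-in for the LLM call, which is not implemented here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Source:
    """One entry in the higher-level map of available data sources."""
    name: str
    query_language: str
    topics: frozenset


# Hypothetical map of a federated system: a database and a knowledge graph.
SOURCE_MAP = [
    Source("hr_database", "SQL", frozenset({"employee", "salary"})),
    Source("skills_graph", "SPARQL", frozenset({"skill", "job"})),
]


def translate(question, query_language):
    """Placeholder for the LLM step that would turn the question into a query.

    It only returns a labeled string so the routing logic can run without
    any model; a real system would call a language model here.
    """
    return f"[{query_language}] {question}"


def route(question):
    """Send the question to every source whose topics it mentions.

    Keyword overlap is a deliberately naive routing rule used for illustration.
    """
    words = set(question.lower().replace("?", "").split())
    return [
        (source.name, translate(question, source.query_language))
        for source in SOURCE_MAP
        if source.topics & words
    ]
```

In this sketch the language model handles only the translation between natural language and a query language. Deciding where a query goes is left to the explicit map, in line with using the LLM as an interface and not as a reasoning engine.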

Alan Morrison: Could this just be a small language model rather than a large language model? And what about all the training that’s been done on public web sources, scraping and such? Is that relevant to what enterprises need these for?

Mike Dillinger: Small language models have the advantage that they’re small and they’re cheap, but they’re not as robust as interfaces. When large language models are available, that means they’re just that many times more robust as interfaces. If they’re affordable enough, then prefer a large one for more robustness. We can always focus them on a particular domain.

Alan Morrison: I go to these infrastructure conferences, and the talk is about how much energy consumption there is in these AI data centers. In California, for example, they’re taxing the grid. PG&E can’t deliver enough capacity for the big data centers to do what they need to do. There’s been discussion about small nuclear reactors and putting in substations. Three Mile Island. There’s obviously the situation with LLMs as they’re currently constituted—they’re energy hogs essentially. Do you have a perspective on how knowledge graphs fit into the resolution of that problem?

Mike Dillinger: There is a sizable literature that shows that knowledge graphs can help with training, fine-tuning, and every single step of the way. I really like a review I saw once that showed that at every step we take in developing an LLM, adding a knowledge graph to the mix makes it work better. But I don’t think they’ve been using knowledge graphs in the appropriate ways. What I’ve seen is researchers serialize them and make believe that it’s the same as text. I don’t think that’s the way to do it. There’s all this literature from graph neural networks and a whole range of other things that tells us we can do better than that: we can do more with less. The interesting thing I saw was that even with these simplistic uses of knowledge graphs, there was a significant impact for everything: for initial training, for fine-tuning, for evaluation. We know there are better ways of doing it. If the training materials for a reasoning engine, for example, were just knowledge graphs (not mountains of written texts), then the energy consumption would be ten thousand times smaller.

Alan Morrison: We’ve got more feedback loops, feedback response loops than we’ve had before. Knowledge graphs have thousands of different use cases. They may be used in dozens of different ways today, but what you’ve pointed out is at every step in the process there could be a benefit to it.

Mike Dillinger: That’s what the literature is showing. I have a post on that. I’ll dig it out for you. (See “On to Knowledge-infused Language Models” for the post Mike is referring to.)

Outlook for 2026

Alan Morrison: This has been a really interesting and useful conversation, Mike. Looking forward to 2026, we’re coming up toward the end of 2025. Do you have thoughts about what you’re anticipating, what things are going to be new and different that you’re looking at?

Mike Dillinger: I think the trend towards small language models will probably increase because a huge foundational model is just too big for most normal groups to be able to work on. That’s clearly going to increase. I think we have to do much more work with semantics. Knowledge graphs I expect will continue to grow. I’m excited about doing things like: how can we validate knowledge graphs automatically? How can we automate the building process? How can we check the coherence of a knowledge graph or, starting from a taxonomy, generate a knowledge graph from it? The way I look at it, with the tools we have now from LLMs, we can do thousands of things that we couldn’t do before.

Alan Morrison: All sorts of available capability, but you have to be able to bridge the gap between LLM and knowledge graph and other technologies that are in play here. A generalist would seem to be somebody who understands the bigger picture and both sides of that, or all sides of it. That would be very valuable if you could just think strategically about this stuff.

Mike Dillinger: At the moment we need generalists who can get into the weeds and teach people how to actually do it right. I make a distinction you might find useful between computational tools—we have lots of those and they’re really wonderful—and conceptual tools. One of the things I like to do as a generalist is work on the conceptual tools. Let’s see if we can find a shared understanding of what a knowledge graph is and isn’t. What is inference and what is it not? What makes a good knowledge graph good or not? I’m interested in those. I consider definitions tools that we manipulate to help guide us when we use computational tools.

Alan Morrison: Clearer distinctions along those lines would be really welcome. It would be good to have something distilled that you could just share with an executive and finally shed some light on that. Stuff that hopefully we’ll work on soon. Mike, where can people find you? I know you’re on LinkedIn. I know you’re on Medium.

Mike Dillinger: I’m definitely on LinkedIn, not as often on Medium. On LinkedIn I have a newsletter called Knowledge Architecture that you can get at through my profile. That’s probably the easiest way to find me.

Alan Morrison: It’s been a pleasure to sort of unpack all these issues, and I appreciate the patience and the articulation.

Mike Dillinger: Always a pleasure with you, Alan. We should do this more often.

End of Transcription

2 responses to “Mike Dillinger: Knowledge Graph Relations Are Key to Interoperability”

  1. Really strong conversation.

    A few points here resonate deeply with work I’ve been doing for decades—especially the emphasis on relations as first-class assets, not just entities.
    The green-jobs example is a textbook case of why classification precedes counting. Once you explicitly define what something is (and why), reliable metrics and downstream analytics follow naturally. Keyword matching fails precisely because it skips that conceptual step.
    I also strongly agree with the idea of relation-by-relation benchmarking. Treating a knowledge graph as “accurate or not” misses the point. Different relation types mature at different rates and need different governance. Thinking of relations as columns in a spreadsheet is a very practical mental model.
    On interoperability, the distinction between hard vs. soft standardization is crucial. In my experience, forcing replacement rarely works socially or technically. Translation across viewpoints does—if you manage provenance, versioning, and variance explicitly. That’s where controlled vocabularies, thesauri, and mapping layers quietly do most of the real work.
    The discussion on hypergraphs and events also hits home. As soon as you model processes, actions, or lifecycle states, triples start to strain. You need n-ary structures, roles, and states—not as an academic exercise, but to reflect how the world actually operates.
    One thing I especially appreciated was the distinction between computational tools and conceptual tools. We have plenty of the former. What’s often missing is an explicit model for how knowledge evolves, stabilizes, and is reused across contexts. Without that, even very sophisticated graphs tend to become brittle or obese over time.
    Overall, this was a thoughtful, grounded articulation of where knowledge graphs actually deliver value—and where they still need better conceptual foundations. Well worth the read and listen.

    1. Appreciate your feedback and thoughts, Roy.
