Jack Kausch: Semantic RAG - The GraphRAG Curator

May 25, 2026

GraphAI, Natural Language Querying

academic research, artificial-intelligence, graph-RAG, ontology, semantic modeling, Semantic RAG

Jack Kausch

I recently interviewed Jack Kausch, a recent PhD graduate from Western University’s Faculty of Information and Media Studies, whose research brought together library science, linguistics, philosophy, and computer science to build a pictographic metadata system inspired by 17th-century language schemes from Leibniz, Dalgarno, and Wilkins.

Jack made a strong case that enterprises dramatically underuse qualitative approaches to data. Most organizations silo their data, content, and knowledge management functions, when knowledge graphs could unify all three. He convinced me that user experience design, not just technology, stands as a major obstacle to wider adoption.

What struck me most was his core technical contribution: mapping formal ontologies onto vector spaces using prototype theory from cognitive linguistics. Rather than relying on purely statistical dimensionality reduction methods like PCA or t-SNE, he selects exemplar vectors to anchor the feature space, enabling fuzzy semantic operations like intersections and unions across ontology classes.

Jack also connected this work directly to RAG systems, arguing that knowledge graphs and named graphs provide exactly the kind of provenance tracking and common-sense grounding that large language models lack. He demonstrated this in a history of mathematics project, where his team logged which documents and models generated which RDF triples.

He then showed me a public-facing web tool he built at realcharacterlanguage.world that lets users combine pictographic glyphs and explore semantically similar Wikipedia articles across multiple languages and ontological lenses.

After our conversation wrapped up, I kept thinking about how rare it is to encounter someone who can move so fluidly between the humanistic and the technical, between Ramon Llull’s 13th-century paper computation wheels and the vector embedding methods shaping enterprise AI today.

Jack represents exactly the kind of interdisciplinary thinker that the knowledge engineering field needs more of, and I hope this conversation helps connect him with the people and organizations that can put his ideas to work.

You can find Jack on LinkedIn at linkedin.com/in/john-kausch-6777a8177 and his pictographic metadata system and ontology at https://realcharacterlanguage.world.

An edited transcript of our full conversation follows below.

Edited Transcript

Interview: Jack Kausch

00:00:00

Alan Morrison: We are online with Jack Kausch. Jack, thanks so much for joining today.

Jack Kausch: Thanks for having me here, Alan.

Alan Morrison: It is great to have you. For background on this episode of The Curator Podcast, we are talking with Jack, who has just completed his PhD at Western University in Ontario.

00:03:01

Alan Morrison: Jack, can you tell us a bit about your program and why you committed yourself to a PhD?

Jack Kausch: Absolutely. My program was in library and information science at the Faculty of Information and Media Studies. I was also a fellow at the Rotman Institute of Philosophy for Engaging Science during my PhD, so the faculty was relatively interdisciplinary. I set out to create a pictographic metadata system similar to the 17th-century language schemes of George Dalgarno, John Wilkins, and Gottfried Wilhelm Leibniz.

I arrived at this insight organically. My undergraduate education was in linguistics, focusing on linguistic semantics. I loved formalizing the semantics of natural languages, but I grew frustrated by logic’s inability to capture the connotation of language. This frustration led me to imagine creating a notation system that would better reflect the meaning of language than today’s formal systems, rather than developing a new semantic theory.

00:04:25

Jack Kausch: When searching for a discipline to conduct this research, I initially considered philosophy and computer science because I identified the field of applied ontology as roughly what I was trying to create. I ultimately chose library and information science because it is a more qualitative social science field, focusing more on the content of expressions. It has a several-century history of creating classification systems and thesauri with a focus on notation, which aligned with my goals. This research emerged from questions I asked in the context of linguistics, and that spirit has continued throughout my work.

Alan Morrison: It is great to have you because of the interdisciplinary aspect of your studies. I have a lot of background working with enterprises. With probabilistic machine learning and neural networks coming to the fore, data science and engineering teams often have a formal background only in that discipline.

00:05:50

Alan Morrison: It seems you can bring many things to bear on the problem set that others might not have had much exposure to, which is entirely relevant. Why can’t we apply this diverse approach to the AI problems at hand? You mentioned library science, but there is also a historical research aspect, logic, and philosophy.

This wonderful combination of humanistic and social science aspects does not get enough play inside enterprises currently. We should have more of this inside universities so people can get trained up the way you are.

Jack Kausch: That is interesting. I wonder if it is going to become more relevant in the future. I have been interested to watch how traditional linguistics and NLP approaches have been sidelined as transformers and large language models have exploded into the foreground.

00:07:06

Jack Kausch: I think there is a lot to be said for taking a more interdisciplinary approach to respecting qualitative data, especially as we do more conceptual modeling. In my view, many sticking points in ontological commitments that happen when working between groups are qualitative problems. Having that kind of training can be very useful when you bring it into the room.

Alan Morrison: Absolutely. As an analyst in former lives, I always started with a qualitative approach. Quantitative data has to be restricted because you do not have the data for everything. With qualitative data, however, you can make many assessments and get to the kinds of problems and questions you want to ask. A lot of it is about asking the right question, and much of that seems lost in today’s commercial world.

00:08:16

Jack Kausch: Right. You can also ask what value might be lost if you do not frame your questions correctly or if you instantly jump to a well-established, off-the-shelf metric. I completely agree.

Alan Morrison: So you mentioned in an earlier discussion that you had been reading Steven Pinker and his linguistic approach to semantics. There is a contrast between the Pinker approach and W3C semantics—the semantic web stack and the worldwide web approach to semantics—which is very much description logic oriented. Can you compare and contrast the two and talk about what you are getting from each of these different approaches to semantics?

Jack Kausch: First, I did not set out to read Steven Pinker on my own; it was part of the Dataworthy book club. I definitely have to credit Patrick Hayes for the insight when he mentioned that the “Naive Physics Manifesto” was approaching the same thing from a different direction.

00:09:32

Jack Kausch: That is what jolted this insight for me: when you do linguistic semantics, you are concerned with meaning and formalizing it with logic, using different types of logic for different formalizations. Common sense knowledge modeling, as I understand it, grew in the 1970s from attempts to create common sense knowledge bases for AI agents; that is the ultimate purpose of that activity.

Linguistic semantics, in contrast, models the meaning of utterances and expressions. You still often use a logic like first-order logic or first-order predicate calculus. Although you may not use a description logic in that case, the process is essentially the same: you have a meaning, and you are trying to formalize it just like you are with common sense knowledge. I think it is interesting that you wind up performing similar activities.

00:10:39

Jack Kausch: There is a lot of crossover between linguistic semantics and information ontology, but that connection is maybe underexploited when we discuss Retrieval-Augmented Generation (RAG). This is particularly true when looking at the vast profusion of unstructured data and trying to find regularities in it, whether with an ontology or a pre-existing knowledge structure.

There is probably much more that could be done in enterprises in that sort of crossover. The difference is fundamentally about meaning: when I describe the meaning of a sentence, I start from a different standpoint of what I am trying to describe, but I wind up going to a very similar place.

[Jack implies that these different disciplines are siloed. For example, practitioners building knowledge graphs and RDF pipelines often don’t draw on linguistic semantics when deciding how to extract and structure meaning from text. Similarly, NLP practitioners building RAG systems often don’t bring in ontological rigor when deciding what to ground retrieved content against. Organizational change, in the form of more boundary-crossing collaboration, has to accompany the technological changes.]

Alan Morrison: Makes sense. In terms of unstructured data, it seems astounding that so much of an enterprise’s unstructured data is underutilized. Even in the sciences, a very low percentage of scientific data sets are truly reusable.

00:11:53

Alan Morrison: One objective of the Pistoia Alliance and others working with FAIR data (Findable, Accessible, Interoperable, and Reusable) is to harness the power of both structured and unstructured data together. Success stories in enterprise semantics involve bringing those two together.

However, there is so much focus on the structured side of things, as if that will tell the whole story. It is like looking for your keys under the lamppost because that is where the light is; we want the light everywhere, and the unstructured data seems to have the least light. Is that your impression?

Jack Kausch: Right. I think it depends on your application area. Some large language models, for example, draw entirely on unstructured data for training; the training data sets are opaque and do not have nearly as much structure as an ontology might have.

00:13:00

Jack Kausch: It probably depends on your use case. It is interesting how difficult it is to bring those two things together, which is likely why so much value is accumulating there. The matching can go wrong in so many different joins and places. Any organization that can figure out a good workflow for this is going to be very competitive.

Alan Morrison: The rare enterprise is able to do that. You have focused on explainability in your work, crossing the boundary between probabilistic and deterministic methods. You mentioned explainability in vector databases, for example. Can you talk about what you have looked at and discovered there?

Jack Kausch: Well, from the beginning, I was interested in whether I could create a controlled vocabulary or an ontology that would serve as the semantic axes of a vector space.

00:14:06

Jack Kausch: I thought that would be very valuable. Initially, I wanted to find a purely formal method of getting there; that is intuitively appealing. Many methods exist for reducing the dimensionality of a vector space, such as PCA (Principal Component Analysis), t-SNE, and forms of Singular Value Decomposition.

I also discovered older ones from the Information Retrieval community in the 1980s, like Latent Semantic Analysis (LSA), which showed that you can maintain the same properties of a document retrieval vector space, and operations performed in one representation space can be done in another. I was looking at all of these things and knew I wanted a representation space derived from an original vector space, but where an existing ontology could form the dimensions of my feature space.

[The methods Jack describes fall into three families:

PCA (Principal Component Analysis) and SVD (Singular Value Decomposition) are pure mathematics — they find structure that is already latent in the numbers, with no semantic input whatsoever.

LSA (Latent Semantic Analysis) and ESA (Explicit Semantic Analysis) are a bridge: they introduce linguistic or encyclopedic knowledge to give the dimensions more human-meaningful coherence, but the semantic coherence in LSA is still accidental (a byproduct of language statistics), while ESA makes it explicit by using Wikipedia.

t-SNE (t-distributed Stochastic Neighbor Embedding) stands apart as a visualization tool only — it is not a representation space you can operate in, just a way to look at one.

The ontology-guided approach Jack alludes to is the only one where the feature space is specified before you look at the data, which is what makes it so powerful for interoperability. Every other method asks “what structure does this data have?” — the ontology-guided approach asks “does this data fit a structure we already agreed on?”]

00:15:14

Jack Kausch: I ultimately settled on a cosine similarity matrix, though it does not necessarily have to be one. Cosine similarity is not always the silver bullet people think it is. The method I chose was based on the intuition that we do not really know what a category is unless we see an exemplar of it. That intuition comes from Eleanor Rosch’s prototype theory, which influenced Lakoff and cognitive linguistics.

[Rosch (1970s) argued that human categories don’t have sharp boundaries defined by necessary and sufficient conditions — instead, categories are organized around best examples (prototypes). A robin is a more prototypical bird than a penguin, even though both are technically birds. Membership is graded, not binary. Prototype theory directly challenged the classical view (inherited from Aristotle) that categories are defined by fixed logical criteria.

With cognitive linguistics and the research of Lakoff, et al., meaning is grounded in embodied experience, conceptual structure shapes linguistic structure, and categories are prototype-based rather than logically defined. Metaphor is not decorative but fundamental — we understand abstract domains (argument, time, number) by mapping them onto concrete bodily experience. Language is inseparable from thought.]

When I implemented that theory, the idea was to select a set of vectors in your vector space that would be good exemplars of that class. You could even use vector addition to create a composite vector that serves as your exemplar vector for the class. Many different methods can be used to get those exemplar vectors. You then create a secondary space by looking at the similarity from those exemplar vectors.

00:16:19

Jack Kausch: You can do that to any degree distance you want. Ideally, rather than just clustering around your exemplars as centroids, you ask what relationship the exemplar has to every entity in your data set. That may not always be computationally possible, so you might have some threshold cutoff. The theoretical justification was that you select exemplars and then look at their semantic similarity by some metric chosen in the original vector space. I found that this creates very interesting semantic feature spaces, allowing for operations like fuzzy intersections and unions. You can perform semantic operations on the dimensions of your vector space that produce meaningful results. That brought me a great deal of fulfillment, and I am still looking to see how far this can be taken.

[As I understand it, the spaces Jack created let you do something the purely statistical spaces cannot: meaningful semantic operations. Fuzzy intersections and unions become possible. You can ask “what is sort of like both category A and category B?” and get a sensible answer, because the dimensions now represent human-interpretable concepts (the exemplars) rather than abstract mathematical axes.]
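
[Editor’s sketch: the exemplar idea can be illustrated in a few lines of Python. This is not Jack’s implementation. The toy three-dimensional embeddings, the class names, the clipping of negative similarities to zero, and the use of min/max as the fuzzy intersection and union are all assumptions made for illustration.]

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def composite(vectors):
    """Build one composite exemplar from several vectors by vector addition."""
    return [sum(components) for components in zip(*vectors)]


def to_exemplar_space(entity, exemplars):
    """Re-express an entity as its similarity to each class exemplar.

    Negative similarities are clipped to 0 (an assumption of this sketch) so
    each coordinate can be read as a fuzzy membership degree in [0, 1].
    """
    return {name: max(0.0, cosine(entity, ex)) for name, ex in exemplars.items()}


def fuzzy_and(memberships, a, b):
    """Fuzzy intersection of two classes, taken here as the minimum."""
    return min(memberships[a], memberships[b])


def fuzzy_or(memberships, a, b):
    """Fuzzy union of two classes, taken here as the maximum."""
    return max(memberships[a], memberships[b])


# Toy embeddings, invented for illustration only.
embeddings = {
    "robin": [0.9, 0.1, 0.0],
    "sparrow": [0.8, 0.2, 0.0],
    "salmon": [0.1, 0.9, 0.0],
    "penguin": [0.5, 0.5, 0.1],
}

# One exemplar per ontology class; "Bird" is a composite of two members.
exemplars = {
    "Bird": composite([embeddings["robin"], embeddings["sparrow"]]),
    "Swimmer": embeddings["salmon"],
}

# Secondary space: every entity described by its similarity to every exemplar.
space = {name: to_exemplar_space(vec, exemplars) for name, vec in embeddings.items()}

# "What is sort of like both a Bird and a Swimmer?"
ranked = sorted(space, key=lambda n: fuzzy_and(space[n], "Bird", "Swimmer"), reverse=True)
print(ranked[0])  # prints: penguin
```

[In this toy data the penguin ranks first for “Bird and Swimmer” because it is moderately similar to both exemplars, while the robin and the salmon each score high on only one axis. With a large data set, a similarity threshold could stand in for computing every entity against every exemplar, as Jack notes.]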

00:17:24

Alan Morrison: I think there are many other questions about how to bring those two things together. This seems totally relevant to enterprise work on describing and classifying domains and the methodology people use for classification. It seems there is much more of a dialogue between the probabilistic side of things and the deterministic side, like traditional classification systems such as Dublin Core. Both should be in play. When you do this kind of research, is there a dialogue back and forth between the two in your mind?

Jack Kausch: Yes, absolutely. Even on the theoretical level, I was asking questions about prototype theory informing the idea of what meaning is and how it is formed. When I was looking at how to map more formal systems to more probabilistic ones, I was really searching for a way to create a formal system that could embrace the messiness of language because we need that messiness.

00:18:54

Jack Kausch: Probabilistic systems are excellent at mapping that messiness for us; it is amazing what they can do. However, my intuition has been that we want to ground that inside of our formal systems. If we can incorporate that messiness into our formal systems, it can give us a knowledge base that has both increased context and is grounded within known facts. I think the ideal is a knowledge graph that can contain some of these probabilistic statements as well. I believe a lot of people are doing that.

Alan Morrison: I would imagine so, and it would be good to talk to more of them. I wonder what this suggests in terms of how we could transform data management. When I think about data management, I am not just talking about tabular data; I am talking about content and knowledge. In big enterprises, you might have a Data Management department, a Content Management department for external content, and a Knowledge Management department for internal content, each using different methodologies.

00:20:08

Alan Morrison: The knowledge graph approach seems to allow you to bring all of those departments together and use the same methodology for all of them. This could help them manage the messy and formal sides in a much more efficient way, as having both in play is important. Can you help us think about what the transformed departments might look like as a unified Data Management writ large with the knowledge graph approach?

Jack Kausch: Well, many stories I hear about implementing these systems involve communication issues and getting people to understand the utility of very abstract systems. Knowledge graphs, per semantic web standards, are incredibly abstract—more so than labeled property graphs, which are already difficult to navigate. One reason I am interested in Wikidata is that, as a user interface and service community, it is very easy for people to navigate.

00:21:30

Jack Kausch: I recently experienced this through teaching catalogers. Even experienced catalogers often do not have this knowledge because these are new forms of linking authority records and data. Wikidata is becoming more important, and it is universally much easier for catalogers to start using than having to explain the whole semantic web stack and all its issues. A lot of this also comes down to a design element: how you communicate and make these tools useful for people, in addition to the management element of having organizational structures that facilitate dialogue. I think user interface and user experience design may be causing more obstacles than we might expect.

00:22:35

Jack Kausch: That would be my first intuition.

Alan Morrison: To my mind, this goes back to establishing provenance using a library science method, maintaining versioning, and preserving the original context while adding context and reusing as we go forward. An organic approach is implicit here that enterprises could take advantage of. Right now, very little of that provenance exists, and there is much guesswork involved just to accurately classify what they are trying to share. If they had a good process upstream from the beginning that established provenance, there would be much less of a problem with that.

Jack Kausch: Absolutely. That is even more ideal because of generative AI’s ability to disrupt chains of reference and provenance.

00:23:58

Jack Kausch: Knowledge graphs will only become more important for maintaining provenance. We can use reasoning and other transformative technologies to manage provenance in ways we likely cannot with paper. I agree we have not yet seen them implemented widely enough, though it seems to be just starting now, which is interesting.

Alan Morrison: In some ways, we have gotten the cart before the horse because we have the generative methods before establishing what this transformed process of data lifecycle management should be. This gets to how we are building knowledge graphs with your approaches, how we are making ourselves more efficient at it, and how we are proving its value. Have you been involved with this at all?

00:25:10

Jack Kausch: Yes. I have been working in a few research capacities to create different knowledge bases that push the boundaries of what is possible. For the History of Symbolic Algebra project, where we looked at an oral history of mathematicians, we used named graphs to ground the output of large language models. If a large language model was generating RDF, we would use a named graph to establish what document it was parsing. This was a way to mediate that.

Technically, this is Retrieval-Augmented Generation (RAG) when you give a large language model a document and ask it to produce a graph for us. This logs the chain so we can see the document it comes from, the model parsing it, and the time stamp. I think architectures already exist that are just waiting to be used.
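
[Editor’s sketch: one way to log that chain is to put each batch of generated triples in its own named graph and then describe the graph itself. The example below writes N-Quads using only the Python standard library. The example.org URIs, the run identifier, and the choice of PROV-O terms (prov:wasDerivedFrom, prov:wasAttributedTo, prov:generatedAtTime) are assumptions for illustration, not the project’s actual schema.]

```python
from datetime import datetime, timezone

PROV = "http://www.w3.org/ns/prov#"
XSD = "http://www.w3.org/2001/XMLSchema#"
EX = "http://example.org/"  # hypothetical namespace for this sketch


def iri(value):
    """Wrap a URI in angle brackets, as N-Quads requires."""
    return f"<{value}>"


def provenance_quads(triples, source_doc, model, run_id, generated_at):
    """Place LLM-generated triples in a named graph and describe that graph.

    For simplicity, subjects, predicates, and objects are all URIs here;
    literal objects would need their own quoting rules.
    """
    graph = iri(f"{EX}graph/{run_id}")

    # The generated statements, each tagged with the named graph.
    quads = [f"{iri(s)} {iri(p)} {iri(o)} {graph} ." for s, p, o in triples]

    # Statements about the named graph itself, kept in the default graph.
    stamp = f'"{generated_at.isoformat()}"^^{iri(XSD + "dateTime")}'
    quads += [
        f"{graph} {iri(PROV + 'wasDerivedFrom')} {iri(source_doc)} .",
        f"{graph} {iri(PROV + 'wasAttributedTo')} {iri(model)} .",
        f"{graph} {iri(PROV + 'generatedAtTime')} {stamp} .",
    ]
    return quads


# Placeholder triple standing in for whatever the model extracted.
triples = [(EX + "mathematician/A", EX + "advisedBy", EX + "mathematician/B")]

quads = provenance_quads(
    triples,
    source_doc=EX + "doc/interview-017",   # hypothetical source document
    model=EX + "model/some-llm-v1",        # hypothetical model identifier
    run_id="run-0001",
    generated_at=datetime(2025, 1, 15, 12, 0, tzinfo=timezone.utc),
)
print("\n".join(quads))
```

[Because every generated triple carries its graph name, a later query can ask which document and which model produced any given statement, or drop an entire run if the model turns out to be unreliable.]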

00:26:10

Jack Kausch: Named graphs and quads, created more than 20 years ago in the semantic web community, are great architectures for modeling provenance and can do very interesting things. The overlap between linguistic and common sense knowledge—an insight from the Steven Pinker book club—is interesting. I have been reflecting on how for years you would sometimes hear data scientists ask what the value of formal systems really is. But when you look at what is happening with RAG today, ontologies and knowledge graphs seem to be fulfilling their original purpose: supplying common sense knowledge to AI agents. Large language models are proving to be very bad at common sense knowledge, and we have these beautiful architectures people have been working on.

00:27:12

Jack Kausch: It seems this is a natural fit. We just need to make sure that people hear about these solutions and that they are implemented.

Alan Morrison: I wonder what your personal experience has been working with the tooling. We have mentioned vector databases and graph databases, but relational databases are still predominant in enterprises, along with other database types. The question is: how can we create pathways and networks for agents to operate given this heterogeneous landscape? The graph is supposed to be the fabric that brings all of these together, but is it sometimes confusing to have this diverse set of tooling and repositories to make sense of it all?

00:28:25

Jack Kausch: It is a user experience problem that is not always appreciated: tabular data is easier and more intuitive for people to read, even though you can visualize a graph as a table. But we need graphs. I think it is inevitable for graphs to have that mediating role you are talking about because they are so efficient. That is absolutely going to happen. It is an open question how sophisticated the semantics of those graphs will need to be for some applications.

I think many people believe in semantics, and there are many amazing things you can do with semantics and reasoning; perhaps those are the frontiers we should be exploring.

00:29:20

Jack Kausch: In terms of specific databases that really integrate this, I have not seen one. I have been waiting for vector database software that could integrate vector data with knowledge graphs and triple stores in a seamless way. Some talk about semantic technologies, but they do not go that deeply into them. I know there are a few others, but I think that is a real space where we still do not quite know what to do. It goes back to integrating the formal and the probabilistic. Anyone who solves that problem could potentially create quite a lot of value.

00:30:31

Alan Morrison: Speaking of that, I was just talking to Adam Kimball of Applied Graphs, who is a good knowledge engineer. He said what is compelling to him more recently is knowledge graph embeddings: embedding the graphs in vector databases.

Jack Kausch: Right. I wrote my master’s thesis on that, and yes, knowledge graph embeddings are fascinating. I have looked at many different embedding methods, and some of them get very sophisticated. I based the fuzzy logic aspects of the knowledge organization system and ontology I created on a wonderful ontology embedding method called Falcon.

Zhenwei Tang at the University of Toronto created Falcon, which generates an explainable vector space that allows you to perform fuzzy Boolean algebras over it. This is a fascinating method. Lots of work was done by his supervisor, Robert Hoehndorf, who is an interesting person to talk with as well.

00:31:32

Jack Kausch: They have been doing a lot of interesting work at their lab in Saudi Arabia on embedding using mathematics, geometric methods, and methods using lattice theory. I think there is a lot that can be done with ontology embedding methods. The dream is the embedding method that has a perfect mapping between the geometry and the logic. Usually, that is not possible, or when it is, it is computationally intractable. It is too inconvenient if you are dealing with a large ontology, like a human phenotype or a pharmaceutical ontology, as it could take several years to embed. I use OWL2Vec a lot; that is my go-to algorithm.

00:32:30

Jack Kausch: My master’s thesis advisor developed it. That is what I got my training in, and it works. It is a Word2Vec model that includes the metadata. One useful approach is to have retrieval systems that play off each other: you use the formal side and extend it by looking at the informal embedding—the nearest neighbors in the space. If you take that loop too far, you get irrelevant hits, but on small scales, it can be very effective.
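
[Editor’s sketch: the “play off each other” loop might look like the following, where a set of formally retrieved entities is extended once with its nearest embedding neighbors. The entity names, the two-dimensional vectors, and the k and min_similarity parameters are hypothetical, and nothing here calls OWL2Vec itself.]

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_with_neighbors(formal_hits, embeddings, k=2, min_similarity=0.8):
    """Extend formally retrieved entities with close embedding neighbors.

    Runs a single round only. Repeating the expansion on its own output is
    where irrelevant hits tend to creep in.
    """
    expanded = {}
    for hit in formal_hits:
        # Score every entity that the formal query did not already return.
        candidates = [
            (cosine(embeddings[hit], vec), name)
            for name, vec in embeddings.items()
            if name not in formal_hits
        ]
        # Keep at most k neighbors per hit, and only those above the threshold.
        for score, name in sorted(candidates, reverse=True)[:k]:
            if score >= min_similarity:
                expanded[name] = max(score, expanded.get(name, 0.0))
    return expanded


# Toy embeddings, invented for illustration only.
embeddings = {
    "AbelianGroup": [1.0, 0.0],
    "CommutativeRing": [0.9, 0.3],
    "Topology": [0.0, 1.0],
}

# Pretend an ontology query (for example, a subclass lookup) returned this set.
formal_hits = {"AbelianGroup"}

expanded = expand_with_neighbors(formal_hits, embeddings)
print(expanded)
```

[Here only CommutativeRing clears the threshold, so the formal result grows by one informal neighbor while Topology is left out.]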

Alan Morrison: Does this relate at all to the Named Entity Recognition (NER) work you were doing as well? Maybe we could put this in the context of building out a graph and shaping the entities and relationships.

00:33:37

Alan Morrison: You seem to have a holistic perspective on NER, yet many people in the business space might only have exposure to one or two NER methods. There is a whole panoply of methods you can take advantage of.

Jack Kausch: Yes, there are a lot, and other Natural Language Processing methods can support NER, such as part-of-speech tagging or semantic role analysis. You can add in other methods, which may enhance the efficacy of your Named Entity Recognition depending on your use case.

There is always a rub with Named Entity Recognition: you are always going to have some errors. That is the most dangerous part of the analysis I found, because that is the point where the symbol grounding problem is most likely to fail.

[A bad grounding decision at the NER stage is insidious because it looks like clean, structured data — you have a label, you have a value — but the connection to reality is quietly wrong. Every subsequent operation (relation extraction, knowledge graph population, semantic similarity) treats that broken grounding as if it were solid.]

00:34:46

Jack Kausch: What worked for the mathematics project was taking the Named Entity Recognition method, whichever one you are using, with its set of classes, and mapping those classes to a property constraint or property in a SPARQL query within the external knowledge base. For example, if it is a person, then it must be a “foaf:Person” or something similar. Then you can avoid some of those challenges when querying the knowledge bases. It is about augmenting a fuzzy NER method with a more structured approach, which is more decidable but not always complete, and you cannot always know what to choose.

00:35:41

Jack Kausch: That approach worked. I was looking for ways to ensure that if you are looking someone up, they must have an ID from the Mathematics Genealogy Project, which is easy to do on Wikidata, as there is a property for that. That is what makes Wikidata so wonderful.
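
[Editor’s sketch: the mapping from an NER class to a query constraint could be expressed as below. The code only builds the SPARQL text and does not contact any endpoint. The NER label name and the function are hypothetical, and the query assumes the standard wd:, wdt:, and rdfs: prefixes that the Wikidata Query Service predefines. The Wikidata identifiers are P31 (instance of), Q5 (human), and P549 (Mathematics Genealogy Project ID).]

```python
# Hypothetical mapping from NER labels to Wikidata graph patterns.
# Only PERSON is filled in; other labels would need their own constraints.
NER_CONSTRAINTS = {
    "PERSON": "?item wdt:P31 wd:Q5 .",  # instance of (P31) human (Q5)
}

# Wikidata property P549 is the Mathematics Genealogy Project ID.
MGP_ID_PATTERN = "?item wdt:P549 ?mgpId ."


def build_lookup_query(mention, ner_label, lang="en", limit=5):
    """Turn an NER hit into a SPARQL query with type and ID constraints."""
    if ner_label not in NER_CONSTRAINTS:
        raise ValueError(f"No constraint defined for NER label {ner_label!r}")

    # Escape backslashes and double quotes so the mention is a valid literal.
    escaped = mention.replace("\\", "\\\\").replace('"', '\\"')

    lines = [
        "SELECT ?item ?mgpId WHERE {",
        f'  ?item rdfs:label "{escaped}"@{lang} .',
        f"  {NER_CONSTRAINTS[ner_label]}",  # the entity must match the NER class
        f"  {MGP_ID_PATTERN}",              # and must have a genealogy ID
        "}",
        f"LIMIT {limit}",
    ]
    return "\n".join(lines)


query = build_lookup_query("Emmy Noether", "PERSON")
print(query)
```

[A mention that NER labels as a person but that has no genealogy ID on Wikidata simply returns no rows, which is the point: the structured constraint filters out some of the fuzzy method’s false matches rather than passing them downstream.]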

Alan Morrison: It is a huge resource. The ontologists I have spoken with recently use a limited set of “golden” relationships in their consulting work that they feel they need to specify. If you are beholden only to the probabilistic side of things, you might get an explosion of irrelevant relationships. It seems like you want a definite context, which implies the relationships you will use, and then NER becomes an approach using that subset of things to arrive at an answer. Is that a good way of saying it, or am I wrong?

00:37:11

Jack Kausch: Exactly, I think so. Of course, that does not mean you will never have errors; there will still be some misattribution, but it definitely lowers the error rate by orders of magnitude.

Alan Morrison: It seemed like you had a critique of conventional or typical Named Entity Recognition methods. I was wondering if there was an alternative approach that we had not mentioned yet that comes to mind.

Jack Kausch: Well, I have not talked about one important aspect of my work, which is the multilingual aspect of the ontology I was creating. This notation system was influenced by some of the ars combinatoria ideas from Leibniz, Dalgarno, and Wilkins. They thought they could create a notation system understandable in every language, which I do not think is possible. However, I did try to find semantic primitives that could be used to translate between languages.

00:38:11

Jack Kausch: The issue with NER is that it is a machine learning method, so it must be trained on a corpus of data. Consequently, some languages will have better support than others, which is also true of machine translation and all these methods. What can you do? You can use pivot languages or thesauri to help you make mappings between different languages, such as multilingual thesauri or multilingual information retrieval pivot languages. Those can help you map between the latent space of one language and another. However, there are always these qualitative issues—it goes back to what we were talking about at the beginning—because the inner world of a language is different.

00:39:11

Jack Kausch: That is why it is so hard to translate between languages; people often say that the sense of a word they know in Portuguese is untranslatable, even with five English words. Those kinds of problems will always be there. I heard about a really interesting, spooky algorithm called vec2vec, which could take a vector database and, essentially, hack into it. It translates from a space where you know the semantics and entities, randomly translates into the second space, and then re-translates back into the first.

I do not fully understand what this algorithm is doing, but I know people were able to reconstruct the documents that a vector database was trained on, which was a huge security risk for companies using that vector database.

00:40:16

Jack Kausch: That was very interesting. Approaches like that may be emerging now where you could find some of those associations even without a pivot language. However, I think a lot of meaning is lost when you do that kind of Procrustean bed analysis.

Alan Morrison: Your pictographic work seems very familiar in some ways to what others have tried to do. People seem to want to get to that layer of abstraction. What could the payoff be for a method like the one you came up with? If you are able to show it today, perhaps people would be interested in it.

Jack Kausch: You mean sharing my screen? Sure, I can share my screen.

00:41:15

Jack Kausch: I am going to share the entire screen and go to my website. Can you see this, Alan?

Alan Morrison: Yes, it seems clear to me.

Jack Kausch: These are the pictograms in the ontology, designed to be as understandable as possible cross-linguistically. As you said, Alan, this is an old idea. Umberto Eco wrote a whole book about it called The Search for the Perfect Language, describing people who tried to create pictographic metadata systems that everyone from all languages could understand. I do not think that is possible or feasible, but I think it is interesting to create highly cross-linguistic pictograms regardless. Many of the other ideas I took were from people like Ramon Llull, who inspired Leibniz and some of the mathematics he tried to create in his doctoral dissertation on the Art of Combinations.

00:42:29

Jack Kausch: The idea is that you can combine different glyphs to create a semantic triple, like “moon, mountain, and egg,” and it will recommend the most similar Wikipedia articles to that triple. You can also switch your language, which is where this pivot language approach becomes very interesting. If you put it in Dutch, you will sometimes find that you get different Wikipedia articles than you might get in English. This can occur for many reasons. I use an algorithm called Wikipedia2Vec to embed the data. When you use that algorithm, you download and embed the whole of Wikipedia in a given language. Sometimes the indexing method might pick up on different things, and sometimes biases might creep into a given language’s data set.
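
The glyph-combination step can be sketched as a small nearest-neighbor query. This is a hedged illustration under stated assumptions: the three-dimensional vectors are invented toy data rather than Wikipedia2Vec output, averaging the glyph vectors is only one plausible way to combine them, and `combine` and `recommend` are hypothetical helper names.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def combine(vectors):
    """Combine glyph vectors by averaging them (an assumed choice)."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def recommend(glyphs, glyph_vecs, article_vecs, k=3):
    """Return the k article titles most similar to the combined glyphs."""
    query = combine([glyph_vecs[g] for g in glyphs])
    ranked = sorted(
        article_vecs,
        key=lambda title: cosine(query, article_vecs[title]),
        reverse=True,
    )
    return ranked[:k]


# Toy embeddings standing in for one language edition's vector space.
GLYPHS = {"moon": [1.0, 0.0, 0.0], "mountain": [0.0, 1.0, 0.0], "egg": [0.0, 0.0, 1.0]}
ARTICLES = {
    "Moon": [0.9, 0.3, 0.3],
    "Mount Everest": [0.1, 1.0, 0.0],
    "Stock market": [-0.5, -0.2, -0.6],
}

print(recommend(["moon", "mountain", "egg"], GLYPHS, ARTICLES))
```

Swapping `ARTICLES` for vectors trained on another language edition would, in this sketch, be the only change needed to see different recommendations for the same triple.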

00:43:34

Jack Kausch: Maybe German will give us a better result for this one. If you look here, moon is our top result for this triple. The value of this—and this is a very experimental version—is interesting. I am working on building much more interactive versions. This is still an “Easter egg” on the website, but you can create your own wheels and add your own glyphs, which would be the more interactive future versions where you could drill down into the data. Right now, this is the only one at a stage where people can use it, and it is more experimental, just playing with these ideas.

00:44:26

Jack Kausch: The interface represents the structure of an ontology. These things are put in opposition to each other using a fuzzy complement and a fuzzy intersection relation. I have a way of defining everything on the outer circles as being a subclass of things on the inner circle. The slider controls the extent to which you are focusing on just your combination. If you are at 0% serendipity, you are looking only at the combination between “moon,” “fire,” and “tree” in this language. If you put it at 100% serendipity, you are suddenly looking at the emergent results that come out of the ontology’s fuzzy structure. Because 14 different classes are combined with fuzzy semantics, you get things that appear very random, which is why I call it serendipity.
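
One way to picture the slider is as a blend between two fuzzy scores. The sketch below is a guess at the general shape, not the formula used on the site: it assumes the standard min/max/one-minus fuzzy operators, uses four toy classes instead of fourteen, and defines the "emergent" score as membership in any unselected class intersected with the complement of the selected combination. Every name and number here is hypothetical.

```python
def fuzzy_and(*degrees):
    """Fuzzy intersection (standard minimum operator)."""
    return min(degrees)


def fuzzy_or(*degrees):
    """Fuzzy union (standard maximum operator)."""
    return max(degrees)


def fuzzy_not(degree):
    """Fuzzy complement."""
    return 1.0 - degree


def score(memberships, selected, serendipity):
    """Blend the combination score with an ontology-wide emergent score.

    serendipity = 0.0 ranks purely by the selected combination;
    serendipity = 1.0 ranks purely by the emergent score.
    """
    combo = fuzzy_and(*(memberships[c] for c in selected))
    others = [m for c, m in memberships.items() if c not in selected]
    # Assumed definition: belongs to some other class AND NOT to the combination.
    emergent = (
        fuzzy_or(*(fuzzy_and(m, fuzzy_not(combo)) for m in others)) if others else 0.0
    )
    return (1.0 - serendipity) * combo + serendipity * emergent


def rank(articles, selected, serendipity):
    """Order article titles from highest to lowest blended score."""
    return sorted(
        articles,
        key=lambda title: score(articles[title], selected, serendipity),
        reverse=True,
    )


# Toy class memberships in [0, 1] for two made-up articles.
ARTICLE_MEMBERSHIPS = {
    "Forest fire at night": {"moon": 0.7, "fire": 0.9, "tree": 0.8, "water": 0.1},
    "River delta": {"moon": 0.1, "fire": 0.0, "tree": 0.2, "water": 0.9},
}

print(rank(ARTICLE_MEMBERSHIPS, ["moon", "fire", "tree"], serendipity=0.0))
print(rank(ARTICLE_MEMBERSHIPS, ["moon", "fire", "tree"], serendipity=1.0))
```

At a serendipity of 0.0 the article matching all three glyphs ranks first; at 1.0 the ranking flips toward an article the selection never asked for, which is roughly the "very random" effect described above.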

00:45:29

Jack Kausch: It can be interesting to see how your results change depending on the percentage you set the slider at. If you want more random results, you set it higher, and then it is not just the combination of glyphs giving you a recommendation. This is a rough prototype of how we can create a functioning semantic ontology indexed to vector space with a user interface, which is very inspired by the search Umberto Eco describes in his book. I also wanted to create a user interface out of Ramon Llull’s Ars Magna. For those unfamiliar, Ramon Llull was a thirteenth-century Catalan mystic who created many of these paper computational engines for combining ideas, which look like real wheels you can see in books.

00:46:34

Jack Kausch: I wondered if it was possible for me to make that a user interface. In its most basic form, this is exploring an angle of rotation in a cosine similarity space that has been scaled up to 360 degrees, so it is very transparent. When you are doing more semantic things, it is a bit more obscured. I had other earlier prototypes, like this notebook, where you can build your own string by typing. For example, the letter ‘A’ makes the star glyph in the font. If you turn the dial, you are spinning the star around the circle, and the further around the circle you get, the less star-like the results are. At 136 degrees, you are still seeing star-like entities such as Star Trek and “astronomer,” but once you go further into the third and fourth quadrants, things become very odd and unrelated.

00:47:40
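
The dial can be read as a lookup by target similarity. The following sketch rests on an assumption about what "scaled up to 360 degrees" means: that the 0 to 180 degree range of angles between vectors is stretched across the full dial, so a dial position of 136 degrees corresponds to a 68 degree angle and the third and fourth quadrants correspond to negative cosine similarity. The vectors and the `dial_to_cosine` and `results_at` helpers are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def dial_to_cosine(dial_degrees):
    """Convert a dial position to a target cosine similarity.

    Assumption: a full 360-degree turn spans the 180-degree range of
    angles between vectors, so the dial angle is halved.
    """
    return math.cos(math.radians((dial_degrees % 360) / 2))


def results_at(dial_degrees, glyph_vec, entity_vecs, k=2):
    """Return the k entities whose similarity to the glyph is closest to the target."""
    target = dial_to_cosine(dial_degrees)
    return sorted(
        entity_vecs,
        key=lambda name: abs(cosine(glyph_vec, entity_vecs[name]) - target),
    )[:k]


# Toy two-dimensional vectors; real embeddings have hundreds of dimensions.
STAR = [1.0, 0.0]
ENTITIES = {
    "Astronomer": [0.9, 0.4],
    "Star Trek": [0.4, 0.9],
    "Tax law": [-0.9, 0.3],
}

for dial in (0, 136, 340):
    print(dial, results_at(dial, STAR, ENTITIES, k=1))
```

Under this reading, turning the dial past 180 degrees asks for entities that point away from the glyph, which would explain why results in the later quadrants feel unrelated.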

Jack Kausch: That is the technology; what is going on behind the scenes is that it is looking at this cosine similarity matrix and giving you a way to visualize it. I also wanted to add some of these fuzzy semantics. A friend of mine did economics with this. With the exemplar theory, you can simply associate each glyph with a Wikipedia article about economics in a way that makes sense to you. He metaphorically decided that liquidity was water. When you combine the glyphs, you are combining concepts in economics and getting results back. If you set it to the economics subject, you see these recommended articles based on his internal schema when you combine “sun,” “mountain,” and “leaf.”

00:48:35

Jack Kausch: Of course, you could put any ontology on here; it does not have to be glyphs. You could put terms from a product ontology like GIST, see what combinations you get in your space, and then switch these different interpretive lenses. It is all very experimental. I am hoping to make versions that will be more interactive and allow people to really play with the data and go deeper into the Wikidata elements soon.

Alan Morrison: You mentioned GIST, Semantic Arts’ upper ontology. Alan Michaels has been working with Semantic Arts and has created a much more accurate alternative for industry classification, I think called IndustryKG (https://www.industrykg.com/). Your user experience reminds me of that classification scheme and how what you have done might be useful in some ways.

00:49:54

Alan Morrison: It is a kind of creative discovery when you bring these things together, isn’t it?

Jack Kausch: Exactly. I want to explore browsing, which is why I chose Wikipedia because it is a place where people browse. I want to see if we can create more public internet experiences that are almost like web art where people can go to learn things, which I think is needed publicly right now. All of this is applicable in industry and could be used for knowledge management in many different ways.

Alan Morrison: It gets back to the qualitative approach to starting things correctly and asking the right questions. Alan Michaels came up with an alternative to NAICS (North American Industry Classification System), which is universally used but inadequate for comparing apples to apples in industries as an analyst. You could come up with something from a user experience perspective that could be powerful for spelling out and using better, ontology-based classification systems, but also starting with the right questions about what an industry is.

Alan Michaels took the Five Forces framework from Michael Porter, the Harvard Business School professor, and started his classification system from those first principles. There are many powerful ways of thinking about questions in different ways.

00:52:19

Jack Kausch: Absolutely. One of the things I liked about this browsing approach is that it is about searching for information when you do not necessarily know what you are looking for. The tool I made has taught me new concepts, like in biology, all the time. Since there are living beings on the system, it picks up many things about biological taxonomies. I suppose that is just the data set I used. Another interesting thing is that when you create these browsing user experiences, you generate new knowledge through discovery. I was excited to design that in a digital space because it can be difficult for us to do. In traditional record situations or a library, it is easy to find something you do not already know you are looking for, but online, that is not always as easy. I think this can help us to ask those more qualitative questions.

00:53:12

Alan Morrison: You have finished up your PhD. What does the rest of 2026 and 2027 look like for you? What are you going to do next?

Jack Kausch: Well, I am taking a vacation and then attending some conferences, but I am on the job market right now. I am definitely interested in enterprise ontology jobs or applying approaches like these. If anyone is interested in working with me on that front, please reach out. I can send you my email. That is where I am right now—searching the world.

Alan Morrison: You are definitely on LinkedIn; that is where we discovered each other. Why don’t you tell us the name of the main site you brought up today?

00:54:11

Jack Kausch: The site is realcharacterlanguage.world. Those three words together: realcharacterlanguage.world. You can also contact me at my institutional email at Western University, and I am on Bluesky. I am a real fan of Bluesky as a social network; it is a very fun place.

Alan Morrison: It is a fun place to follow people, too. Jack, this has been really fun and interesting to talk to you. I am glad we had the chance to do this.

Jack Kausch: Thank you, Alan. Likewise, this was awesome.

Alan Morrison: Good luck to you, and I hope you stay in touch. Folks, please reach out to Jack. I think what you have been doing is inspiring, and it is great that people in academia can take this broad perspective on how to bring things together to make more sense of this confusing AI world.

Jack Kausch: Thank you so much, Alan. It has been great talking with you today. I hope to talk again soon.
