Ontology Definition and Its Relationship to Knowledge Graphs
Deep Narrative Analysis (DNA) makes extensive use of ontology and knowledge graph technologies. Unfortunately, these topics are not well understood. In fact, there are entire books dedicated to these subjects, as well as multiple 10+ page papers. blog posts and web sites. But those definitions can be complicated and not very meaningful to IT and business people. This post is an attempt to provide simple definitions.
An ontology can be specified as:
A description of the kinds of things and relationships in a topic area, specified in a formal way, and created by a community of users for an explicit purpose
Unpacking the definition, it is important to highlight that:
- One of the most important goals of an ontology is to communicate the concepts and knowledge (and increase the understanding) of the topic area within the "community of users"
- This enables sharing and reuse of the knowledge encoded using the ontology
- The "description" requires understanding and detailing the topic area as simply as possible, while still addressing the "purpose" of the ontology
- Note that the topic area being described can be anything from philosophical concepts, to general events, to drug types, to a categorization of types of rainfall for a very specific application area
- Although distinctions of top-level versus application ontologies can be made, they are all just types of ontologies created for an "explicit purpose"
- Defining an ontology in a "formal way" does NOT mean that it uses a complex logical form, a programming or standard language (such as RDF or JSON), or specific tooling
- What is required is a mechanism to organize the descriptions, which can be accomplished using definitions and statements written in natural language, AND/OR by using a computational form
- Defining an ontology using a machine-processable language is necessary ONLY IF the ontology is to be consumed by both humans and machines
Now that we have a definition of ontology, let's apply it to understanding a knowledge graph. But, this first requires defining a "graph". A graph is:
A set of "nodes" (vertices) that are connected by labeled or unlabeled “edges” (relationships or associations)
Finally, we can define a "knowledge graph". It is:
A collection of nodes representing specific instances of the types of things in the domain of interest and property values (such as string or integer values), interconnected by named edges (identifying the properties and relationships). The structure of the knowledge graph is defined by an ontology, which describes the semantics of the nodes and edges. "Knowledge" is created from specific data that is encoded according to the ontology.
This is what distinguishes knowledge encoded using relational databases from knowledge encoded using ontologies and graph databases. Ontologies define concepts and their semantics, which in turn allows mappings and the fusion of data across those concepts. Fused data may represent something that is totally new or that is semantically equivalent. A new concept may be a generalization or specialization of an existing one, or may be associated in some other way. This flexibility is possible because of the connectedness of ontologies and graphs (making it possible to define new types of nodes and edges) and the power of sub-classing concepts or properties.
With sub-classing, you do not even have to know the detailed semantics of a new concept ... Simply knowing that one concept is a specialization of another (or is disjoint/distinct from another) provides important information. Further, concepts do not have to be related by complex foreign keys and table joins, they can be associated via a flexible series of relationships that are specified using a graph query language such as SPARQL.
Knowledge graphs enable analysis of the data as an integrated whole. When implemented using graph database tooling, they combine features of standard, relational databases (e.g., structured query), with the flexibility and extensibility of graphs. In addition, graph analysis (such as centrality, clustering and associativity analyses) can be performed on the integrated data. Backing the graph by an ontology provides formal definition and structure, which provides for increased precision and accuracy when fusing and integrating data from various sources.
In the past, other terminology has been used to define the concepts in a knowledge graph. For example, you can reference the statements in the backing ontology as a T-Box ("terminology") and the encoded data as an A-Box ("assertions").
Knowledge graphs are not new, but they are very flexible and useful!