Build an Ontology: The Competitive Advantage Hiding in Your Data

The Problem

Your data is a mess, and it’s costing you more than you think.

The symptoms are everywhere. The LLM you bolted onto your knowledge base confidently tells people the wrong thing. Two developers build the same concept three different ways because nobody agreed on what a “customer” actually is. A number in one report doesn’t match the same number in another, and nobody can say which one is right. Every new feature takes longer than the last because the system underneath it isn’t organized — it’s just accumulated.

None of these feel like the same problem. They are. Every one of them traces back to a single root cause: your data has no shared, explicit model of what things are and how they connect. You have data. You do not have meaning.

What We Learned

We learned it the hard way, by climbing the whole ladder before we found the top rung.

The first thing everyone tries is the pile. Collect all your documents into one spot, point a RAG pipeline at it, and let the AI figure it out. It demos well and falls apart in production. Retrieval over an undifferentiated pile gives you plausible answers, not correct ones, because there’s no structure telling the model which source is authoritative or how two facts relate.

So you get more disciplined. You break the pile into a real database — normalized, relational, clean. Better. But a schema tells you a customer row can join to an order row. It doesn’t tell you what a customer is, what makes one different from a lead, or which business rules govern the relationship. The meaning still lives in people’s heads.

So you go tighter. You define every field, document every column, nail down every type and constraint. Now you have a well-documented database, and you’re still watching developers make the same modeling mistakes and the AI guess wrong on anything that spans more than one table. Tight fields describe columns. They don’t describe the domain.

It wasn’t until we built an actual ontology that the whole thing snapped into place. I have a master’s degree in distributed knowledge management, so I’ll admit I was primed to reach for this — but the reason it works isn’t academic. An ontology is a formal, explicit description of the concepts in your domain, their properties, and the rules that constrain how they relate. It captures the meaning a schema leaves in people’s heads and makes it machine-readable. That is the exact thing a database, no matter how clean, never gave us — and the exact thing an LLM cannot reliably infer from a pile on its own.

What You Can Do About It

Build the ontology deliberately. Don’t try to boil the ocean — the canonical playbook here is Noy and McGuinness’s Ontology Development 101 out of Stanford, and it’s an iterative, seven-step process. We use it as our starting point on every engagement:

1. Determine the domain and scope. Write down the questions the ontology has to answer: the “competency questions.” If it can’t answer them, it isn’t done, and if a question is out of scope, you’re not building for it.
2. Consider reusing what exists. You are almost never the first person to model your domain. Check for existing ontologies before you invent your own vocabulary.
3. Enumerate the important terms. List every concept in the domain, plainly, without worrying yet about how they organize. Nouns become candidate classes; verbs become candidate relationships.
4. Define the classes and the hierarchy. Organize those terms into an is-a hierarchy, working top-down, bottom-up, or both. This is where “a lead is a kind of contact” stops being tribal knowledge and becomes structure.
5. Define the properties of each class. What describes a customer? What describes an order? Attach each property at the right level of generality so it’s inherited correctly.
6. Define the facets of those properties. Cardinality, value types, allowed ranges. This is where the rules that used to live in validation code or someone’s memory become part of the model itself.
7. Create instances. Populate it with real examples and pressure-test it against your competency questions.
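
To make the steps concrete, here is a minimal sketch in plain Python. It is an illustration only, not how Noy and McGuinness or any ontology tool implements the process. The classes (Contact, Lead, Customer), the properties, and the competency question are hypothetical examples, and a production ontology would more likely live in a dedicated ontology language than in Python objects.

```python
from __future__ import annotations
from dataclasses import dataclass, field

# Hypothetical mini-ontology, for illustration only.

@dataclass
class Facet:
    """Step 6: constraints on a property (value type and cardinality)."""
    value_type: type
    min_count: int = 0
    max_count: int | None = None  # None means unbounded

@dataclass
class OntClass:
    name: str
    parent: OntClass | None = None                              # step 4: is-a link
    properties: dict[str, Facet] = field(default_factory=dict)  # step 5

    def is_a(self, other: OntClass) -> bool:
        """Walk up the hierarchy to test 'self is a kind of other'."""
        cls = self
        while cls is not None:
            if cls is other:
                return True
            cls = cls.parent
        return False

    def all_properties(self) -> dict[str, Facet]:
        """Own properties plus everything inherited from ancestors."""
        inherited = self.parent.all_properties() if self.parent else {}
        return {**inherited, **self.properties}

@dataclass
class Instance:
    """Step 7: a concrete individual with a list of values per property."""
    cls: OntClass
    values: dict[str, list]

def validate(inst: Instance) -> list[str]:
    """Check an instance against the facets of its class and ancestors."""
    errors = []
    facets = inst.cls.all_properties()
    for name, facet in facets.items():
        vals = inst.values.get(name, [])
        if len(vals) < facet.min_count:
            errors.append(f"{name}: needs at least {facet.min_count} value(s)")
        if facet.max_count is not None and len(vals) > facet.max_count:
            errors.append(f"{name}: allows at most {facet.max_count} value(s)")
        for v in vals:
            if not isinstance(v, facet.value_type):
                errors.append(f"{name}: {v!r} is not a {facet.value_type.__name__}")
    for name in inst.values:
        if name not in facets:
            errors.append(f"{name}: not defined for {inst.cls.name}")
    return errors

# Steps 3-6: terms become classes, a hierarchy, properties, and facets.
contact = OntClass("Contact", properties={"email": Facet(str, min_count=1)})
lead = OntClass("Lead", parent=contact,
                properties={"source": Facet(str, max_count=1)})
customer = OntClass("Customer", parent=contact,
                    properties={"order_ids": Facet(int, min_count=1)})

# Step 7: instances, one valid and one missing its inherited email.
ada = Instance(customer, {"email": ["ada@example.com"], "order_ids": [1001]})
bob = Instance(lead, {"source": ["webinar"]})

# Step 1: a competency question, "Is every customer a contact?"
print(customer.is_a(contact))  # True
print(validate(ada))           # []
print(validate(bob))           # ['email: needs at least 1 value(s)']
```

Because `email` is attached to Contact rather than to Lead or Customer, both subclasses inherit the requirement, which is the point of attaching each property at the right level of generality.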

Two rules matter more than the steps. First, there is no single correct ontology — there are better and worse ones for your application, so design for the questions you actually need to answer. Second, it is iterative. Your first version will be rough. That’s expected. You refine it as the domain teaches you where you were wrong.

The full guide is worth reading end to end — we keep a copy on hand: Ontology Development 101 (PDF), or read the original from Stanford.

Why an Ontology Matters

When your domain has an explicit model, everything downstream gets faster and more correct.

Your AI stops hallucinating relationships because the relationships are written down and machine-readable — the model reasons over your ontology instead of guessing over a pile. No current LLM, on its own, links your data as precisely as a purpose-built ontology of your domain does, because the LLM was never taught what your business means by its own words. You give it that.
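
One way to picture "written down and machine-readable" is a check that accepts a relationship proposed by an LLM only if the ontology allows it. The sketch below assumes a hypothetical ontology with a small is-a hierarchy and two relationships. The names (`placed`, `converted_from`, `check_claim`) are invented for illustration and do not describe any particular product or library.

```python
# Hypothetical is-a hierarchy: child class -> parent class.
IS_A = {"Lead": "Contact", "Customer": "Contact"}

# Hypothetical relationships: name -> (required subject class, required object class).
RELATIONS = {
    "placed": ("Customer", "Order"),
    "converted_from": ("Customer", "Lead"),
}

def is_a(cls, ancestor):
    """Walk up the hierarchy to test whether cls is a kind of ancestor."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = IS_A.get(cls)
    return False

def check_claim(subject_cls, relation, object_cls):
    """Return (ok, reason) for a proposed subject-relation-object claim."""
    if relation not in RELATIONS:
        return False, f"unknown relationship '{relation}'"
    domain, range_ = RELATIONS[relation]
    if not is_a(subject_cls, domain):
        return False, f"'{relation}' requires a {domain} as subject, got {subject_cls}"
    if not is_a(object_cls, range_):
        return False, f"'{relation}' requires a {range_} as object, got {object_cls}"
    return True, "consistent with the ontology"

# A claim the ontology supports, and one it rejects (a lead cannot place an order).
print(check_claim("Customer", "placed", "Order"))
print(check_claim("Lead", "placed", "Order"))
```

The second claim is rejected with a reason that names the violated rule, so a wrong answer becomes something the system can catch and explain rather than something a reader has to notice.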

Your developers stop reworking, because the model is the shared source of truth and there’s one right way to represent a concept. Your numbers reconcile, because “revenue” means one thing everywhere. And you move fast — not despite the structure, but because of it. A well-modeled domain is the thing that lets you add the tenth feature as quickly as the first.

That’s why we treat ontology development as a competitive advantage, not a nice-to-have. Most of your competitors are still stacking data into a pile and hoping an LLM sorts it out. The ones who model their domain build a moat: cleaner AI, faster teams, numbers they can trust. It’s harder than dumping everything into a vector store — and that difficulty is exactly why it’s an advantage.

At Periscoped, we help companies turn a pile of data into a modeled domain — so your AI tells the truth, your team moves fast, and your data finally works for you instead of against you.