Ontology-Driven Architecture for Compliance Software

At Cohera, we connect pharmaceutical quality systems that were never designed to work together. Veeva Vault stores documents one way. SAP QM tracks materials another way. TrackWise manages CAPAs its own way. Each system has its own schema, its own identifiers, its own assumptions.

The challenge isn't just moving data between systems—it's creating a coherent understanding of what that data means.

This is where ontology-driven architecture comes in.

What Is an Ontology in Software?

In philosophy, ontology is the study of what exists. In software, an ontology is a formal description of the concepts in a domain and the relationships between them.

For our purposes, an ontology defines:

  • Objects: The things that exist (Suppliers, Materials, Certificates, Products)
  • Properties: What we know about those things (name, status, expiry date)
  • Relationships: How things connect to each other (Supplier supplies Material, Certificate covers Material)
  • Constraints: What must be true (every Certificate must have an expiry date, every Material must have exactly one primary Supplier)
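Constraints are the part of this list that most directly becomes code. Below is a minimal Python sketch of how the two example constraints could be checked. The class names mirror the ontology objects, but the field `primary_supplier_ids` and the `validate_*` helpers are hypothetical illustrations, not Cohera's implementation.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class Certificate:
    id: str
    cert_type: str
    expiry_date: Optional[date]  # the ontology requires this to be present


@dataclass
class Material:
    id: str
    name: str
    # Hypothetical representation: IDs of suppliers flagged as primary.
    primary_supplier_ids: list[str] = field(default_factory=list)


def validate_certificate(cert: Certificate) -> list[str]:
    """Constraint: every Certificate must have an expiry date."""
    if cert.expiry_date is None:
        return [f"{cert.id}: missing expiry_date"]
    return []


def validate_material(mat: Material) -> list[str]:
    """Constraint: every Material must have exactly one primary Supplier."""
    count = len(mat.primary_supplier_ids)
    if count != 1:
        return [f"{mat.id}: expected 1 primary supplier, found {count}"]
    return []


errors = validate_certificate(Certificate("CERT-1", "CoA", None))
errors += validate_material(Material("MAT-789", "Sodium Chloride USP", ["SUP-456"]))
print(errors)  # ['CERT-1: missing expiry_date']
```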

This isn't just database schema design. It's a semantic model that captures the meaning of data, independent of how any particular system stores it.

Why Ontologies Matter for Integration

When you integrate multiple systems, you face a fundamental problem: each system has its own view of reality.

Veeva Vault sees documents:

Document {
  id: "DOC-123",
  type: "CoA",
  supplier: "SUP-456",
  status: "Approved"
}

SAP QM sees materials:

Material {
  number: "MAT-789",
  description: "Sodium Chloride USP",
  vendor: "V000123"
}

TrackWise sees quality events:

CAPA {
  id: "CAPA-001",
  affected_material: "MAT-789",
  source: "Supplier Audit"
}

These systems don't share identifiers. They don't agree on terminology. They weren't built to understand each other.

An ontology creates a layer above these systems that captures what we actually care about:

Supplier (SUP-456 = V000123)
├── supplies: Material (Sodium Chloride USP)
├── has_document: Certificate (CoA, DOC-123)
└── related_to: CAPA (CAPA-001)

Now we have a unified model where a Supplier is a Supplier, regardless of whether we're looking at Veeva, SAP, or TrackWise.
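The `SUP-456 = V000123` line is identity resolution: a crosswalk from each system's local identifier to a single canonical ontology ID. A minimal sketch, assuming a hand-maintained crosswalk table; the canonical ID format `supplier:acme` and the `resolve` function are hypothetical.

```python
# Crosswalk: (system, local_id) -> canonical ontology ID (illustrative entries).
CROSSWALK: dict[tuple[str, str], str] = {
    ("veeva_vault", "SUP-456"): "supplier:acme",
    ("sap_qm", "V000123"): "supplier:acme",
}


def resolve(system: str, local_id: str) -> str:
    """Return the canonical ID for a system-local identifier."""
    try:
        return CROSSWALK[(system, local_id)]
    except KeyError:
        # Unmapped IDs are surfaced, not silently treated as new objects.
        raise LookupError(f"unmapped identifier {local_id!r} from {system}") from None


same = resolve("veeva_vault", "SUP-456") == resolve("sap_qm", "V000123")
print(same)  # True
```

Raising on an unmapped identifier is a deliberate choice in this sketch: guessing a match, or minting a new Supplier, is how duplicates creep into a unified model.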

The Cohera Ontology

Our ontology captures the pharmaceutical quality domain. Here are some key objects:

Supplier

  • Properties: name, status, qualification_date, risk_tier
  • Relationships: supplies Materials, has Contacts, has Certificates, has Documents

Material

  • Properties: name, description, category, specification
  • Relationships: supplied_by Supplier, used_in Products, covered_by Certificates

Certificate

  • Properties: type, issue_date, expiry_date, status
  • Relationships: covers Materials, issued_by Supplier, stored_in Document_System

Product

  • Properties: name, SKU, registration_status
  • Relationships: contains Materials, has Documents, subject_to Specifications

Quality_Event

  • Properties: type (CAPA, Deviation, OOS), status, due_date
  • Relationships: affects Materials, involves Suppliers, documented_in Documents

Schema Evolution and Versioning

Ontologies evolve. When we add new object types or relationships, we need to handle this carefully:

Backward compatibility: Existing data must remain valid. New required properties need defaults.

Forward compatibility: Integrations built against an older version should tolerate data created under a newer one, for example by ignoring properties they do not recognize.

Migration paths: Clear procedures for updating the ontology without breaking existing integrations.
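When reading records, these rules often come down to two habits: fill defaults for properties an older record lacks, and drop properties the reader does not recognize, so that a reader built for one ontology version can consume data written under another. A minimal sketch, assuming records are plain dicts; `SUPPLIER_SCHEMA_V2` (including the idea that `risk_tier` became required in v2) and `read_supplier` are hypothetical.

```python
from typing import Any

# Hypothetical v2 schema for Supplier: property name -> default value.
# risk_tier is assumed to be newly required in v2, so it needs a default.
SUPPLIER_SCHEMA_V2: dict[str, Any] = {
    "name": None,
    "status": "unknown",
    "qualification_date": None,
    "risk_tier": "unassessed",
}


def read_supplier(record: dict[str, Any], schema: dict[str, Any]) -> dict[str, Any]:
    """Project a stored record onto the reader's schema version."""
    # Keys come only from the reader's schema, so:
    # - properties missing from older records get defaults;
    # - unknown properties from newer writers are dropped instead of
    #   causing an error.
    return {key: record.get(key, default) for key, default in schema.items()}


old_record = {"name": "Acme Salts", "status": "approved"}
newer_record = {"name": "Acme Salts", "status": "approved",
                "risk_tier": "high", "audit_score": 92}
print(read_supplier(old_record, SUPPLIER_SCHEMA_V2)["risk_tier"])        # unassessed
print("audit_score" in read_supplier(newer_record, SUPPLIER_SCHEMA_V2))  # False
```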

We version our ontology like software:

ontology_version: "2.3.0"
- Major (2): Breaking changes to core objects
- Minor (3): New objects or relationships (backward compatible)
- Patch (0): Clarifications or documentation updates
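Because the version number encodes the kind of change, tooling can decide mechanically whether an integration needs attention. A minimal sketch, assuming versions are always three dot-separated integers; `bump_kind` and `needs_migration` are hypothetical helpers that follow the major/minor/patch meanings listed above.

```python
def parse_version(version: str) -> tuple[int, int, int]:
    """Split 'MAJOR.MINOR.PATCH' into integers (ValueError if malformed)."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def bump_kind(old: str, new: str) -> str:
    """Classify the change between two ontology versions."""
    o, n = parse_version(old), parse_version(new)
    if n <= o:
        raise ValueError(f"{new} is not newer than {old}")
    if n[0] != o[0]:
        return "major: breaking, integrations need migration"
    if n[1] != o[1]:
        return "minor: additive, existing integrations keep working"
    return "patch: documentation or clarification only"


def needs_migration(consumer_built_for: str, data_version: str) -> bool:
    """Only a major-version difference requires migrating an integration."""
    return parse_version(consumer_built_for)[0] != parse_version(data_version)[0]


print(bump_kind("2.3.0", "2.4.0"))        # minor: additive, existing integrations keep working
print(needs_migration("2.3.0", "3.0.0"))  # True
```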

Mapping Systems to the Ontology

For each connected system, we maintain mappings that translate between system-specific schemas and our ontology.

Veeva Vault mapping:

system: veeva_vault
object_mappings:
  Certificate:
    source_type: 'document'
    source_subtype: 'coa__c'
    property_mappings:
      expiry_date: document_expiry_date__c
      status: lifecycle_state__v
      covers: related_material__c

SAP QM mapping:

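system: sap_qm
object_mappings:
  Material:
    source_table: MARA # general material data, keyed by MATNR
    property_mappings:
      name: MAKT.MAKTX # descriptions live in MAKT (per language), not MARA
      category: MTART # material type
      supplied_by: EORD.LIFNR # vendor from the purchasing source list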

These mappings are configuration, not code. When a customer has customized Veeva fields, we update the mapping without changing the integration logic.
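Because mappings are data, one generic function can apply any of them. A minimal sketch, assuming the Veeva YAML above has already been loaded into a Python dict and that source records arrive as flat dicts keyed by field name; `apply_mapping` is hypothetical and skips type conversion and cross-table joins.

```python
from typing import Any

# The Veeva Vault mapping from above, as it might look after YAML loading.
VEEVA_MAPPING: dict[str, Any] = {
    "system": "veeva_vault",
    "object_mappings": {
        "Certificate": {
            "source_type": "document",
            "source_subtype": "coa__c",
            "property_mappings": {
                "expiry_date": "document_expiry_date__c",
                "status": "lifecycle_state__v",
                "covers": "related_material__c",
            },
        }
    },
}


def apply_mapping(mapping: dict[str, Any], object_type: str,
                  record: dict[str, Any]) -> dict[str, Any]:
    """Translate one source record into ontology properties."""
    spec = mapping["object_mappings"][object_type]
    props = {
        onto_prop: record.get(source_field)  # missing fields become None
        for onto_prop, source_field in spec["property_mappings"].items()
    }
    return {"type": object_type, "source": mapping["system"], **props}


doc = {
    "id": "DOC-123",
    "document_expiry_date__c": "2025-03-31",
    "lifecycle_state__v": "Approved",
    "related_material__c": "MAT-789",
}
print(apply_mapping(VEEVA_MAPPING, "Certificate", doc))
# {'type': 'Certificate', 'source': 'veeva_vault', 'expiry_date': '2025-03-31', 'status': 'Approved', 'covers': 'MAT-789'}
```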

Query Language

With a unified ontology, we can offer powerful cross-system queries:

"Show me all materials from suppliers with expiring certificates"

In our query language:

Supplier
  .where(certificates.any(expiry_date < today() + 90.days))
  .materials
  .include(supplier.name, supplier.certificates.expiry_date)

This query traverses relationships across Veeva (certificates), SAP (materials), and potentially other systems—all transparently.
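Underneath, a query like this reduces to filtered hops through the relationship graph. The sketch below replays the same traversal in memory, assuming relationships are held in plain adjacency dicts; the sample data and `materials_with_expiring_supplier_certs` are hypothetical. As a simplification, it returns each material with its supplier's earliest qualifying expiry date instead of the full `.include(...)` projection.

```python
from datetime import date, timedelta

# Hypothetical in-memory slices of the graph.
supplier_certificates = {"SUP-456": ["CERT-1", "CERT-2"], "SUP-999": ["CERT-3"]}
certificate_expiry = {
    "CERT-1": date(2025, 2, 1),
    "CERT-2": date(2026, 1, 1),
    "CERT-3": date(2027, 1, 1),
}
supplier_materials = {"SUP-456": ["MAT-789"], "SUP-999": ["MAT-100"]}


def materials_with_expiring_supplier_certs(
        today: date, days: int = 90) -> list[tuple[str, str, date]]:
    """Supplier.where(certificates.any(expiry_date < today + days)).materials"""
    cutoff = today + timedelta(days=days)
    rows = []
    for supplier, certs in supplier_certificates.items():
        expiring = [c for c in certs if certificate_expiry[c] < cutoff]
        if not expiring:  # the .where(certificates.any(...)) filter
            continue
        soonest = min(certificate_expiry[c] for c in expiring)
        for material in supplier_materials.get(supplier, []):  # the .materials hop
            rows.append((material, supplier, soonest))
    return rows


print(materials_with_expiring_supplier_certs(date(2025, 1, 1)))
# [('MAT-789', 'SUP-456', datetime.date(2025, 2, 1))]
```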

"What products are affected by this supplier issue?"

Supplier("SUP-123")
  .materials
  .products
  .include(name, registration_status, materials.name)

This is the kind of query that would take hours to answer manually, requiring export from multiple systems and manual reconciliation.

Audit Trails and the Ontology

In regulated environments, we need to track not just current state but history. Our ontology includes temporal aspects:

Every object has audit metadata:

Material {
  id: "MAT-789",
  name: "Sodium Chloride USP",
  // Current values above, audit trail below
  _created_at: "2024-01-15T10:30:00Z",
  _created_by: "user:alice@pharma.com",
  _modified_at: "2024-06-20T14:22:00Z",
  _modified_by: "user:bob@pharma.com",
  _version: 3,
  _change_reason: "Specification update"
}

Relationships are timestamped:

supplies {
  supplier: "SUP-123",
  material: "MAT-789",
  _valid_from: "2024-01-15",
  _valid_to: null,  // Current relationship
  _created_by: "system:sap_sync"
}

This lets us answer questions like "who was the supplier for this material on March 15th?" even if the supplier has changed since.
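Answering that question is a filter on each relationship's validity interval. A minimal sketch, assuming `_valid_from` is inclusive, `_valid_to` is exclusive, and `None` means the relationship is still current; those interval conventions, the earlier SUP-456 record, and the `supplier_on` helper are assumptions for illustration.

```python
from datetime import date
from typing import Optional

# Hypothetical history of the supplies relationship for one material.
supplies = [
    {"supplier": "SUP-456", "material": "MAT-789",
     "_valid_from": date(2023, 5, 1), "_valid_to": date(2024, 1, 15)},
    {"supplier": "SUP-123", "material": "MAT-789",
     "_valid_from": date(2024, 1, 15), "_valid_to": None},  # current relationship
]


def supplier_on(material: str, as_of: date) -> Optional[str]:
    """Return the supplier of `material` on `as_of`, or None if there was none."""
    for edge in supplies:
        if edge["material"] != material:
            continue
        started = edge["_valid_from"] <= as_of
        not_ended = edge["_valid_to"] is None or as_of < edge["_valid_to"]
        if started and not_ended:
            return edge["supplier"]
    return None


print(supplier_on("MAT-789", date(2023, 3, 15)))  # None
print(supplier_on("MAT-789", date(2023, 9, 1)))   # SUP-456
print(supplier_on("MAT-789", date(2024, 3, 15)))  # SUP-123
```

Using half-open intervals means a handover date belongs to exactly one supplier, so there is never an ambiguous day on which two `supplies` edges both match.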

Handling Conflicts

When several systems hold values for the same property of an object, those values can disagree. Our ontology defines which source owns each property:

Material:
  authoritative_sources:
    name: sap_qm # SAP is authoritative for material name
    specification: veeva_vault # Veeva is authoritative for specs
    supplier_info: supplier_portal # Supplier portal for supplier data
  conflict_resolution:
    default: authoritative_source_wins
    alert_on_mismatch: true

When we detect a conflict—say, a material name differs between SAP and a supplier portal—we log the conflict, use the authoritative source, and alert users for review.
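The `authoritative_source_wins` policy with mismatch alerts fits in a few lines. A minimal sketch, assuming each source's value has already been mapped into ontology terms; `resolve_property` and the shape of the conflict records are hypothetical, not a documented interface.

```python
from typing import Any

AUTHORITATIVE_SOURCES = {
    "name": "sap_qm",
    "specification": "veeva_vault",
    "supplier_info": "supplier_portal",
}


def resolve_property(prop: str, values_by_source: dict[str, Any]
                     ) -> tuple[Any, list[dict[str, Any]]]:
    """Pick the authoritative value and report any sources that disagree with it."""
    owner = AUTHORITATIVE_SOURCES[prop]
    if owner not in values_by_source:
        # This sketch refuses to guess when the owning system has no value.
        raise KeyError(f"authoritative source {owner!r} has no value for {prop!r}")
    winner = values_by_source[owner]
    conflicts = [
        {"property": prop, "source": src, "value": val, "authoritative_value": winner}
        for src, val in values_by_source.items()
        if src != owner and val != winner
    ]
    return winner, conflicts  # caller logs the conflicts and alerts reviewers


value, conflicts = resolve_property(
    "name", {"sap_qm": "Sodium Chloride USP", "supplier_portal": "NaCl USP grade"})
print(value)           # Sodium Chloride USP
print(len(conflicts))  # 1
```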

AI and the Ontology

Our AI agents work directly with the ontology. When the certificate intake agent processes a document, it:

  1. Extracts data from the document (using ML models)
  2. Maps extracted data to ontology objects and properties
  3. Identifies relationships (this certificate covers these materials)
  4. Creates or updates ontology objects
  5. Propagates changes to connected systems via mappings

The ontology provides the AI with:

  • Structure: What kinds of things exist and how they relate
  • Validation: What values are valid for each property
  • Context: What relationships should exist

Without the ontology, the AI would just be extracting fields. With it, the AI can tell which object those fields describe, which relationships they imply, and whether the values are valid.
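Steps 2 through 4 are where the ontology does its work: extracted fields are checked against property rules and resolved to known objects before anything is written. A minimal sketch, assuming the extraction step returns a flat dict of strings. The allowed certificate types, the expiry-after-issue rule, the material lookup, and `to_certificate` are all illustrative assumptions, not the agent's actual logic; a caller would create the Certificate only when the issue list is empty.

```python
from datetime import date

# Illustrative ontology rules for Certificate (hypothetical values).
ALLOWED_CERT_TYPES = {"CoA", "CoC"}
KNOWN_MATERIALS = {"Sodium Chloride USP": "MAT-789"}  # name -> ontology ID


def to_certificate(extracted: dict[str, str]) -> tuple[dict, list[str]]:
    """Map raw extracted fields onto a Certificate, collecting validation issues."""
    issues: list[str] = []
    cert: dict = {"type": extracted.get("type")}
    if cert["type"] not in ALLOWED_CERT_TYPES:
        issues.append(f"unknown certificate type {cert['type']!r}")

    try:
        issued = date.fromisoformat(extracted["issue_date"])
        expiry = date.fromisoformat(extracted["expiry_date"])
    except (KeyError, ValueError) as exc:
        issues.append(f"bad or missing date: {exc}")
    else:
        cert.update(issue_date=issued, expiry_date=expiry)
        if expiry <= issued:
            issues.append("expiry_date must be after issue_date")

    # Relationship step: resolve the material name to an existing object.
    material_id = KNOWN_MATERIALS.get(extracted.get("material", ""))
    if material_id is None:
        issues.append(f"no known material named {extracted.get('material')!r}")
    else:
        cert["covers"] = [material_id]
    return cert, issues


cert, issues = to_certificate({
    "type": "CoA", "issue_date": "2024-06-01",
    "expiry_date": "2026-06-01", "material": "Sodium Chloride USP",
})
print(issues)  # []
```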

Implementation Considerations

Storage: We store ontology objects and their attributes in PostgreSQL, with a graph layer on top for relationships. Modeling relationships as a graph keeps multi-hop traversal fast.

Caching: Common queries are cached with invalidation tied to object updates. Cache invalidation follows relationship paths—updating a Supplier invalidates cached queries involving that Supplier's Materials and Certificates.
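One way to make invalidation follow relationship paths: record which objects each cached query depends on, and when an object changes, evict queries that depend on it or on its neighbors within a fixed number of hops. A minimal sketch, assuming the query engine reports each query's dependencies and that one hop is enough; `RelationshipAwareCache` is hypothetical.

```python
from collections import defaultdict
from typing import Any


class RelationshipAwareCache:
    """Query cache whose invalidation follows relationship edges (hypothetical)."""

    def __init__(self, edges: dict[str, set[str]], depth: int = 1) -> None:
        self.edges = edges  # object ID -> directly related object IDs
        self.depth = depth  # how many hops an update propagates
        self.results: dict[str, Any] = {}
        self.dependents: dict[str, set[str]] = defaultdict(set)  # object -> cache keys

    def put(self, key: str, value: Any, depends_on: set[str]) -> None:
        self.results[key] = value
        for obj in depends_on:
            self.dependents[obj].add(key)

    def get(self, key: str) -> Any:
        return self.results.get(key)

    def on_update(self, obj: str) -> set[str]:
        """Evict queries that depend on `obj` or on objects within `depth` hops."""
        frontier, seen = {obj}, {obj}
        for _ in range(self.depth):
            frontier = {n for o in frontier for n in self.edges.get(o, set())} - seen
            seen |= frontier
        evicted: set[str] = set()
        for o in seen:
            for key in self.dependents.pop(o, set()):
                if key in self.results:
                    del self.results[key]
                    evicted.add(key)
        return evicted


edges = {"SUP-123": {"MAT-789", "CERT-1"}, "MAT-789": {"SUP-123"}, "CERT-1": {"SUP-123"}}
cache = RelationshipAwareCache(edges)
cache.put("materials_of_SUP-123", ["MAT-789"], depends_on={"MAT-789"})
cache.put("unrelated", 42, depends_on={"MAT-100"})
print(sorted(cache.on_update("SUP-123")))  # ['materials_of_SUP-123']
```

Here the cached query recorded only the Material it read, yet updating the Supplier still evicts it because invalidation walks one hop along `supplies`. The `depth` parameter trades cache hit rate against the risk of serving stale results.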

Performance: For large pharmaceutical companies, we're dealing with tens of thousands of suppliers, hundreds of thousands of certificates, and millions of relationship edges. Index design and query optimization matter.

Multi-tenancy: Each customer has their own ontology namespace. While the base ontology is shared, customers can extend it with custom objects and properties.
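A tenant's effective ontology can be computed as the shared base plus that tenant's extensions, with a check that extensions only add to the base and never redefine it. A minimal sketch, assuming both are dicts from object type to property declarations; `BASE_ONTOLOGY`'s contents and `effective_ontology` are hypothetical.

```python
from copy import deepcopy

BASE_ONTOLOGY = {
    "Supplier": {"name": "string", "status": "string", "risk_tier": "string"},
    "Material": {"name": "string", "description": "string", "category": "string"},
}


def effective_ontology(base: dict, extensions: dict) -> dict:
    """Merge one tenant's extensions into a copy of the shared base ontology."""
    merged = deepcopy(base)  # never mutate the shared base
    for obj_type, props in extensions.items():
        target = merged.setdefault(obj_type, {})  # custom object types are allowed
        for prop, prop_type in props.items():
            if prop in target:
                raise ValueError(f"{obj_type}.{prop} is already defined in the base ontology")
            target[prop] = prop_type
    return merged


acme = effective_ontology(BASE_ONTOLOGY, {
    "Supplier": {"audit_score": "integer"},       # custom property
    "StabilityStudy": {"protocol_id": "string"},  # custom object
})
print(sorted(acme))                               # ['Material', 'StabilityStudy', 'Supplier']
print("audit_score" in BASE_ONTOLOGY["Supplier"])  # False
```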

Lessons Learned

Start with the domain, not the systems. We initially tried to build the ontology by generalizing from system schemas. This led to a model that was too tied to existing system structures. When we stepped back and modeled the domain directly, we got a cleaner, more useful ontology.

Relationships are as important as objects. Early versions had rich object definitions but thin relationship modeling. This made cross-system queries awkward. We invested heavily in relationship semantics and it paid off.

Plan for extension. Pharmaceutical compliance is complex. We kept discovering new object types and relationships. Building an extensible ontology from the start saved us from painful migrations.

Document the semantics. Every object and property has documentation explaining what it means, not just what type it is. This documentation is essential when onboarding new team members or discussing requirements with customers.

The Payoff

Ontology-driven architecture adds upfront complexity. It takes time to design a good ontology, implement the mapping system, and build the query engine.

But the payoff is substantial:

  • Faster integration: Adding a new source system means writing a mapping, not rewriting integration code
  • Powerful queries: Questions that span systems become straightforward to answer
  • AI readiness: AI agents have a semantic foundation to work from
  • Future-proofing: When systems change, we update mappings, not architecture

For anyone building software that connects multiple enterprise systems, I'd strongly recommend investing in semantic modeling early. It's one of the best architectural decisions we made at Cohera.