Data Quality

Data Lineage

Data lineage is the traceable history of where data came from, how it was transformed, and how it reached its current form.

Allocator relevance: Supports trust, auditability, and dispute resolution by making claims traceable to sources and transformations.

Expanded Definition

Lineage answers: What is the source? When was it observed? What rules or transformations were applied? In allocator intelligence, lineage is critical because many attributes are derived (entity resolution), inferred (mandate classification), or updated (change detection). Without lineage, users can’t judge reliability or resolve conflicts.

Lineage also enables governance: it makes it possible to identify which pipelines produced an error and to correct systematically.

How It Works in Practice

Modern systems store source references, timestamps, confidence scores, and transformation steps. When a record changes, lineage retains the prior state and the reason for change (new source, reclassification, deduplication, etc.).

Decision Authority and Governance

Governance defines what lineage must be recorded, how evidence is stored, and how user-facing confidence and verification statuses are generated. Lineage also supports compliance and privacy constraints by documenting allowed sources and transformations.

Common Misconceptions

  • Lineage is only relevant for engineers.
  • A source link alone equals lineage.
  • Lineage guarantees accuracy (it supports verification; it doesn’t prove correctness).

Key Takeaways

  • Lineage makes claims explainable and auditable.
  • It is central to trust in OSINT-driven datasets.
  • It enables systematic correction when errors occur.