OSINT Methodology

Identity Disambiguation

Identity disambiguation is the systematic process of distinguishing between entities with similar names, locations, or attributes—using unique identifiers, corporate registries, relationship networks, and cross-source validation—to prevent targeting errors, duplicate records, and misattributed intelligence.

Without disambiguation, similarly named entities pollute your data. "ABC Capital" could be ABC Capital Partners (SF venture), ABC Capital Management (NYC hedge fund), or ABC Capital Advisors (London family office). John Smith at "Greenfield Partners" might be John Smith (CIO, NYC office) or John Smith (analyst, London office). Disambiguation resolves these cases with unique identifiers: EIN/tax ID, corporate registry number, verified domain, physical address, and key-person linkage.

This is a targeting precision issue. Misidentification wastes outreach (wrong firm), creates duplicate records (same entity entered twice), and damages credibility (LPs notice when you confuse them with another firm).

How allocators define identity disambiguation risk drivers

Teams structure disambiguation through:

  • Stage 1 - Initial flagging: System flags potential duplicates based on name similarity (Levenshtein distance < 3), location (same city), and overlapping contacts
  • Stage 2 - Unique identifier search: Look for EIN (US), corporate registry number (UK Companies House, Luxembourg RCS), domain verification (confirmed website)
  • Stage 3 - Network validation: Cross-reference key people (principals, CIOs) and their other affiliations to confirm entity distinctness
  • Stage 4 - Address verification: Physical office address (not just mailing address) shows whether two records are separate entities or the same entity
  • Stage 5 - Merge or split decision: If same entity → merge records and preserve provenance; if distinct → mark as separate, flag for ongoing monitoring
  • Decision rules: Same entity if matching EIN/registry + same domain + same address + overlapping key people; distinct if different EIN/registry + different domain + different address + no shared key people; mixed or missing signals go to manual review rather than a guess
  • Evidence phrases: "entity disambiguation," "unique identifier," "corporate registry," "EIN verification," "duplicate detection," "entity resolution"
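
The flagging stage and the decision rules above can be sketched in a few lines of Python. This is a minimal illustration, not a production matcher: the EntityRecord fields, the function names, and the "review" outcome for mixed or missing signals are assumptions made for the example, and the sample values are invented.

```python
from dataclasses import dataclass
from typing import List, Optional


def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum inserts, deletes, and substitutions to turn a into b."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete from a
                current[j - 1] + 1,      # insert into a
                previous[j - 1] + cost,  # substitute (or keep)
            ))
        previous = current
    return previous[-1]


@dataclass(frozen=True)
class EntityRecord:  # hypothetical record layout
    name: str
    city: str
    registry_id: Optional[str] = None  # EIN, Companies House number, RCS number
    domain: Optional[str] = None
    address: Optional[str] = None
    key_people: frozenset = frozenset()


def flag_signals(a: EntityRecord, b: EntityRecord, max_distance: int = 3) -> List[str]:
    """Stage 1: report which duplicate signals fire for a pair of records."""
    signals = []
    if levenshtein(a.name.lower(), b.name.lower()) < max_distance:
        signals.append("name")
    if a.city.lower() == b.city.lower():
        signals.append("city")
    if a.key_people & b.key_people:
        signals.append("contacts")
    return signals


def _match(x: Optional[str], y: Optional[str]) -> Optional[bool]:
    """True if equal, False if different, None if either value is missing."""
    if not x or not y:
        return None
    return x == y


def decide(a: EntityRecord, b: EntityRecord) -> str:
    """Stage 5: 'same' only if every check matches, 'distinct' only if every
    check differs; anything mixed or missing is sent to manual review."""
    people = None
    if a.key_people and b.key_people:
        people = bool(a.key_people & b.key_people)
    checks = [
        _match(a.registry_id, b.registry_id),
        _match(a.domain, b.domain),
        _match(a.address, b.address),
        people,
    ]
    if all(c is True for c in checks):
        return "same"
    if all(c is False for c in checks):
        return "distinct"
    return "review"
```

How the Stage 1 signals are combined (any one signal, or name plus one other) is left to the caller here, because the stage description does not say. A pair that returns "review" is a candidate for the manual review queue.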

Allocator framing:
"Are we certain this is the correct entity—or could we be confusing similar names and wasting outreach?"

Where it matters most

  • common firm names with multiple entities (ABC Capital, XYZ Partners)
  • geographic expansion where a single family operates multiple entities
  • personnel tracking across firm changes (spin-offs, rebrands)
  • M&A situations requiring entity lineage tracking

How it changes outcomes

Strong disambiguation discipline:

  • prevents wrong-firm outreach (targeting ABC Capital NYC vs ABC Capital SF)
  • eliminates duplicate records (same entity entered twice under variant names)
  • preserves relationship capital by demonstrating research precision
  • enables accurate entity tracking through rebrands and restructures
  • protects data quality by flagging ambiguous entities for review
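
Duplicates entered under variant names are easier to catch when names are normalized before comparison. The sketch below shows one way to do that; the suffix and abbreviation tables are hypothetical and deliberately tiny, and a real list would need to be built from your own data.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List

# Hypothetical lookup tables for illustration only.
LEGAL_SUFFIXES = {"llc", "lp", "llp", "ltd", "inc", "plc", "limited"}
ABBREVIATIONS = {"cap": "capital", "mgmt": "management", "ptnrs": "partners"}


def normalize_name(raw: str) -> str:
    """Lowercase, drop punctuation, expand abbreviations, strip trailing legal suffixes."""
    tokens = re.sub(r"[^a-z0-9\s]", " ", raw.lower()).split()
    tokens = [ABBREVIATIONS.get(token, token) for token in tokens]
    while tokens and tokens[-1] in LEGAL_SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


def candidate_groups(names: Iterable[str]) -> Dict[str, List[str]]:
    """Group raw names that share a normalized form; keep groups with 2+ members."""
    groups = defaultdict(list)
    for name in names:
        groups[normalize_name(name)].append(name)
    return {key: members for key, members in groups.items() if len(members) > 1}
```

A shared normalized name only nominates a pair for review. It does not show that the records are the same entity; that still requires the unique identifier checks.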

Weak disambiguation discipline:

  • embarrassing targeting errors (confusing two different firms)
  • data pollution from duplicate records
  • missed opportunities (treating a rebranded entity as a new firm)
  • LP perception of poor research quality
  • wasted effort re-researching the same entities

How allocators evaluate disambiguation discipline

Confidence increases when teams:

  • show systematic unique identifier verification (EIN, registry number, domain)
  • document disambiguation logic (why entities were deemed the same or distinct)
  • flag uncertain cases for manual review rather than guessing
  • preserve entity history through rebrands and restructures
  • demonstrate network validation (key person tracking)
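
Documented decisions and preserved entity history can be kept on the record itself. The following is a rough sketch under assumed names: the Decision and Entity classes, their fields, and the merge and record_rebrand helpers are illustrative and not taken from any particular CRM.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List


@dataclass
class Decision:
    decided_on: date
    outcome: str         # e.g. "merge", "split", "rebrand", "review"
    rationale: str       # why the entities were deemed the same or distinct
    evidence: List[str]  # sources consulted, e.g. registry extract, website


@dataclass
class Entity:
    entity_id: str
    name: str
    aliases: List[str] = field(default_factory=list)     # variant and former names
    merged_ids: List[str] = field(default_factory=list)  # ids absorbed by merges
    decisions: List[Decision] = field(default_factory=list)

    def known_names(self) -> List[str]:
        return [self.name, *self.aliases]


def record_rebrand(entity: Entity, new_name: str, decision: Decision) -> None:
    """Keep the old name as an alias so the rebrand is not treated as a new firm."""
    entity.aliases.append(entity.name)
    entity.name = new_name
    entity.decisions.append(decision)


def merge(survivor: Entity, duplicate: Entity, decision: Decision) -> Entity:
    """Fold a duplicate into the surviving record without losing its history."""
    for old_name in duplicate.known_names():
        if old_name not in survivor.known_names():
            survivor.aliases.append(old_name)
    survivor.merged_ids.extend([duplicate.entity_id, *duplicate.merged_ids])
    survivor.decisions.extend(duplicate.decisions)
    survivor.decisions.append(decision)
    return survivor
```

Because the duplicate's id and names survive the merge, a later search for the old variant still lands on the right record, and the decision log shows who was merged into whom and on what evidence.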

What slows decision-making

  • assuming name similarity = same entity without verification
  • no systematic unique identifier lookup (guessing based on name alone)
  • treating variant names as distinct entities without investigation
  • missing entity rebrands and creating duplicate records
  • no documentation of disambiguation decisions

Common misconceptions

"Similar names = same entity." → Verify with unique identifiers before assuming.
"Different domains = different entities." → Could be subsidiary or rebrand; check ownership.
"One source confirms identity." → Disambiguation requires multi-source triangulation.

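Multi-source triangulation can be expressed as a simple agreement check. In this sketch the two-source threshold, the status labels, and the source names in the usage are assumptions; the text above only says that a single source is not enough.

```python
from collections import Counter
from typing import Dict, Optional


def triangulate(claims: Dict[str, Optional[str]], min_sources: int = 2) -> str:
    """claims maps a source name to the identifier value it reports (None if silent).

    Returns "confirmed" when at least min_sources agree and none disagree,
    "conflict" when sources report different values, and "unverified" otherwise.
    """
    values = Counter(value for value in claims.values() if value is not None)
    if not values:
        return "unverified"
    if len(values) > 1:
        return "conflict"
    count = next(iter(values.values()))
    return "confirmed" if count >= min_sources else "unverified"


# Usage with invented values: two sources agree, one is silent.
status = triangulate({"registry": "01234567", "website": "01234567", "filing": None})
```

A "conflict" result is a reason to check ownership (subsidiary or rebrand) before deciding, and an "unverified" result means the identifier should not yet drive a merge.
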
Key allocator questions during diligence

  • What unique identifiers do you use for entity disambiguation?
  • How do you handle name variants (ABC Capital vs ABC Cap)?
  • What is your process for detecting and merging duplicates?
  • How do you track entities through rebrands and restructures?
  • What manual review process exists for uncertain disambiguation cases?

Key Takeaways

  • Identity disambiguation uses unique identifiers (EIN, corporate registry, domain) to distinguish between similarly-named entities and prevent targeting errors
  • Workflow: flag potential duplicates → search unique identifiers → validate via network/address → merge or split decision
  • Common scenarios: name variants, geographic branches, rebrands, spin-offs, multi-entity structures—each requires different disambiguation logic