Asset Manager

Innodata

Innodata evolved from a 1988 document shop into a public provider of AI training data for large language models, led by CEO Jack Abuhoff.

Innodata was founded in 1988 as a document conversion and imaging business, initially riding the wave of digitizing physical records. The company went public on the NASDAQ in 1993 and for two decades operated as a back-office business process outsourcing firm, handling content enrichment and data preparation for publishing, legal, and medical clients. Over time it expanded into digital publishing and content analytics, but the foundational pivot came under the continued leadership of Jack Abuhoff, who remains CEO nearly three decades after taking the helm.

The firm now operates as a specialized AI data engineering company whose core revenue comes from curating, annotating, and evaluating the massive training datasets required to build frontier large language models. Its work spans data annotation, supervised fine-tuning, model evaluation, and domain-specific data creation. The company has publicly disclosed relationships with multiple high-profile technology companies building generative AI platforms, including a major customer identified by analysts as Microsoft, which alone represented a significant portion of revenue in 2023 and 2024 (per the firm's SEC filings, 2024).

The operational backbone runs through its largest delivery center in the Philippines, with additional capacity in India, supplying 24/7 pipelines for data work across English and several dozen other languages. The company's revenue marks a sharp break from its past: after years of anemic top-line growth, trailing-twelve-month revenue surpassed $170 million through early 2025, driven almost entirely by the generative AI data services segment. The workforce scaled quickly to meet demand, particularly in the Philippines and India, though a precise total headcount is not stated.

May 2024: Innodata announced the launch of its LLM evaluation and red-teaming platform within its newly formed generative AI unit, extending its service chain from data creation into model safety and performance auditing (per the firm, May 2024). The company maintains its Ridgefield Park headquarters but operates primarily through its international delivery centers and a small footprint in London.

Innodata's structural differentiation lies in being a publicly listed subcontractor to the core AI infrastructure stack without being a model builder itself. It occupies a neutral position, training and testing models for multiple, potentially competing foundation-model labs without developing its own AI products. This third-party independence, combined with three decades of content-adjacent operational history, makes it a pure-play public proxy for the data preparation layer beneath the generative AI boom rather than a bet on any single platform winner.

General information

Firm type

Asset Manager

Year founded

1988

AUM

Undisclosed

Location

Region

North America

Country

United States

City

Ridgefield Park

Corporate office

Ridgefield Park, NJ, United States

Additional offices

Noida, India · Cebu City, Philippines · London, United Kingdom

Principals

Jack Abuhoff

CEO and President

Marissa Espineli

CFO

Amy Agress

Vice President and General Counsel

Sector focus

AI/ML · Enterprise Software · Media & Entertainment

Frequently asked questions

How did a decades-old BPO company become an AI infrastructure supplier?

Innodata's core operational DNA (large-scale human data annotation, annotation management platforms, and multi-shore delivery centers) turned out to be directly transferable to the dataset creation required for large language model training. Instead of annotating legal documents for publishers, the same workforce now annotates text, evaluates model outputs, and conducts red-teaming for AI labs. The company invested in proprietary data-annotation platforms and framework-agnostic tooling that sits between raw data and model training pipelines, effectively industrializing the data layer.

Who are Innodata's primary customers for generative AI data services?

Innodata has disclosed that a single large technology company represented a substantial share of its revenue in recent periods (per SEC filings, 2024); equity analysts and financial media widely identify that customer as Microsoft. The company also has contracts with other enterprise software and cloud providers building large language models, though specific names remain subject to confidentiality restrictions. Revenue concentration remains one of the key structural risks flagged by public market investors.

Is Innodata building its own AI models or solely providing data services?

The company is not building foundational AI models and has not announced any intention to compete with its customers in that space. It operates firmly as a neutral data engineering partner — collecting, preparing, annotating, and evaluating training data as well as auditing model outputs. This independence allows it to serve multiple clients simultaneously while avoiding conflicts with proprietary model work.

Where does Innodata's AI data preparation work get done operationally?

The bulk of data annotation, model evaluation, and red-teaming work runs through Innodata's large delivery centers in the Philippines and India, with additional multilingual capabilities supported by teams in London and other locations. The company has historically maintained a 24-hour operations cycle with shifts passing between Noida, Cebu City, and New Jersey.

What investment stages or deal structures does Innodata participate in as a public company?

Innodata does not operate as a family office or private capital allocator. As a publicly traded small-cap company (NASDAQ: INOD), its growth strategy centers on organic expansion of its AI data services contracts and, historically, occasional small strategic acquisitions of complementary technology platforms rather than portfolio-company investments. Institutional investors gain exposure to the firm through its publicly traded shares.

Profile maintained using OSINT (open-source intelligence), regulatory filings, licensed data partners, and verified direct submissions. Refreshed continuously, with full update cycles at least every 30 days.
