Groq
Groq was founded in 2016 when Jonathan Ross, the engineer who designed the Tensor Processing Unit inside Google, walked away from Alphabet's hyperscale advantage to build a clean-sheet chip for inference. The firm is neither a foundry nor a GPU clone; it vertically integrates silicon, compiler, and cloud API from its Mountain View headquarters. The chip, the Language Processing Unit (LPU), arranges memory and compute in a deterministic linear chain, trading the general-purpose flexibility of a GPU for hard, repeatable speed on structured linear algebra. The origin story is inseparable from Ross's conviction that the original TPU's inference-first architecture was undervalued inside a search-advertising giant.
Groq's strategy sits entirely on inference, not training. The LPU targets production workloads where per-token latency, latency variance, and total cost of ownership determine whether an AI product ships. The firm offers a fully managed cloud API, GroqCloud, that exposes raw throughput to external developers; early adopters assemble large language models such as Llama, Mixtral, and Gemma into deterministic pipelines. The company does not build models, train foundation weights, or compete with its existing customer base. Instead it runs third-party open-source and proprietary architectures at 300+ tokens per second on Llama-3.1-70B, a throughput figure that leads published benchmarks. Customers range from end-user chatbots to enterprise RAG systems. Critically, Groq's own compiler, not CUDA, maps model graphs to silicon, which attacks the fattest moat in the semiconductor industry without requiring customers to rewrite a single line of CUDA code.
The firm has raised over $640M from investors including Tiger Global, D1 Capital, and BlackRock Private Equity Partners. In August 2024, Groq closed a $640M Series D at a $2.8B valuation, led by BlackRock and joined by Cisco Investments and Samsung Catalyst Fund. The capital covers a planned fleet expansion beyond the 1,000+ LPUs already deployed. Groq added Stuart Pann, a supply-chain veteran who previously ran Intel's foundry and supply-chain operations, as COO in 2023; the hire signals the operational complexity ahead for an inference-cloud operator buying wafers from GlobalFoundries instead of TSMC.
Yann LeCun's team provides the most visible public validation point, running Meta's own flagship open-source models on Groq hardware rather than waiting on internal GPU allocations.
The structural differentiator is a compiler-first hardware architecture that targets a single function, transformer inference, and is willing to sacrifice general-purpose programmability to reach it. No other inference provider, including NVIDIA with its next-generation Blackwell, is publicly hitting Groq's 300+ t/s throughput on dense 70B-parameter models. This comes with an engineering cost: the LPU's SRAM-heavy design constrains parameter size per chip, requiring massive multi-chip orchestration for frontier-class models. The company frames that orchestration, real-time partitioning across racks, as the core barrier to entry, making its fleet a tightly integrated infrastructure service rather than a plug-compatible GPU accelerator.
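To make the SRAM constraint concrete, here is a rough back-of-the-envelope sketch of how many chips it takes just to hold a model's weights on-chip. The per-chip SRAM capacity and the bytes-per-parameter values are illustrative assumptions, not figures stated in this profile, and the function name chips_needed is hypothetical. The estimate ignores activations, KV cache, and any replication, so it is a lower bound at best.

```python
import math


def chips_needed(n_params: float, bytes_per_param: float, sram_bytes_per_chip: float) -> int:
    """Lower bound on the chips required to keep all weights in on-chip SRAM."""
    weight_bytes = n_params * bytes_per_param
    return math.ceil(weight_bytes / sram_bytes_per_chip)


# Illustrative assumptions only (not taken from this profile):
SRAM_PER_CHIP = 230e6   # assumed on-chip SRAM per chip, in bytes
PARAMS_70B = 70e9       # a dense 70B-parameter model

for label, bytes_per_param in [("16-bit weights", 2.0), ("8-bit weights", 1.0)]:
    n = chips_needed(PARAMS_70B, bytes_per_param, SRAM_PER_CHIP)
    print(f"{label}: at least {n} chips")
```

Under these assumptions a dense 70B model needs hundreds of chips before a single token is generated, which is why partitioning a model across racks matters so much for this kind of design.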
General information
Firm type: Asset Manager
Year founded: 2016
AUM: Undisclosed
Location
Region: North America
Country: United States
City: Mountain View
Corporate office: Mountain View, CA, United States
Principals
Jonathan Ross: CEO and Founder
Stuart Pann: COO
Igor Genise: Chief Engineering Officer
Dennis Abts: Chief Architect
Sector focus
Frequently asked questions
Who runs engineering and investment decisions at Groq?
Groq is a venture-backed semiconductor firm, not a fund; it has no investment committee. Jonathan Ross is the founder and CEO with a technical background as the original architect of Google's TPU. Product, chip architecture, and cloud engineering decisions run through his office and Chief Architect Dennis Abts. Operational scale is managed by COO Stuart Pann, who was recruited from Intel's supply-chain leadership.
Does Groq manufacture its own chips?
No. Groq is a fabless chip designer that contracts fabrication to GlobalFoundries, not TSMC — a meaningful differentiator in the current geopolitically constrained wafer environment. The firm writes the LPU architecture, compiler, and software stack internally and deploys chips in its own managed cloud. It sells neither raw silicon nor chip-level IP to third parties.
What makes Groq's chip different from an Nvidia GPU for AI?
Groq's Language Processing Unit is a deterministic, single-core architecture where the compiler decides scheduling ahead of time — no cache misses, no speculative execution, no variable latency. This is structurally different from Nvidia's CUDA GPU model, which uses massive parallelism and runtime scheduling. The result: higher and more predictable throughput on transformer inference, particularly on token-generation latency per user.
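The idea of compiler-decided scheduling can be illustrated with a toy sketch. The code below is not Groq's compiler; it is a minimal as-soon-as-possible scheduler over a hypothetical op graph with made-up cycle counts, and it assumes every op has a fixed, known latency and that functional units never conflict. The point is only that start and end cycles are fixed before anything runs, so repeated runs produce the same timing.

```python
from typing import Dict, List, Tuple

Op = Tuple[str, int, List[str]]  # (name, latency in cycles, dependencies)


def static_schedule(ops: List[Op]) -> Dict[str, Tuple[int, int]]:
    """Assign each op a fixed (start, end) cycle ahead of execution.

    ops must be listed in topological order. Each op starts as soon as
    all of its dependencies have finished (ASAP scheduling).
    """
    schedule: Dict[str, Tuple[int, int]] = {}
    for name, latency, deps in ops:
        start = max((schedule[d][1] for d in deps), default=0)
        schedule[name] = (start, start + latency)
    return schedule


# Hypothetical op names and cycle counts, for illustration only.
graph: List[Op] = [
    ("load_weights", 4, []),
    ("load_activations", 2, []),
    ("matmul", 10, ["load_weights", "load_activations"]),
    ("softmax", 3, ["matmul"]),
]

plan = static_schedule(graph)
for name, (start, end) in plan.items():
    print(f"{name}: cycles {start}-{end}")
```

A runtime-scheduled device would instead decide the order while executing, so timing could shift with cache behavior and contention; here the whole timeline is a pure function of the graph.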
Does Groq train models or compete with OpenAI and Anthropic?
No. Groq does not build, train, or own foundation models. It provides the cloud infrastructure that runs open-source models like Llama and Mixtral. Its customers include the model builders themselves, enterprises deploying open-weight models, and developer platforms that need fast, cost-efficient inference. This places it in the picks-and-shovels layer, not the model layer.
How does Groq compare to other inference startups like Cerebras and SambaNova?
All three attack GPU dominance with wafer-scale or deterministic architectures, but their tradeoffs differ. Groq's LPU uses on-chip SRAM and a deterministic compiler optimized for transformer inference throughput, delivering best-in-class tokens-per-second on its publicly available GroqCloud. Cerebras builds wafer-scale chips, each spanning a full silicon wafer, for training and inference. SambaNova's multi-core dataflow architecture targets high-batch throughput. Groq's developer-facing API makes it the most cloud-native of the three.
What is Groq's relationship with Meta and Yann LeCun?
Meta's FAIR group, led by Yann LeCun, uses Groq's LPU cloud to serve Llama models publicly. LeCun has posted results showcasing 800+ tokens per second on smaller Llama variants. This is not a paid endorsement but a functional deployment by the world's largest open-source model team, giving Groq a public proof point that no media lab or demo-only benchmark can replicate.
Where does Groq get its wafers and what is the supply-chain risk?
Groq contracts with GlobalFoundries, the US-based semiconductor fabricator, rather than the overwhelmingly dominant TSMC. This choice hedges the Taiwan Strait concentration risk but limits access to leading-edge process nodes: GlobalFoundries does not compete at TSMC's 3nm and below. Hiring Stuart Pann, formerly of Intel's foundry and supply-chain leadership, as COO signals that the company is treating wafer procurement as a C-level operational risk.
Profile maintained by Altss using OSINT (open-source intelligence), regulatory filings, licensed data partners, and verified direct submissions. Continuous refresh with full update cycles at least every 30 days.