AI Citation Concentration: Why Market Databases Capture Disproportionate Share Across All Six Engines

AI citation pools follow a power law. In AuthorityTech's Machine Relations Index measurement across 7,124 domains and 28,870 source events, a small number of market databases—Crunchbase, G2, Gartner—capture hundreds of citations per 30-day window while the vast majority of tracked domains receive zero. This is not an anomaly. It is the structural default of how AI engines select sources.

The concentration evidence

The MRI tracks citation behavior across six major AI engines: Google AI Mode, Google AI Overviews, Perplexity, Claude, Gemini, and ChatGPT. When we isolate the "market database" source role—platforms whose primary function is structured company, product, or review data—the concentration is striking.

Crunchbase.com received 210 total citations in the most recent 30-day measurement window. G2.com received 204. Both were cited by all six engines, across 10 distinct verticals, on 38-39 unique queries. Their MRI consensus scores place them in the Elite tier with A-confidence ratings—the top fraction of a percent among all 7,124 tracked domains.

For comparison, the median domain in the MRI dataset receives fewer than two citations per 30-day window. The top two market databases alone account for more citation events than the bottom several thousand domains combined.

Per-engine citation distribution

The concentration pattern holds across engines, but the distribution shifts reveal how each engine weights market database authority differently.

Engine	Crunchbase citations	G2 citations	Concentration pattern
Google AI Mode	88	62	Heaviest market-database reliance; 42% of Crunchbase's total citations come from this single engine
Claude	51	25	Strong structured-data preference; cites Crunchbase more than any non-Google engine
Perplexity	34	49	G2 outperforms Crunchbase here—Perplexity favors review-dense surfaces over funding data
Gemini	25	53	Largest absolute G2 lead over Crunchbase of any engine (53 vs. 25); review aggregation appears to carry extra weight
Google AI Overviews	8	5	Lowest market-database reliance; Overviews draw more from editorial and news sources
ChatGPT	4	10	Fewest Crunchbase citations of any engine and second-lowest combined volume (14, versus 13 for AI Overviews); training-data recency may suppress newer entries

Google AI Mode accounts for 42% of Crunchbase's total AI citations—a concentration-within-concentration that suggests Google's newest AI surface has a structural preference for structured market data that its older AI Overviews product does not share.
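The 42% figure can be reproduced directly from the table above. The Python sketch below hard-codes the published per-engine counts and computes each engine's share of one platform's total. The names `CITATIONS` and `engine_share` are illustrative and are not part of any MRI tooling.

```python
# Per-engine citation counts copied from the table above:
# engine -> (Crunchbase citations, G2 citations)
CITATIONS = {
    "Google AI Mode": (88, 62),
    "Claude": (51, 25),
    "Perplexity": (34, 49),
    "Gemini": (25, 53),
    "Google AI Overviews": (8, 5),
    "ChatGPT": (4, 10),
}


def engine_share(platform_index: int) -> dict[str, float]:
    """Return each engine's share (in percent) of one platform's total citations.

    platform_index: 0 for Crunchbase, 1 for G2.
    """
    total = sum(counts[platform_index] for counts in CITATIONS.values())
    return {
        engine: 100.0 * counts[platform_index] / total
        for engine, counts in CITATIONS.items()
    }


crunchbase_share = engine_share(0)
# Print engines from largest to smallest share of Crunchbase's citations.
for engine, pct in sorted(crunchbase_share.items(), key=lambda kv: -kv[1]):
    print(f"{engine:<20} {pct:5.1f}%")
```

Calling `engine_share(1)` gives the same breakdown for G2. The per-engine counts sum to the 210 and 204 totals reported earlier, so the shares are consistent with the headline figures.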

Why market databases concentrate citations

The concentration is not random. Market databases share structural properties that make them preferentially selected by retrieval-augmented generation systems:

Structured entity resolution. Crunchbase and G2 organize information around named entities—companies, products, people—with consistent schemas. When an AI engine needs to answer "What does [Company X] do?" or "Compare [Product A] vs [Product B]," a structured database with entity-level pages provides a more extractable answer than an unstructured blog post.

Cross-vertical coverage. Both Crunchbase and G2 span 10 verticals in our measurement. A domain that appears in cybersecurity, fintech, HR tech, and healthtech queries simultaneously builds retrieval-system trust across a broader query surface than a vertical-specific source. The MRI's vertical spread component directly measures this: Crunchbase scores 15/15, the maximum.

Temporal consistency. Crunchbase was cited on 23 of the last 30 measured days, and G2 on 24. This near-daily presence signals to retrieval systems that the source is reliably available and current, rather than a one-time result that happened to rank. The MRI's temporal consistency component captures this: both score above 8/10.

Review and data density. G2's advantage over Crunchbase on Perplexity and Gemini likely traces to review volume. Platforms with thousands of structured user reviews per product create a citation surface that is both authoritative and difficult to replicate—exactly the kind of content AI engines prefer when answering comparative queries.

External corroboration: the power law is structural

Independent research confirms that AI citation concentration follows power-law dynamics across domains, not just within market databases.

The Scientific Institute for Generative Intelligence (SIGI) published SIGI-2026-089, an observational study across four service verticals and four AI platforms. Their findings: the top entity in each vertical captures 60-90% of citation opportunities. The top three entities collectively capture 75-96%. One verified-review directory platform accounted for 66-84.5% of all agency citations across all four platforms tested.

Everything-PR's Citation Share Index, measuring citation frequency across ChatGPT, Claude, Gemini, Perplexity, and Google AI Overviews, found that the top 15 brands capture 64% of total category citation share. Their research also confirmed that revenue rank does not equal citation rank—the largest brand is frequently not the most cited—and that first-mover citation authority is "archival and sticky across model updates."

These findings align with what the MRI measures at the domain level: citation pools are not distributed evenly. They concentrate around sources with structural advantages, and the gap between cited and uncited widens over time as retrieval systems reinforce their own source preferences.
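Top-k share is the statistic behind claims such as "the top three entities capture 75-96%." A minimal Python sketch of that calculation follows. The `top_k_share` helper and the sample counts are hypothetical, chosen only to illustrate the shape of a concentrated distribution. They are not MRI, SIGI, or Everything-PR data.

```python
def top_k_share(counts: list[int], k: int) -> float:
    """Fraction of all citations captured by the k most-cited sources."""
    total = sum(counts)
    if total == 0:
        # No citations at all: define the share as zero rather than divide by zero.
        return 0.0
    top = sorted(counts, reverse=True)[:k]
    return sum(top) / total


# HYPOTHETICAL citation counts for ten sources in one vertical (illustration only).
sample = [120, 30, 20, 10, 8, 5, 4, 2, 1, 0]

print(f"top-1 share: {top_k_share(sample, 1):.0%}")
print(f"top-3 share: {top_k_share(sample, 3):.0%}")
```

Applied to real per-domain counts, the same function would show how quickly cumulative share saturates as k grows, which is the practical signature of a concentrated citation pool.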

What concentration means for enterprise visibility

The concentration effect has direct implications for brands trying to build AI search visibility:

Presence on concentrated platforms matters more than owned-site optimization alone. If Crunchbase and G2 collectively receive 414 AI citations per month across 10 verticals, a brand's profile quality on those platforms directly affects whether it appears in AI-generated answers. A well-structured Crunchbase profile or comprehensive G2 listing is not just a sales tool—it is a citation surface that AI engines actively retrieve.

The long tail is real. Most of the 7,124 domains in the MRI dataset receive negligible AI citations. For brands operating in that long tail, the path to citation is not more content volume—it is structural authority: consistent entity data, cross-vertical presence, and temporal reliability.

Engine-specific concentration creates engine-specific strategies. Google AI Mode's heavy reliance on market databases (42% of Crunchbase citations come from this single engine) means that brands optimizing for Google's AI surface should prioritize structured data presence. Perplexity's preference for G2 over Crunchbase suggests that review density matters more than funding data for that engine's retrieval logic.
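One way to compare engines on this dimension is to compute G2's share of each engine's combined Crunchbase-plus-G2 citations, using the counts from the per-engine table. In the sketch below, `g2_share` is an illustrative helper, and treating this two-platform split as a proxy for review preference is an assumption rather than an MRI metric.

```python
# Per-engine counts from the per-engine table: engine -> (Crunchbase, G2)
CITATIONS = {
    "Google AI Mode": (88, 62),
    "Claude": (51, 25),
    "Perplexity": (34, 49),
    "Gemini": (25, 53),
    "Google AI Overviews": (8, 5),
    "ChatGPT": (4, 10),
}


def g2_share(crunchbase: int, g2: int) -> float:
    """G2's fraction of an engine's combined Crunchbase + G2 citations."""
    return g2 / (crunchbase + g2)


# Rank engines from most G2-leaning to most Crunchbase-leaning.
ranked = sorted(
    CITATIONS.items(),
    key=lambda item: g2_share(*item[1]),
    reverse=True,
)

for engine, (cb, g2) in ranked:
    print(f"{engine:<20} G2 share {g2_share(cb, g2):.0%} of {cb + g2} citations")
```

On these counts the split favors G2 on ChatGPT, Gemini, and Perplexity and favors Crunchbase on the other three engines. The ChatGPT and AI Overviews totals (14 and 13 citations) are small, so their ratios should be read cautiously.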

MRI source authority comparison: top market databases

Domain	MRI consensus	Tier	Citations (30d)	Engines	Verticals	Avg position	Temporal consistency
Crunchbase.com	80.6	Elite	210	6/6	10	5.2	8.2/10
G2.com	80.3	Elite	204	6/6	10	7.2	8.6/10
Gartner.com	—	Elite	—	6/6	—	—	—
Deloitte.com	78.9	Elite	112	6/6	9	8.3	8.6/10

Source: Machine Relations Index, 30-day measurement window ending June 2026. 7,124 domains tracked, 28,870 source events.

The gap between the top two market databases (210 and 204 citations) and the next tier (112 for Deloitte, an analyst/consulting source rather than a pure market database) shows that even within Elite-tier sources, concentration persists. The market database source role occupies the top positions not because of brand recognition but because of structural citation advantages that compound over measurement cycles.

The Machine Relations framework

Citation concentration is a core concept in Machine Relations—the discipline of managing how AI systems discover, evaluate, and represent organizations. The MRI exists specifically to measure these dynamics: which sources AI engines actually cite, how citation distributes across the domain population, and what structural properties separate the cited from the uncited.

The concentration finding reinforces a central Machine Relations principle: AI visibility is not a content volume problem. It is a source architecture problem. The domains that concentrate citation share do so because their information architecture—structured entities, cross-vertical schema, temporal reliability, review density—matches what retrieval systems are optimized to select.

For practitioners, this means the question is not "how do we create more content?" but "how do we build the structural properties that retrieval systems preferentially select?" The answer begins with understanding where citation already concentrates—and why.

FAQ

Why do market databases get more AI citations than news sites or blogs?

Market databases organize information around structured entities with consistent schemas. When an AI engine needs company data, product comparisons, or funding information, a structured database provides a more extractable and verifiable answer than narrative content. The MRI measures this through the engine breadth component—market databases score 40/40 (all six engines cite them), while most editorial sources appear on fewer engines.

Is citation concentration getting worse or better over time?

The MRI's temporal consistency scores suggest concentration is self-reinforcing. Sources cited consistently (23-24 of 30 measured days for the top market databases) build retrieval-system trust that makes them more likely to be cited in future queries. SIGI's research confirms that "entry barriers created by accumulated review history, content archives, and training-data presence" favor established incumbents.

Does being on Crunchbase or G2 guarantee AI citations for my brand?

No. The platform itself is cited as a source—your brand appears in the AI answer only if the platform's page about your company is well-structured, current, and relevant to the query. A stale Crunchbase profile with minimal data is unlikely to be the specific page an AI engine retrieves. The citation goes to the platform; whether your brand appears depends on your entity data quality on that platform.

How does Google AI Mode differ from Google AI Overviews in source selection?

Google AI Mode shows a strong structural preference for market databases—42% of Crunchbase's total AI citations come from this single engine, compared to just 4% from AI Overviews. AI Mode appears to weight structured, entity-level data sources more heavily, while AI Overviews draws more from editorial and news sources. This divergence means that optimizing for one Google AI surface does not automatically optimize for the other.

What is the MRI consensus score?

The Machine Relations Index consensus score is a composite metric (0-100) measuring a domain's citation authority across six AI engines. It combines engine breadth (how many engines cite the source), query diversity (range of queries triggering citations), vertical spread (industry coverage), position quality (where the source appears in AI answers), and temporal consistency (citation reliability over time). Elite tier requires high scores across all five components. Full methodology: Machine Relations Index methodology.
