How Six AI Engines Choose Sources: Citation Selection Patterns Across ChatGPT, Perplexity, Gemini, Claude, and Google AI

Each AI engine selects its own sources. Analysis of 2 million citations shows 71% of sources cited by AI engines are exclusive to a single model. BrightEdge research on hundreds of millions of citations puts pairwise overlap between engines' top cited sources at 16–59%. A separate study of 680 million citations across three platforms found that ChatGPT, Perplexity, and Google AI Overviews each favor structurally different source types. A brand visible in one engine may be absent from the others, and the reasons are structural, not random.

How Much Do AI Engines Actually Disagree on Sources?

The scale of disagreement is larger than most operators expect. Temso AI's analysis of 2 million citations found that even the two most overlapping models — Google AI Overviews and Grok — agree on only 1 in 5 sources for the same question. Ahrefs' study of 15,000 prompts found that only 12% of URLs cited by ChatGPT, Gemini, and Copilot rank in Google's top 10 organic results. Perplexity shows the highest correlation with traditional Google rankings at roughly 29%.

This is not noise. It reflects fundamentally different retrieval architectures, index sources, and source evaluation criteria across engines.
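The two kinds of figure quoted above (pairwise overlap and the share of sources exclusive to one engine) can be sketched in Python. This is a minimal illustration, not the method of any cited study: the engine names and domains are hypothetical toy data, and the Jaccard index is an assumed definition of overlap, since the studies' exact definitions are not given here.

```python
from itertools import combinations

# Hypothetical toy data: the domains each engine cited for one question.
citations = {
    "engine_a": {"a.com", "b.com", "c.com", "d.com", "e.com"},
    "engine_b": {"a.com", "f.com", "g.com", "h.com", "i.com"},
    "engine_c": {"b.com", "j.com", "k.com", "l.com", "m.com"},
}


def pairwise_overlap(x, y):
    """Jaccard index: sources cited by both engines / sources cited by either."""
    union = x | y
    return len(x & y) / len(union) if union else 0.0


def exclusive_share(sets_by_engine):
    """Share of all distinct sources that appear in exactly one engine's set."""
    counts = {}
    for sources in sets_by_engine.values():
        for source in sources:
            counts[source] = counts.get(source, 0) + 1
    if not counts:
        return 0.0
    return sum(1 for n in counts.values() if n == 1) / len(counts)


# Overlap for every unordered pair of engines.
overlaps = {
    (a, b): pairwise_overlap(citations[a], citations[b])
    for a, b in combinations(sorted(citations), 2)
}

for pair, value in overlaps.items():
    print(pair, f"{value:.0%}")
print("exclusive to one engine:", f"{exclusive_share(citations):.0%}")
```

Other overlap definitions (for example, shared sources divided by the smaller set) would give different percentages, so figures from different studies are only comparable when the definition matches.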

Machine Relations Index (MRI) data from 28,870 source events across 7,124 domains and six engines quantifies this divergence at the individual source level. The same Elite-tier domain can receive 88 citations from one engine and 4 from another within the same 30-day window.

Citation Selection Patterns by Engine

MRI data reveals distinct source preferences when the same elite domains are measured across all six engines. The following table shows 30-day citation counts for top-tier sources, each scoring above 76 on the MRI consensus scale:

| Source | Role | Perplexity | ChatGPT | Gemini | Claude | Google AI Mode | Google AI Overviews | Total |
|---|---|---|---|---|---|---|---|---|
| Crunchbase | Market database | 34 | 4 | 25 | 51 | 88 | 8 | 210 |
| G2 | Market database | 49 | 10 | 53 | 25 | 62 | 5 | 204 |
| Gartner | Analyst research | 0 | 25 | 63 | 25 | 98 | 12 | 223 |
| Deloitte | Analyst research | 32 | 5 | 17 | 18 | 37 | 3 | 112 |
| Fortune Business Insights | Market database | 26 | 3 | 16 | 11 | 38 | 4 | 98 |
| Grand View Research | Market database | 33 | 18 | 22 | 5 | 34 | 4 | 116 |

Source: Machine Relations Index, 30-day measurement window ending June 2026. MRI methodology version 1.1, six-engine measurement.

Three patterns stand out:

Google AI Mode dominates citation volume. Across these six elite sources, Google AI Mode produces more citations than any other engine: 357 in total, versus 196 for Gemini, 174 for Perplexity, and 65 for ChatGPT. This likely reflects Google AI Mode's integration with Google's full search index and its tendency to cite structured data sources when answering enterprise and technology queries.

Perplexity skips Gartner entirely. Despite Gartner receiving 223 total citations and ranking as an Elite MRI source (consensus 77.2), Perplexity produced zero Gartner citations in the measurement window. Perplexity's own crawler and real-time retrieval architecture appear to deprioritize paywalled analyst content that other engines surface through cached or syndicated references.

Claude favors Crunchbase disproportionately. Claude cited Crunchbase 51 times — more than any other engine except Google AI Mode — while citing Grand View Research only 5 times. Claude's source selection appears to weight structured company and funding databases more heavily than market sizing reports.
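The per-engine totals behind these three patterns can be re-derived from the table. The sketch below transcribes the table's counts into Python. The variable names are illustrative and are not part of the MRI methodology.

```python
# Column order matches the table above.
ENGINES = [
    "Perplexity", "ChatGPT", "Gemini", "Claude",
    "Google AI Mode", "Google AI Overviews",
]

# 30-day citation counts per source, transcribed from the table.
COUNTS = {
    "Crunchbase": [34, 4, 25, 51, 88, 8],
    "G2": [49, 10, 53, 25, 62, 5],
    "Gartner": [0, 25, 63, 25, 98, 12],
    "Deloitte": [32, 5, 17, 18, 37, 3],
    "Fortune Business Insights": [26, 3, 16, 11, 38, 4],
    "Grand View Research": [33, 18, 22, 5, 34, 4],
}

# Sum each column to get one total per engine.
engine_totals = {
    engine: sum(row[i] for row in COUNTS.values())
    for i, engine in enumerate(ENGINES)
}

# Sum each row to check the table's "Total" column.
source_totals = {source: sum(row) for source, row in COUNTS.items()}

# Source/engine pairs with no citations at all in the window.
zero_cells = [
    (source, ENGINES[i])
    for source, row in COUNTS.items()
    for i, value in enumerate(row)
    if value == 0
]

for engine, total in sorted(engine_totals.items(), key=lambda kv: -kv[1]):
    print(f"{engine}: {total}")
print("zero-citation pairs:", zero_cells)
```

Run against these counts, the column sums reproduce the 357, 174, and 65 quoted for Google AI Mode, Perplexity, and ChatGPT, and the only zero cell is Gartner under Perplexity.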

Source Role Shapes Citation Patterns More Than Domain Authority

The MRI data reveals a pattern that domain-level analysis misses: engines select sources partly based on the role the source plays in the information ecosystem.

Market databases (Crunchbase, G2, Fortune Business Insights, Grand View Research) receive broadly distributed citations across engines because they provide structured, factual data that retrieval systems can extract cleanly. Analyst research firms (Gartner, Deloitte) show higher variance — some engines cite them heavily while others avoid them, likely reflecting differences in how engines handle paywalled or gated content.

A May 2026 study of 5,000 queries across six platforms found similar role-based patterns at the content level, with engines preferring sources that package evidence in formats matching their extraction architecture. An arXiv framework paper distinguishes between citation selection (choosing which sources to retrieve) and citation absorption (how much of the cited content enters the generated answer), noting that these are separate processes with different optimization levers.

An industry analysis of citation rates by vertical found that source selection patterns vary not just by engine but by industry — healthcare and finance queries trigger more citations from institutional sources than consumer tech queries. Government sources illustrate this role effect: they appear in 6% of Google AI Overview citations versus 2% of standard search results, suggesting AI summaries actively elevate institutional authority beyond what traditional ranking signals would predict.

What Drives the Divergence

Five structural factors explain why engines disagree on sources:

1. Index source. ChatGPT uses Bing's web index. Google AI Mode and AI Overviews use Google's index. Perplexity runs its own crawler. Each starts from a different document universe before any ranking begins.

2. Retrieval architecture. Perplexity performs real-time web searches per query. ChatGPT uses a hybrid of pre-indexed knowledge and search. Claude operates primarily from training data with selective web retrieval. These architectures produce structurally different source pools. A complete guide to LLM source selection details how each retrieval pipeline applies different weighting to freshness, authority, and content structure.

3. Paywall handling. The gap between Gartner's zero citations from Perplexity and its 98 from Google AI Mode likely reflects different policies on indexing and surfacing content behind registration or payment walls.

4. Recency weighting. Some engines prioritize recent publications; others weight established sources with long track records. The MRI temporal consistency component measures this directly — sources cited consistently over 28 days score higher than those appearing in bursts.

5. Content format fit. Engines extract differently. Structured data (tables, lists, named entities with clear relationships) gets cited more consistently across engines than narrative analysis, which tends to appear in engines with deeper comprehension architectures.
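The MRI formula for temporal consistency is not given here. As a rough illustration of the idea in factor 4, the sketch below assumes one possible measure: the share of days in a 28-day window on which a source was cited at least once. The function name, the sample dates, and the measure itself are assumptions made for illustration.

```python
from datetime import date, timedelta


def active_day_share(citation_dates, window_end, window_days=28):
    """Hypothetical consistency measure: fraction of days in the window
    with at least one citation. Not the MRI's actual formula."""
    window_start = window_end - timedelta(days=window_days - 1)
    # Deduplicate by day so a burst of citations on one day counts once.
    active_days = {d for d in citation_dates if window_start <= d <= window_end}
    return len(active_days) / window_days


end = date(2026, 6, 14)

# 14 citations spread evenly: one every other day.
steady = [end - timedelta(days=i) for i in range(0, 28, 2)]

# 14 citations packed into the last two days of the window.
bursty = [end - timedelta(days=1)] * 10 + [end] * 4

print("steady:", active_day_share(steady, end))
print("bursty:", round(active_day_share(bursty, end), 3))
```

Both sample sources have the same citation count, but the steady one is active on half the days in the window and the bursty one on only two of them, which is the distinction the factor describes.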

What This Means for Machine Relations

The source selection data confirms the core Machine Relations thesis: a brand's relationship with AI engines is not a single score but a portfolio of engine-specific positions.

Three operational implications follow from the data:

Single-engine optimization is a structural mistake. A brand visible in Google AI Mode but absent from Perplexity is missing a distinct audience. The MRI measures this as engine breadth — the number of engines citing a source — and weights it as the largest single component of the consensus score (up to 40 points).

Source role is an optimization lever. Brands that publish structured, extractable data (funding databases, comparison tables, methodology documentation) earn more consistent cross-engine citations than those publishing narrative analysis alone. The data shows market databases earning citations from all six engines while analyst reports show engine-specific clustering.

Measurement must be cross-engine by default. Tracking citations in one engine creates a false picture of AI visibility. If overlap is read as the share of sources two engines have in common, a pairwise overlap of 16–59% means that, depending on the pair, roughly 41–84% of one engine's top cited sources do not appear in the other's. Norg AI's comparison of answer engine architectures documents the mechanical differences that produce this divergence.
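Engine breadth, as defined above (the number of engines citing a source), can be computed directly from the 30-day counts in the citation table. The sketch below is a minimal illustration. It counts an engine as citing a source if it produced at least one citation, which is an assumed threshold, and it does not attempt to convert breadth into consensus-score points because that mapping is not given here.

```python
# Column order follows the citation table in this article.
ENGINES = [
    "Perplexity", "ChatGPT", "Gemini", "Claude",
    "Google AI Mode", "Google AI Overviews",
]

# 30-day citation counts per source, transcribed from the table.
COUNTS = {
    "Crunchbase": [34, 4, 25, 51, 88, 8],
    "G2": [49, 10, 53, 25, 62, 5],
    "Gartner": [0, 25, 63, 25, 98, 12],
    "Deloitte": [32, 5, 17, 18, 37, 3],
    "Fortune Business Insights": [26, 3, 16, 11, 38, 4],
    "Grand View Research": [33, 18, 22, 5, 34, 4],
}


def engine_breadth(row, min_citations=1):
    """Number of engines with at least `min_citations` citations (assumed threshold)."""
    return sum(1 for value in row if value >= min_citations)


breadth = {source: engine_breadth(row) for source, row in COUNTS.items()}

# Engines that never cited each source in the window.
missing = {
    source: [engine for engine, value in zip(ENGINES, row) if value == 0]
    for source, row in COUNTS.items()
}

for source in COUNTS:
    print(f"{source}: breadth {breadth[source]}/6, missing {missing[source]}")
```

On these counts every source except Gartner reaches all six engines, and Gartner reaches five. Raising `min_citations` would show how fragile that breadth is for sources with single-digit counts in some engines.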

FAQ

Do all AI engines use the same sources to answer the same question?

No. Research on 2 million citations found 71% of cited sources appear in only one engine's response. Even the most overlapping pair agrees on roughly 20% of sources for identical queries.

Which AI engine cites the most sources?

Google AI Mode produces the highest citation volume among the six engines measured by MRI. Across elite-tier market databases and analyst sources, Google AI Mode generated 357 citations in a 30-day window — more than double Perplexity (174) and over five times ChatGPT (65).

Why does Perplexity skip some major sources like Gartner?

Perplexity operates its own web crawler and performs real-time retrieval rather than relying on a pre-built search index. Sources behind paywalls or registration walls — common for analyst firms — appear less frequently because Perplexity's crawler may not access or index the underlying content that other engines surface through cached or syndicated copies.

How should brands measure their AI search visibility across engines?

The Machine Relations Index measures citation authority across six engines simultaneously. The methodology scores engine breadth (how many engines cite you), query diversity (range of queries triggering citations), vertical spread, position quality, and temporal consistency. Single-engine tracking produces a structurally incomplete picture.

Last updated: June 14, 2026. Source data: Machine Relations Index v1.1, six-engine measurement across 7,124 domains and 28,870 source events.
