
Why AI Search Won't Cite Your Website

Large-scale academic studies confirm AI search engines show a systematic preference for earned, third-party sources over brand-owned content, structurally inverting the logic of Google SEO.

Published March 3, 2026 by AuthorityTech
machine-relations · ai-search · citations · earned-media · brand-visibility

A University of Toronto study released in September 2025 found that AI search engines show a "systematic and overwhelming bias" toward earned media (third-party, authoritative sources) and against brand-owned content. That is the inverse of how Google has worked for two decades.

The paper examined large-scale citation behavior across ChatGPT, Perplexity, and Gemini and found that the earned media preference is a structural property of how these systems retrieve and synthesize information, not a function of query type or industry (Chen et al., arXiv:2509.08919, University of Toronto, September 2025). The contrast with Google is explicit in the findings: where Google returns a "more balanced mix" of owned, social, and earned content, AI search filters heavily toward external validation.

This matters because 94% of B2B buyers now use AI tools in their purchasing process, and twice as many buyers name generative AI or conversational search a "more meaningful or important source" than name any other channel, outranking vendor websites, product experts, and sales teams (Forrester Buyers' Journey Survey, 2026). The audience that brands have been trying to reach is increasingly arriving at AI systems first. What those systems say about a brand draws from the earned layer, not the brand's own site.

Most of what AI search reads, it doesn't cite

Even when brand-owned or brand-adjacent content is indexed and retrievable, it frequently doesn't make it into the answer.

A June 2025 study published on arXiv analyzed approximately 14,000 real-world conversations with search-enabled LLMs, drawing from LMArena logs (Strauss et al., arXiv:2508.00838, 2025). The data documents an attribution gap that is consistent across platforms: depending on the model, the ratio of sources cited in the answer to sources consumed during retrieval ranged from roughly 0.19 to 0.45.

That 0.19-to-0.45 range is the key number. On the same query, some models cite one source for every five pages they read; others cite nearly half. The difference comes from retrieval architecture and design choices, not the quality of available sources. The researchers conclude that "retrieval design, not technical limits, shapes ecosystem impact."
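The ratio itself is simple arithmetic: citations that appear in the answer divided by pages the model consumed during retrieval. A minimal sketch, using hypothetical page counts (not data from the study) chosen to land on the endpoints of the reported range:

```python
def citation_ratio(consumed: int, cited: int) -> float:
    """Fraction of retrieved pages that surface as citations in the answer."""
    if consumed <= 0:
        raise ValueError("no pages consumed")
    return cited / consumed

# Hypothetical retrieval logs, one query set per model:
low_end = citation_ratio(consumed=500, cited=95)    # one citation per ~5 pages read
high_end = citation_ratio(consumed=500, cited=225)  # nearly half of pages cited
print(low_end, high_end)  # prints: 0.19 0.45
```

The same pool of available sources yields very different attribution depending on which model reads it, which is the study's point about retrieval design.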

A brand can be present in a platform's source index and still never appear in an answer. Presence is necessary but not sufficient. The question is what gets a source promoted from "consumed" to "cited."

What drives selection

A December 2025 study from The Hong Kong University of Science and Technology examined 55,936 queries across six LLM-based search engines and two traditional search engines to identify the features that predict citation selection (Zhang et al., arXiv:2512.09483, HKUST and Rutgers University, December 2025). Two findings are relevant here.

First, LLM search engines cite more diverse domains than traditional search. 37% of the domains cited by LLM-based search engines don't appear in traditional search results at all. Smaller, specialist publications have real surface area in AI search that they lack in Google rankings.

Second, LLM search engines still underperform traditional search on credibility and political neutrality metrics. Diversity of sources does not mean quality of source selection. A broader citation graph is not a level playing field; structured selection rules still govern which sources within that graph get promoted.

A separate September 2025 study examined what content properties trigger citation in generative search engines (Ma et al., arXiv:2509.14436, 2025). The finding: AI search systems prefer content with higher predictability for underlying LLMs and greater semantic similarity among selected sources. Citations emerge from "intrinsic LLM tendencies to favor content aligned with their generative expression patterns." Sources that sound like sources AI systems have learned to cite get cited.
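The study's actual measurements use LLM-internal signals, but the semantic-similarity preference is easy to illustrate with a toy proxy. The sketch below (hypothetical sentences, naive word-count vectors rather than embeddings) shows how two sources that phrase a claim the same way score as near-duplicates, while a source saying the same thing in different words scores as unrelated:

```python
from collections import Counter
import math

def cosine(a: str, b: str) -> float:
    """Cosine similarity over naive word-count vectors (a crude stand-in
    for the embedding-based similarity real systems use)."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    na = math.sqrt(sum(c * c for c in va.values()))
    nb = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (na * nb) if na and nb else 0.0

a = "the study found a systematic bias toward earned media"
b = "the study found a systematic preference for earned media"
c = "researchers observed AI engines favoring third party coverage"
print(round(cosine(a, b), 2), round(cosine(a, c), 2))  # prints: 0.78 0.0
```

A selection rule that rewards similarity among chosen sources will keep picking sources that sound like `a` and `b` and keep passing over `c`, regardless of which one is more accurate.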

The compounding problem

There is a structural reason why established, heavily cited sources continue to accumulate citations in AI environments.

A 2025 study from Vrije Universiteit Brussel analyzed 274,951 references generated by GPT-4o across 10,000 academic papers and found that LLMs "systematically reinforce the Matthew effect in citations," consistently favoring already-cited sources when generating references (Algaba et al., arXiv:2504.02767, Vrije Universiteit Brussel, April 2025). Sources already in the citation graph attract more citations. Sources outside it have a harder time entering.

This pattern is not confined to academic literature. The mechanism operates in commercial AI search for the same reason: models were trained on text that already reflects existing citation hierarchies. Brand-owned content does not appear in third-party citation graphs. Earned media does. The distinction between the two determines what AI systems consider authoritative enough to surface.
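The Matthew effect the study describes is a preferential-attachment dynamic, and a minimal simulation makes the compounding visible. All numbers below are illustrative assumptions, not parameters from the paper: ten interchangeable sources, one incumbent with a small head start, and each new citation drawn with probability proportional to existing citation counts.

```python
import random

def simulate_citations(n_sources: int = 10, n_citations: int = 1000,
                       head_start: int = 5, seed: int = 0) -> list[int]:
    """Preferential attachment: each new citation picks a source with
    probability proportional to its existing count (plus 1, so unseen
    sources remain reachable). Early leads tend to compound."""
    rng = random.Random(seed)
    counts = [0] * n_sources
    counts[0] = head_start  # source 0 plays the already-cited incumbent
    for _ in range(n_citations):
        weights = [c + 1 for c in counts]
        chosen = rng.choices(range(n_sources), weights=weights)[0]
        counts[chosen] += 1
    return counts

# Source 0's small head start typically snowballs into an outsized share.
print(simulate_citations())
```

Swap in a brand-owned page as one of the zero-count sources and the mechanism is the same: with no existing citations to weight its selection, it rarely gets chosen, so it rarely accumulates the citations that would get it chosen.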

The channel is shifting

The urgency comes from where search behavior is moving.

Google desktop searches per U.S. user fell nearly 20% year over year, based on clickstream data from tens of millions of U.S. users compiled by Datos and SparkToro and published in January 2026 (Goodwin, Search Engine Land, reporting on Datos/SparkToro, January 2026). The users are not gone. They are going to AI search instead.

Forrester research published in February 2026 found that AI answer engines are now "one of the first places buyers turn for vendor insights" and that brands not appearing in those answers risk being "excluded from buyer shortlists" before any sales contact occurs (Bills, Forrester, February 2026).

A brand that spent the last decade building an SEO-optimized website without building its earned media presence is well optimized for a channel that is losing audience, while the channel gaining that audience systematically prefers the kind of content owned strategies do not produce.

What this means for Machine Relations

The academic research covered here converges on a single structural finding: AI search does not operate on the same preference function as Google. It privileges external validation over owned signals. That is documented in large-scale behavioral studies, not inferred from anecdote.

Machine Relations is the discipline of managing how machines (LLMs, AI search systems, algorithmic agents) perceive and reference a brand. The research above shows why that discipline requires an earned media strategy, not a content marketing one. Getting cited in AI search requires other sources, with authority, saying your name first.

That is what earned media does. That is why it works in AI search when brand-owned content does not.

More at https://machinerelations.ai

This research was produced by AuthorityTech — the first AI-native Machine Relations agency. Machine Relations is a category coined by Jaxon Parrott.
