The Hidden Bias of Generative Search Engines

The average domain age of sources cited by ChatGPT is 17 years. If your company is new, AI engines are playing with a stacked deck. But there's a way to change the game.

6 min read · 1,209 words
Author: DirtyToken, Founder/CEO

Concept, angle, and editorial review by DirtyToken. First draft written by the LLM Driven Writer Agent.

AI engines like ChatGPT, Perplexity, and Claude show a systematic bias toward established domains and sources with high prior visibility. Brands present on 4 or more platforms are 2.8 times more likely to be cited by ChatGPT than single-platform brands. However, research also shows that 37% of domains cited by AI engines don't appear in traditional search results, indicating a real window exists for new entrants. This article analyzes how the bias works, why it's not deterministic, and what small and mid-sized businesses can do to compete.

Do AI engines favor established domains?

Yes. The research is clear.

An analysis of more than 5 million domains cited by LLMs revealed that the average domain age of sources cited by ChatGPT is 17 years. This means websites created before 2009 have a significant structural advantage over any company founded in the last decade.

Academic research confirms a similar pattern: LLMs reflect human citation patterns but with a more pronounced bias toward highly cited sources, a phenomenon known as the Matthew effect. Those who already have receive more.

Brands present on 4 or more platforms are 2.8 times more likely to appear in ChatGPT responses than brands with presence on a single platform. This means having a good website isn't enough. Digital omnipresence multiplies citability.

Why do LLMs favor established sources?

The bias isn't arbitrary. It has specific technical causes.

LLMs are trained on historical internet data. Sites that have existed longer appear more frequently in that training data. When a model has seen millions of times that Wikipedia, Forbes, or HubSpot are sources cited by others, it internalizes that signal as an indicator of authority.

It's the same mechanism as backlinks in traditional SEO, but amplified. In SEO, a domain accumulates authority link by link over years. In LLM training, that accumulation is compressed into a single process that reinforces existing hierarchies.

Furthermore, when AI engines search in real time (Perplexity, ChatGPT with browsing, Google AI Overviews), the results they find are already filtered by traditional search algorithms, which in turn favor established domains. The bias is inherited and multiplied.

Is this different from the bias that already existed in Google?

Yes, and in one important respect it's worse.

In Google, a new site with good content and an aggressive backlink strategy could climb rankings within months. The algorithm evaluated each page individually: if your article was better than Forbes' for a specific query, you could outrank it. Difficult, but possible.

In LLMs, the bias operates at the level of source identity, not just individual content. The model doesn't just evaluate whether your article is good. It evaluates whether your brand is a source it has seen before, that has been cited by others, and that appears across multiple contexts. If the answer is no, your content starts at a disadvantage even if it's technically superior.

Put differently: in Google, you competed page against page. In LLMs, you compete brand against brand.

Can new companies compete for visibility in AI engines?

Yes. And this is where the story gets interesting.

Despite the bias, research on source coverage in LLM-based search engines shows that 37% of domains cited by AI engines are exclusive to them — they don't appear in traditional search engine results.

This is significant. It means LLMs are discovering and citing sources that Google ignores. For new entrants, this opens a window that didn't exist in traditional SEO: if your content is optimized for how LLMs process information, you can be cited without needing the backlink profile that Google demands.

The key is understanding what signals LLMs use when choosing sources in real time, which are different from traditional search engine signals.

What signals do LLMs use to choose what to cite when searching in real time?

When an LLM like Perplexity or ChatGPT with browsing searches for information to answer a question, it evaluates the sources it finds based on criteria that partially overlap with Google's, but have crucial differences.

How does content structure influence citability?

LLMs need to extract concrete claims to build their responses. Content with verifiable data, cited sources, and well-delimited claims is easier to process and cite than a narrative text without clear structure. This doesn't depend on domain age. It depends on how you write.
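The article doesn't specify how an engine would measure this, but the intuition can be sketched as a toy heuristic. Everything here is an illustrative assumption: the function name is invented, and "contains a digit" is a crude stand-in for "carries a verifiable claim" — no real engine works this simply.

```python
import re

def extractability_score(text: str) -> float:
    """Hypothetical heuristic: the fraction of sentences that carry a
    concrete, verifiable element (a figure, a percentage, a year)."""
    # Naive sentence split; real pipelines use proper segmentation.
    sentences = [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]
    if not sentences:
        return 0.0
    # Count sentences containing any digit as "concrete claims".
    concrete = sum(1 for s in sentences if re.search(r"\d", s))
    return concrete / len(sentences)

narrative = "Our brand has grown a lot and customers love what we do"
structured = "Brands on 4 or more platforms are 2.8 times more likely to be cited"

print(extractability_score(narrative))   # 0.0: nothing verifiable to extract
print(extractability_score(structured))  # 1.0: one sentence, one concrete claim
```

The point of the sketch is the contrast: the second sentence gives a model a delimited, checkable claim to lift into an answer; the first gives it nothing to cite.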

Does topical authority matter more than domain authority?

In traditional SEO, domain authority is a global metric: a strong domain ranks well for almost any topic. In GEO, topical authority is more relevant. A small site that covers a niche with depth and coherence can be cited by an LLM as an authority source on that specific topic, even if its global domain authority is low.

Knowledge isn't democratic in LLMs, but it is meritocratic in the sense that topical depth can compensate for lack of seniority.

How does multi-platform presence affect AI citability?

Brands present across multiple platforms have a measurable advantage. If an LLM finds your brand mentioned on your website, on Medium, on GitHub, on LinkedIn, and in transcribed podcasts, it builds a more robust representation of your authority. It's not just that you appear on more sites — it's that the LLM can triangulate your identity from multiple independent sources.

This is a citability strategy accessible to new companies: you don't need 17 years of domain history. You need coherent presence across multiple platforms that reinforce your topical authority.
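The two data points the article reports (a 1x baseline for single-platform brands, 2.8x for brands on four or more platforms) can be sketched as a toy model. The function name and the linear interpolation for the 2-3 platform range are assumptions of mine, not figures from the research.

```python
def citation_likelihood_multiplier(platforms: set[str]) -> float:
    """Toy model of the cited statistic: brands on 4+ platforms are
    2.8x more likely to be cited than single-platform brands."""
    n = len(platforms)
    if n <= 1:
        return 1.0   # single-platform baseline
    if n >= 4:
        return 2.8   # figure reported in the article
    # Linear interpolation between the two reported points (an assumption;
    # the research doesn't say how 2-3 platforms behave).
    return 1.0 + (2.8 - 1.0) * (n - 1) / 3

print(citation_likelihood_multiplier({"own site"}))  # 1.0
print(citation_likelihood_multiplier({"own site", "Medium", "GitHub", "LinkedIn"}))  # 2.8
```

A set rather than a raw count captures the article's caveat: what matters is distinct, independent platforms the model can triangulate across, not repeated mentions in one place.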

What can small businesses do to compete on citability?

The strategy for competing against LLM bias is based on three principles that require neither massive budgets nor years of history.

First: topical depth over breadth. Instead of publishing about many topics superficially, cover a niche with a depth that no generalist domain can match. LLMs recognize knowledge graph coherence as a signal of authority.

Second: coherent multi-platform presence. Publish your core content on your domain, but distribute complementary versions on Medium, LinkedIn, public repositories, and podcasts. Each appearance reinforces the brand identity the LLM builds about you.

Third: structure for extraction. Every piece of content should contain concrete claims, verifiable data, and well-delimited sections that an LLM can extract and cite. Citability isn't an accident — it's a design decision.

Is it unfair that AI engines favor established domains?

The bias exists and is measurable. But calling it unfair requires nuance.

LLMs aren't designed to be fair. They're designed to give useful answers. Citing established sources that have been verified by millions of users over years is, from the model's perspective, a reasonable heuristic for maximizing response reliability.

The problem isn't that the heuristic exists. The problem is that it disproportionately penalizes new voices that may have more current, more specific, or more relevant information than incumbents. A 2024 article on a specialized blog may be objectively better than a 2019 Forbes article, but the LLM will cite Forbes by default.

What makes this system different from Google is that the window for new entrants, while narrow, exists and is exploitable. The 37% of LLM-exclusive domains proves it. GEO doesn't eliminate the bias, but it offers tools to compensate for it.

The real question isn't whether the system is fair. The real question is whether you're doing anything to make the system find you.
