How AI engines actually decide which brands to cite

A look inside the citation logic of ChatGPT, Gemini, and Perplexity, and what brands can do to influence it.

By Gaurav Henry · Jul 3, 2026 · 14 min read

TL;DR. AI engines do not cite the “best” answer. They cite the claim that is easiest to attribute: stated explicitly, in third-party sources they trust, in language that matches the prompt, on a page they can crawl and extract. Every engine runs the same retrieve-rank-synthesize pipeline, but the implementations differ. ChatGPT rewards being part of the public conversation, Gemini and AI Overviews reward clean structured data, Perplexity rewards an independent third-party footprint. There is no single optimization that wins everywhere, which is why measurement has to be per-engine.

The most useful piece of news for anyone running marketing in 2026 is that AI citation logic is not a black box. It is a stack of decisions, each of which leaves visible fingerprints. If you have ever sat down with a Perplexity answer and clicked through every source it cited, you already know that the engine is doing something very specific. The pages it links to have more in common with each other than with the rest of the web. They look reference-shaped. They have clean structured data. They are easy to extract from, even if they are not the most popular pages on a topic.

This piece is the engineering view of why one brand gets cited ten times as often as a competitor with similar SEO. It will not tell you exactly how OpenAI weights its retrieval ranker, because nobody outside OpenAI knows that. But the broad shape of how each engine answers is now well understood, and the levers a brand can actually pull are concrete and few.

How does the retrieve-rank-synthesize pipeline work?

Almost every AI answer engine in production today follows the same three-step pipeline. You have probably seen it called RAG, for retrieval-augmented generation. Different engines use very different implementations, but the steps are recognizable.

Retrieve: given a user prompt, fetch a candidate pool of documents that might be relevant. Retrieval typically combines a fast vector search across an embedded corpus, a keyword search, and (in the engines with live web access) a real-time web fetch.
Rank: score those candidates against the prompt. Reranking usually involves a smaller, faster model whose only job is to order documents by likely usefulness.
Synthesize: pass the top candidates to the main language model along with the prompt, and let it generate an answer that draws from them.
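To make the three steps concrete, here is a minimal sketch of the funnel. It assumes a toy word-overlap score in place of the vector search, keyword search, and learned reranker that production engines use. The names (`Doc`, `retrieve`, `rerank`, `synthesize`) and the example URLs are hypothetical, and no engine exposes its pipeline this way.

```python
import re
from dataclasses import dataclass


@dataclass
class Doc:
    url: str
    text: str


def tokens(text: str) -> set[str]:
    """Lowercased word tokens: a crude stand-in for embeddings and keyword indexes."""
    return set(re.findall(r"\w+", text.lower()))


def retrieve(prompt: str, corpus: list[Doc], pool_size: int = 20) -> list[Doc]:
    """Step 1 (recall): keep any document sharing at least one term with the prompt."""
    q = tokens(prompt)
    return [d for d in corpus if q & tokens(d.text)][:pool_size]


def rerank(prompt: str, candidates: list[Doc]) -> list[Doc]:
    """Step 2 (precision): order candidates by the fraction of prompt terms they cover."""
    q = tokens(prompt)
    return sorted(
        candidates,
        key=lambda d: len(q & tokens(d.text)) / max(len(q), 1),
        reverse=True,
    )


def synthesize(ranked: list[Doc], top_k: int = 8) -> list[str]:
    """Step 3: the generator only sees the top-k documents; return what it would be handed."""
    return [d.url for d in ranked[:top_k]]


# Hypothetical example data.
CORPUS = [
    Doc("https://reviews.example.org/crm-pricing", "Pipedrive starting price: $14 per user per month"),
    Doc("https://vendor.example/plans", "Affordable plans for every user"),
    Doc("https://blog.example.net/gardening", "How to grow tomatoes"),
]
PROMPT = "What is Pipedrive's starting price per user?"
```

Here the gardening page never reaches the reranker. The vague vendor page does reach it, but it is cut when `top_k=1`. Only the page that states the price explicitly is handed to synthesis.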

The synthesis step is where citation decisions actually happen. The model is looking at, say, the top eight documents in the reranked pool, and it is being asked to produce an answer that quotes some of them by name. Which ones get named is the question every brand should care about. The answer is not “the most relevant ones.” The answer is closer to “the ones whose claims are easiest to attribute.” (For the marketing-side framing of this same pipeline, see What is Generative Engine Optimization?.)

Why does “easy to attribute” beat “best answer”?

Consider the model’s job at synthesis time. It has been given a prompt and a stack of source documents, and it has to produce an answer. If it makes a specific claim (“Pipedrive is the most affordable CRM under $50 per seat”), it needs to either back that claim with a citation or risk being wrong. The model is biased toward claims that are easy to source, because its developers have trained it (often through RLHF) to prefer answers that are confidently attributable.

What does easy to source look like? It looks like a page where the claim is stated explicitly, in a place a model can find it, in language the model can match against the user’s question. A comparison table on a third-party review site that says “Pipedrive starting price: $14 per user per month” is trivially attributable. A blog post that contains the same information buried in a paragraph two thousand words deep is harder. A landing page that gestures at “affordable plans” without ever giving a number is impossible to use as a citation.
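Here is a toy way to make “easy to attribute” concrete. Split each page into chunks, then look for the best-matching chunk that actually states a figure. This is a minimal sketch built on loud assumptions. The sentence splitter, the word-overlap score, the “$ followed by a digit” test for an explicit pricing claim, and the `max_chunks` extraction budget are all hypothetical simplifications. None of them is how any engine is known to work.

```python
import re

PROMPT = "What is Pipedrive's starting price?"


def chunks(page_text: str) -> list[str]:
    """Split a page into sentence-sized chunks (a crude stand-in for real chunking)."""
    return [c.strip() for c in re.split(r"(?<=[.!?])\s+|\n+", page_text) if c.strip()]


def best_attributable_chunk(prompt, page_text, max_chunks=None):
    """Return (index, chunk) for the chunk that best matches the prompt AND states a
    concrete dollar figure, or None. `max_chunks` models a hypothetical extraction
    budget: only the first N chunks of the page are considered."""
    q = set(re.findall(r"\w+", prompt.lower()))
    best, best_score = None, 0
    for i, c in enumerate(chunks(page_text)[:max_chunks]):
        if not re.search(r"\$\d", c):  # assumption: a pricing claim must contain "$<digit>"
            continue
        score = len(q & set(re.findall(r"\w+", c.lower())))
        if score > best_score:
            best, best_score = (i, c), score
    return best


TABLE_ROW = "Pipedrive starting price: $14 per user per month"
BURIED = "Some narrative context. " * 40 + "Pipedrive starts at $14 per user per month."
VAGUE = "Affordable plans for every team. Start free today."
```

The results follow the three cases in the paragraph above:
- The table row is attributable at chunk 0.
- The buried sentence is found only if the budget reaches chunk 40.
- The vague landing page is never attributable.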

The brand whose claims are easy to attribute, in third-party sources the model trusts, in language that matches buyer prompts, gets cited. The brand that buries its claims in narrative-shaped marketing copy gets summarized away.

What does each major AI engine actually do?

The pipeline above is shared. The implementations differ in important ways. Here is what Zaraftis has observed across roughly six months of careful testing on its prompt panel. These are behavioral observations of closed systems, not vendor-confirmed internals, so treat them as a working model rather than documentation.

ChatGPT (search and browsing)

ChatGPT in search mode does a Bing-style web retrieval, augmented by what the model already “knows” from training. The retrieval is fast and broad, and the engine is willing to synthesize without citing if it is confident in an answer from training. This is why some brands disappear from ChatGPT answers even when their visibility on the open web is fine: the model already has a stable opinion about the category from training data, and it is not consulting the web at all for that prompt.

Practically, ChatGPT cares about two surfaces: the live indexable web (Bing-flavored), and the long-tail public discourse that ended up in training data. The brands that win on ChatGPT are the brands that show up consistently across third-party industry mentions, podcasts that get transcribed, GitHub issues, Reddit threads with substantive content, and well-maintained Wikipedia entries. ChatGPT is the engine that most rewards being part of the cultural conversation, not just the SEO conversation.

Gemini and Google AI Overviews

Gemini’s retrieval is, unsurprisingly, very Google-flavored. The set of pages it considers as candidates correlates strongly with traditional Google ranking. But the synthesis step weights two things much more aggressively than classic Google: structured data, especially schema.org markup, and the existence of a clear, extractable answer in the page content.

Pages that Gemini and AI Overviews cite tend to have at least one of: a FAQPage schema with the actual question, a Product or SoftwareApplication schema with attributes that match the prompt, a HowTo schema, or a clean comparison table with semantic markup. If you have a page that ranks well in classic Google but does not get cited in AI Overviews, the explanation is almost always missing or sloppy structured data, or content that is buried under JavaScript and never gets extracted cleanly.
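A quick first check is to list which schema types a page actually declares. This is a minimal sketch using only the Python standard library. It assumes the markup is JSON-LD inside `<script type="application/ld+json">` tags; microdata and RDFa are not handled. `JsonLdExtractor`, `schema_types`, and `CITED_TYPES` are hypothetical names, and a dedicated schema validator remains the real test.

```python
import json
from html.parser import HTMLParser

# The types this article reports Gemini and AI Overviews pulling from.
CITED_TYPES = {"FAQPage", "Product", "SoftwareApplication", "HowTo"}


class JsonLdExtractor(HTMLParser):
    """Collects the raw bodies of <script type="application/ld+json"> blocks."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self.blocks.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_jsonld = False

    def handle_data(self, data):
        if self._in_jsonld:
            self.blocks[-1] += data  # script bodies may arrive in several pieces


def schema_types(html):
    """Return (types, invalid_blocks): every @type found in the page's JSON-LD,
    including nested nodes and @graph members, plus a count of unparseable blocks."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    types, invalid = set(), 0

    def walk(node):
        if isinstance(node, dict):
            t = node.get("@type")
            if isinstance(t, str):
                types.add(t)
            elif isinstance(t, list):
                types.update(x for x in t if isinstance(x, str))
            for value in node.values():
                walk(value)
        elif isinstance(node, list):
            for item in node:
                walk(item)

    for block in parser.blocks:
        try:
            walk(json.loads(block))
        except json.JSONDecodeError:
            invalid += 1  # a broken block is itself a finding
    return types, invalid
```

`schema_types(html)[0] & CITED_TYPES` shows which of the cited types are present. A non-zero second value means some markup on the page cannot be parsed at all.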

Perplexity

Perplexity is the engine where independent third-party sources matter most. Perplexity will reach further down the result set than Google does, and it will preferentially cite sources that are not from the brand itself. This is partly a design choice (the product is built around the conceit of being a research tool) and partly a retrieval one (Perplexity is more aggressive about deduplicating self-promotional sources).

The brand that wins on Perplexity is the brand with the strongest reputation in the third-party reference web. Review sites, software directories, comparison articles by independent bloggers, structured listicles in publications, all of these matter more in Perplexity than in any other engine. If your only “citation surface” is your own site, Perplexity will cite a competitor whose third-party footprint is bigger.
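One number worth tracking here is the share of cited URLs that point at domains you do not own. This is a minimal sketch. It assumes you already have the list of cited URLs from your own prompt tracking (collecting them is out of scope) and a hand-maintained set of your own domains. `third_party_share` is a hypothetical helper, not a Perplexity API.

```python
from urllib.parse import urlparse


def third_party_share(cited_urls: list[str], brand_domains: set[str]) -> float:
    """Fraction of cited URLs whose host is not one of the brand's domains or their subdomains."""
    if not cited_urls:
        return 0.0

    def is_first_party(url: str) -> bool:
        host = (urlparse(url).hostname or "").lower()
        # Match the domain itself or a true subdomain ("docs.acme.example"),
        # but not a lookalike such as "notacme.example".
        return any(host == d or host.endswith("." + d) for d in brand_domains)

    third = sum(1 for u in cited_urls if not is_first_party(u))
    return third / len(cited_urls)
```

For example, with `brand_domains={"acme.example"}`, citations from `www.acme.example` and `docs.acme.example` count as first-party. Citations from an independent review site or news site count as third-party.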

Copilot, Grok, AI Mode

Microsoft Copilot is essentially Bing’s index plus an answer layer, with quirks that resemble ChatGPT (since they share infrastructure). Grok favors recency more aggressively than the others and pulls heavily from X conversations, which makes social presence on a single platform unusually load-bearing. Google’s “AI Mode” is closer to a more aggressive Gemini, with longer answer formats and a willingness to skip the SERP entirely.

The cross-engine takeaway: there is no single optimization that wins everywhere. The brand that dominates ChatGPT may be invisible in Perplexity. This is why measurement has to be per-engine, and why a single-number “AI visibility” metric is misleading. (For why this also kills the single “rank” number, see The end of keyword rankings.)
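The arithmetic behind why a blended number misleads is simple enough to show. This is a minimal sketch with made-up data. It assumes a hypothetical results format in which each engine name maps to the list of brands cited for each panel prompt. It is not Zaraftis’s scoring method.

```python
def citation_share(results: dict[str, list[list[str]]], brand: str):
    """Return (per-engine share of prompts citing `brand`, blended share over all prompts)."""
    per_engine = {}
    total_hits = total_prompts = 0
    for engine, answers in results.items():
        hits = sum(1 for cited in answers if brand in cited)
        per_engine[engine] = hits / len(answers) if answers else 0.0
        total_hits += hits
        total_prompts += len(answers)
    blended = total_hits / total_prompts if total_prompts else 0.0
    return per_engine, blended


# Illustrative, made-up panel: four prompts per engine.
RESULTS = {
    "ChatGPT": [["Acme"], ["Acme", "Rival"], ["Acme"], ["Acme"]],
    "Perplexity": [["Rival"], ["Rival"], [], ["Rival"]],
}
```

On this data the blended share is 0.5. That single figure hides the real picture: the brand is cited on every ChatGPT prompt and on no Perplexity prompt.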

Counterintuitive observation

The strongest predictor of cross-engine citation share, in Zaraftis data, is not domain authority. It is the consistency of how a brand is described across the open web. Brands with one canonical description in their schema, their about page, and their press coverage outperform brands whose own marketing copy disagrees with itself.

Which structured-data signals actually move the needle?

Most schema documentation reads like a wall of “yes, you should add this.” It is easier to start from the schema types that AI engines actually pull from, in Zaraftis testing.

Organization with consistent name, alternate names, and a clean sameAs array pointing to LinkedIn, Crunchbase, and Wikipedia. This is the most important schema for the entity itself.
Product or SoftwareApplication with explicit offers, pricing, and aggregateRating when available. AI engines pull pricing from this constantly.
FAQPage for any page that answers buyer questions, but only when the questions are real questions buyers ask. In Zaraftis testing, AI engines appear to discount FAQ schema that looks generated.
Article with a clean author and publisher. This is mostly relevant for content marketing pages that you want cited as references.
BreadcrumbList, because it helps the engine understand the page hierarchy at a glance.
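One way to keep Organization and SoftwareApplication markup consistent is to generate both from a single source of truth. That way the entity details cannot drift between pages. This is a minimal sketch. The `BRAND` values and the helper names are hypothetical. The property names (`alternateName`, `sameAs`, `offers`, `priceCurrency`, `applicationCategory`, `publisher`) are standard schema.org vocabulary. `aggregateRating` is left out because it should only ever carry real review data.

```python
import json

# Hypothetical canonical brand record; add a Wikipedia URL to same_as if one exists.
BRAND = {
    "name": "Acme",
    "alternate_names": ["Acme Inc", "Acme Software"],
    "url": "https://www.acme.example",
    "same_as": [
        "https://www.linkedin.com/company/acme-example",
        "https://www.crunchbase.com/organization/acme-example",
    ],
}


def organization_jsonld(brand):
    return {
        "@context": "https://schema.org",
        "@type": "Organization",
        "name": brand["name"],
        "alternateName": brand["alternate_names"],
        "url": brand["url"],
        "sameAs": brand["same_as"],
    }


def software_jsonld(brand, app_name, price, currency="USD"):
    return {
        "@context": "https://schema.org",
        "@type": "SoftwareApplication",
        "name": app_name,
        "applicationCategory": "BusinessApplication",
        "offers": {"@type": "Offer", "price": price, "priceCurrency": currency},
        "publisher": {"@type": "Organization", "name": brand["name"]},
    }


def script_tag(data):
    """Wrap JSON-LD in its <script> tag; escape "</" so a string value cannot close the tag early."""
    body = json.dumps(data, indent=2).replace("</", "<\\/")
    return '<script type="application/ld+json">' + body + "</script>"
```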

The unsexy truth is that almost nobody implements these correctly. Either the schema exists but disagrees with the visible page content (a common error after a redesign), or it is on the page but emits validation warnings, or it is structured according to old guidance that the engines have moved past. Going through your structured data with a careful eye and a validator is, dollar for dollar, one of the highest-return things a marketing team can do this quarter.
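The first failure mode above, schema that disagrees with the visible page, is easy to catch automatically for at least one field. This is a minimal sketch that flags `Offer` prices declared in schema but not shown anywhere in the visible text. It assumes JSON-LD markup and numeric prices. `PageSplitter` and `price_mismatches` are hypothetical names, and the check complements a real validator rather than replacing one.

```python
import json
import re
from html.parser import HTMLParser


class PageSplitter(HTMLParser):
    """Separates JSON-LD blocks from the text a visitor actually sees."""

    def __init__(self):
        super().__init__()
        self._mode = None  # "jsonld" inside structured data, "skip" inside other scripts/styles
        self.jsonld = []
        self.visible = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._mode = "jsonld"
            self.jsonld.append("")
        elif tag in ("script", "style"):
            self._mode = "skip"

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self._mode = None

    def handle_data(self, data):
        if self._mode == "jsonld":
            self.jsonld[-1] += data
        elif self._mode is None:
            self.visible.append(data)


def price_mismatches(html):
    """Return schema Offer prices that do not appear anywhere in the visible text.
    Assumes numeric prices; thousands separators like "1,200" are not handled."""
    page = PageSplitter()
    page.feed(html)
    page.close()
    shown = {float(n) for n in re.findall(r"\d+(?:\.\d+)?", " ".join(page.visible))}
    mismatches = []

    def walk(node):
        if isinstance(node, dict):
            if node.get("@type") == "Offer" and "price" in node:
                if float(node["price"]) not in shown:
                    mismatches.append(str(node["price"]))
            for value in node.values():
                walk(value)
        elif isinstance(node, list):
            for item in node:
                walk(item)

    for block in page.jsonld:
        walk(json.loads(block))
    return mismatches
```

A page whose schema still says `"price": "14.00"` after a redesign changed the visible price to $19 would come back as `["14.00"]`.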

The signals that matter that nobody talks about

Beyond schema and content, three less-discussed signals are doing real work in citation decisions.

Entity consistency. Does the model know that “Acme”, “Acme Inc”, “Acme Software”, and “Acme Cloud Platform” are all the same thing? When the entity is fragmented, the model treats each variant as a weaker, smaller signal. A unified brand description, used consistently across your site, your social profiles, and your press, compounds.
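Here is a crude check for fragmentation. This is a minimal sketch. It assumes you have copied each profile’s name and short description into a dict keyed by source, with your own site under the hypothetical key `"site"`. The normalization (lowercasing, dropping common legal suffixes) and the exact comparison of descriptions are deliberately simple assumptions. The helper names are also hypothetical.

```python
import re

LEGAL_SUFFIXES = {"inc", "llc", "ltd", "gmbh", "corp"}


def canonical_key(name: str) -> str:
    """Lowercase, strip punctuation, and drop common legal suffixes."""
    words = re.findall(r"[a-z0-9]+", name.lower())
    return " ".join(w for w in words if w not in LEGAL_SUFFIXES)


def entity_report(profiles: dict[str, dict[str, str]]) -> tuple[set[str], list[str]]:
    """profiles maps a source to {"name": ..., "description": ...}; the brand's own site
    is expected under the hypothetical key "site".
    Returns (distinct normalized names, sources whose description differs from the site's)."""
    names = {canonical_key(p["name"]) for p in profiles.values()}
    reference = " ".join(profiles["site"]["description"].lower().split())
    drifted = sorted(
        source
        for source, p in profiles.items()
        if " ".join(p["description"].lower().split()) != reference
    )
    return names, drifted


# Hypothetical profiles for illustration.
PROFILES = {
    "site": {"name": "Acme", "description": "Acme is a CRM for small sales teams."},
    "linkedin": {"name": "Acme Inc.", "description": "Acme is a CRM for small sales teams."},
    "crunchbase": {"name": "Acme Cloud Platform", "description": "Cloud platform for enterprise data."},
}
```

In this example, “Acme Inc.” collapses into “acme”. “Acme Cloud Platform” stays a separate name, and its profile is flagged for a description that drifts from the site’s.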

Crawler access for the AI fleet. If you are blocking GPTBot, Google-Extended, ClaudeBot, or PerplexityBot in your robots.txt, you are telling those engines not to read you. Some teams blocked these without realizing what they were blocking. Zaraftis sees this on roughly 18% of B2B sites it audits. The fix is one line of robots.txt, and it can lift visibility within weeks.
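You can check this yourself without waiting for an audit. This is a minimal sketch using Python’s standard-library `urllib.robotparser`. It assumes a robots.txt you have already downloaded and the four user-agent tokens named above. `ai_crawler_access` is a hypothetical helper. One caveat: this parser applies the first matching rule in file order, whereas RFC 9309 specifies a longest-match rule. Files with overlapping Allow and Disallow lines can therefore be read differently by a given crawler.

```python
from urllib.robotparser import RobotFileParser

AI_AGENTS = ["GPTBot", "Google-Extended", "ClaudeBot", "PerplexityBot"]


def ai_crawler_access(robots_txt: str, url: str) -> dict[str, bool]:
    """Return {agent: allowed?} for each AI user-agent token against one URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in AI_AGENTS}


# Example robots.txt that blocks one AI crawler and allows everyone else.
EXAMPLE = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""
```

Run against a high-intent URL, any `False` in the result is a crawler your robots.txt is turning away from that page.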

Server-side rendering. AI crawlers are getting better at executing JavaScript, but most still work best with HTML that is already rendered on the server. A site whose key content is loaded by client-side JS is showing the engine a near-empty page. Static or server-rendered HTML for at least your high-intent pages is one of the cheapest wins.
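A rough proxy for what a non-rendering crawler sees is the word count of the raw server response. This is a minimal sketch. It assumes you fetched the HTML with a plain HTTP client that does not execute JavaScript. The class and function names, and any word-count threshold you choose to alert on, are hypothetical.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collects words outside <script>, <style>, and <template>: the text present without JS."""

    SKIP = {"script", "style", "template"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.words = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.words.extend(data.split())


def raw_html_word_count(html: str) -> int:
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return len(parser.words)
```

A typical client-rendered shell (an empty `<div id="root">` plus a script bundle) yields little more than its title. A server-rendered version of the same page yields the actual copy.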

The objection: “Surely the model just picks the best answer”

This is the assumption worth pushing back on hardest, because it is what most marketers seem to believe and it is wrong in a way that costs them money.

Models do not pick the best answer in any abstract sense. They pick the answer they can most easily produce given the documents in front of them. If your content is the best in the world but it is locked behind a JavaScript app the crawler cannot render, it does not exist for the model. If your competitor’s content is mediocre but it is on a clean static page with a comparison table whose column headers match the prompt vocabulary, your competitor wins.

The implication is uncomfortable for content teams: writing brilliantly is not enough. The work has to be findable, extractable, and citable. The engine is not reading your best paragraph. It is grepping a scratch space of retrieved chunks for a sentence that answers the question it was asked. Make sure that sentence is in there, and make sure the page around it does not get filtered out.

Where should a brand starting today focus first?

If a B2B brand walked in tomorrow and asked Zaraftis where to focus to lift citation share across the major engines in ninety days, the answer would follow an order of work that has been remarkably stable across the customer base:

Audit and fix crawler access for all major AI bots.
Validate and tighten Organization, Product, and Article schema.
Server-render your high-intent commercial pages.
Build or rebuild one canonical comparison page per major competitor, with a real table.
Get your brand name and description consistent across LinkedIn, Crunchbase, your own about page, and any third-party directory you can find.
Earn three to five third-party reference mentions per quarter, in formats Perplexity will pick up.

None of those steps are revolutionary. They are not even particularly creative. But they are exactly the steps that the brands at the top of the Zaraftis citation share leaderboards have done, and the steps that the brands at the bottom have either skipped or done badly. AI engines are systems with rules. The rules favor the brands that respect how the systems work. That is the entire game. (The step-by-step version is The AI-readability checklist; the number this work moves is AI visibility.)

Frequently asked questions about AI citation logic

Q: Do AI engines cite the best answer?

A: No. They cite the claim that is easiest to attribute given the documents retrieved: stated explicitly, in a source the model trusts, in language that matches the prompt, on a page the crawler can read and extract. Brilliant content locked behind client-side JavaScript or buried in narrative copy loses to mediocre content on a clean, extractable page.

Q: Why does my page rank on Google but not get cited in AI Overviews?

A: Almost always missing or sloppy structured data, or content that is buried under JavaScript and never gets extracted cleanly. Gemini and AI Overviews start from a Google-flavored candidate set but weight schema.org markup and a clear, extractable answer far more than classic ranking does. Validate your Organization, Product/SoftwareApplication, and FAQPage schema and make sure it agrees with the visible page.

Q: Which AI engine is hardest for a brand to show up in?

A: It depends on your footprint. Perplexity is hardest for brands whose only citation surface is their own domain, because it preferentially cites independent third-party sources. ChatGPT is hardest for brands that are not part of the broader public conversation (industry mentions, transcribed podcasts, Reddit, Wikipedia), because it will answer from training without consulting the web. The brand that dominates one engine can be invisible in another.

Q: Why is per-engine measurement necessary?

A: Because the engines weight different signals. ChatGPT rewards public-conversation presence, Gemini and AI Overviews reward structured data, Perplexity rewards an independent third-party footprint, Grok favors recency and X activity. A single blended “AI visibility” number averages away exactly the per-engine gaps you need to act on.

Q: What is the single highest-return signal to fix?

A: Crawler access. If your robots.txt blocks GPTBot, Google-Extended, ClaudeBot, or PerplexityBot, you are invisible to those engines no matter how good everything else is. Zaraftis finds this on roughly 18% of audited B2B sites. After that, structured-data validation and server-rendering high-intent pages are the next two.

Q: Does domain authority predict AI citations?

A: Less than you would expect. In Zaraftis data, the strongest predictor of cross-engine citation share is not domain authority but entity consistency: whether your schema, about page, and press coverage all describe the brand the same way. A fragmented entity gets treated as several weak signals instead of one strong one.

How we know

The figures and engine behaviors in this article come from the Zaraftis platform: a prompt-tracking system that runs buyer-intent prompts against ChatGPT, Gemini, Perplexity, Claude, Google AI Overviews, AI Mode, Copilot, and Grok on a weekly cadence, plus a 50-point AI-readability audit on each tracked site. The per-engine behavior descriptions (retrieval style, what each engine weights) are inferences from observing inputs and outputs on the prompt panel over the six months ending May 2026; they are not vendor-confirmed internals, and they shift as the engines update. The 18% misconfigured-robots.txt figure and the entity-consistency finding come from the audit set of roughly 1,200 B2B brands tracked in that window.

Find out which signals are costing you citations.

Zaraftis runs a 50-point GEO audit that maps your site against the citation logic of every major AI engine. You get a prioritized fix list, not a 90-page PDF.
