How ChatGPT Decides What to Cite. And How to Be the Answer.

By Vigo Nordin, Co-Founder at SCALEBASE. Published February 28, 2026. Updated March 12, 2026. 7 min read.

The question every business should be asking

When a potential customer types "what is the best [your category] for [their problem]" into ChatGPT, your brand either appears in the answer or it does not. There is no page two. There is no position three. There is the answer, and there is everything else.

The RAG process and why it matters

When it searches the web, ChatGPT, like most modern AI assistants with web access, uses a process called Retrieval-Augmented Generation (RAG) for queries that require current or specific information. Instead of relying solely on training data, the system queries live web sources, retrieves relevant documents, and synthesises an answer that cites the sources it used.
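The retrieve-then-synthesise flow can be sketched in a few lines of Python. This is a toy illustration, not how ChatGPT is implemented: the corpus, the `example.com` URLs, the word-overlap scoring, and the function names `retrieve` and `build_prompt` are all assumptions made for the example. A production system would use a search index and a language model where this sketch uses a dictionary and a prompt string.

```python
import re

# Hypothetical mini-corpus standing in for live web pages.
DOCUMENTS = {
    "https://example.com/crm-guide": "The best CRM for small agencies is one with simple pipelines and email sync.",
    "https://example.com/pasta": "Fresh pasta needs only flour, eggs, and a little salt.",
    "https://example.com/crm-pricing": "CRM pricing for agencies usually scales per seat.",
}


def tokenize(text):
    """Lowercase the text and return its set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query, documents, top_k=2):
    """Rank documents by word overlap with the query (a stand-in for real search)."""
    query_tokens = tokenize(query)
    scored = []
    for url, text in documents.items():
        overlap = len(query_tokens & tokenize(text))
        if overlap > 0:
            scored.append((overlap, url))
    # Highest overlap first; ties broken alphabetically by URL.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [url for _, url in scored[:top_k]]


def build_prompt(query, documents, urls):
    """Assemble the context a model would receive, with numbered sources it can cite."""
    lines = [f"Question: {query}", "Sources:"]
    for number, url in enumerate(urls, start=1):
        lines.append(f"[{number}] {url}: {documents[url]}")
    lines.append("Answer using only the sources above and cite them by number.")
    return "\n".join(lines)


query = "best CRM for agencies"
sources = retrieve(query, DOCUMENTS)
prompt = build_prompt(query, DOCUMENTS, sources)
print(prompt)
```

Only pages that survive the retrieval step ever reach the prompt, so the pasta page cannot be cited here however well written it is.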

Because answers are assembled from retrieved pages, your content does not need to have been part of OpenAI's training set to be cited. It needs to be findable, crawlable, and structured clearly enough for the retrieval layer to surface it and for the model's evaluation layer to select it.
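One part of "crawlable" that can be checked directly is whether your robots.txt blocks AI crawlers. The sketch below uses Python's standard `urllib.robotparser` on an inline sample file, so it makes no network requests. The crawler names in `AI_CRAWLERS` are assumptions for illustration and should be verified against each vendor's current documentation; `crawler_access` and the sample rules are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Assumed user-agent names; confirm them in each vendor's crawler documentation.
AI_CRAWLERS = ("GPTBot", "OAI-SearchBot", "PerplexityBot")

# Hypothetical robots.txt that blocks one crawler and allows everyone else.
SAMPLE_ROBOTS = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def crawler_access(robots_txt, page_url, user_agents=AI_CRAWLERS):
    """Return {user_agent: allowed} for page_url under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, page_url) for agent in user_agents}


access = crawler_access(SAMPLE_ROBOTS, "https://example.com/insights/article")
for agent, allowed in access.items():
    print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

To check a live site, you could fetch its robots.txt yourself and pass the text to `crawler_access`. Passing this check is necessary but not sufficient: an allowed page still has to be indexed and selected.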

The signals that determine citation selection

Topical authority is the first. AI models weight sources that demonstrate deep, consistent expertise. A site with 40 well-structured articles on a narrow topic will typically be cited more often than a site with one article on the same topic, even if that article is excellent.

Structural clarity is the second. Content organised into clear questions and answers, using H2 headers and direct declarative sentences, is far easier for a retrieval system to select.

Entity establishment is the third. If your business has no structured data, you are an anonymous document, and anonymous documents do not get cited by name.

Recency and indexation are the fourth. A page has to be indexed before it can be retrieved at all, and it has to be kept current to keep being selected.
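Entity establishment usually starts with schema.org `Organization` markup embedded as JSON-LD. The sketch below generates a minimal version in Python. The business name, URLs, and the helper `organization_jsonld` are placeholders invented for the example; `@context`, `@type`, `name`, `url`, and `sameAs` are standard schema.org vocabulary.

```python
import json


def organization_jsonld(name, url, same_as):
    """Build minimal schema.org Organization markup as a JSON-LD string."""
    data = {
        "@context": "https://schema.org",
        "@type": "Organization",
        "name": name,
        "url": url,
        # sameAs links the entity to its profiles on other sites.
        "sameAs": list(same_as),
    }
    return json.dumps(data, indent=2)


# Placeholder values; substitute your own business details.
markup = organization_jsonld(
    name="Example Agency",
    url="https://example.com",
    same_as=["https://www.linkedin.com/company/example-agency"],
)

# JSON-LD is embedded in the page inside a script tag of this type.
script_tag = f'<script type="application/ld+json">\n{markup}\n</script>'
print(script_tag)
```

The `sameAs` links are what tie the page to an identifiable entity, which is the difference between a named source and an anonymous document.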

The content format AI models prefer

The content format cited most often is a direct question answered in the first sentence of a section. The metric to track is Share of Answers; see the AEO tools guide for how to measure it.
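This article does not define Share of Answers formally, so the sketch below assumes one plausible definition: the fraction of tracked prompts whose answer cites at least one URL on your domain. The prompts, the cited URLs, and the function names are hypothetical sample data, not measurements.

```python
from urllib.parse import urlparse


def cites_domain(cited_urls, brand_domain):
    """True if any cited URL is on brand_domain or one of its subdomains."""
    for url in cited_urls:
        host = urlparse(url).hostname or ""
        if host == brand_domain or host.endswith("." + brand_domain):
            return True
    return False


def share_of_answers(answers, brand_domain):
    """Assumed definition: answers citing the brand divided by all tracked answers."""
    if not answers:
        return 0.0
    hits = sum(1 for cited in answers.values() if cites_domain(cited, brand_domain))
    return hits / len(answers)


# Hypothetical tracking data: prompt -> URLs cited in the AI answer.
TRACKED = {
    "best crm for agencies": ["https://example.com/crm-guide", "https://other.test/review"],
    "crm pricing for agencies": ["https://www.example.com/crm-pricing"],
    "how to migrate crm data": ["https://other.test/migration"],
    "crm with email sync": [],
}

share = share_of_answers(TRACKED, "example.com")
print(f"Share of Answers: {share:.0%}")
```

Under this assumed definition the sample data yields 50%, because two of the four tracked answers cite the domain. A real measurement would need the same prompt set re-run on a schedule, since answers vary between runs.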

What you can actually control

Build deep topical authority on the questions your customers ask. Structure every page so a retrieval system can identify the question it answers. Implement Organization and FAQPage schema (the schema.org type names for organisation and FAQ markup). Keep content current. Get external references pointing to your entity.
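For the FAQ side, schema.org's `FAQPage` type wraps each question in a `Question` with an `acceptedAnswer`. The sketch below builds that structure from question and answer pairs. The helper `faq_jsonld` and the sample pair are illustrative assumptions; the type and property names are standard schema.org vocabulary.

```python
import json


def faq_jsonld(pairs):
    """Build schema.org FAQPage markup from (question, answer) pairs."""
    data = {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }
    return json.dumps(data, indent=2)


# Placeholder content; each answer should open with the direct answer.
faq_markup = faq_jsonld([
    (
        "What is RAG?",
        "RAG (Retrieval-Augmented Generation) retrieves live documents and uses them to generate a cited answer.",
    ),
])
print(faq_markup)
```

The markup should mirror questions and answers that are visible on the page, with the same direct first-sentence answers the article recommends.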

Key takeaways

  • ChatGPT uses RAG to retrieve and synthesise live web content for relevant queries
  • Topical authority, structural clarity, entity establishment, and recency drive citation selection
  • FAQ-structured, direct-answer content is the format AI models select most often
  • Businesses without structured data are anonymous to AI and are unlikely to be cited by name
  • You cannot control the retrieval system itself, but you can control how findable your content is and how likely it is to be selected once retrieved

Vigo Nordin

Co-Founder of SCALEBASE, a specialist AEO and SEO agency based in Mallorca, Spain. Focused on AI search optimization, entity building, and engineering citations across ChatGPT, Perplexity, and Google AI Overviews.

Ready to apply this to your business?

Stop being invisible to AI. Start being the answer your customers find.