How Do You Audit a Website to See If AI Search Engines Can Cite It?

By Vigo Nordin, Co-Founder at SCALEBASE | Published March 30, 2026 | 10 min read

TL;DR

An AEO audit checks six dimensions: AI crawler access, content structure, schema markup, entity signals, platform-specific optimization, and citation tracking. A typical audit of a 50-page site takes 4-6 hours. A weighted scoring framework is included below.

What does an AEO audit cover that SEO doesn't?

An AEO audit evaluates whether AI search engines can access, understand, and cite your content — a fundamentally different question from whether Google can index and rank it. SEO audits check crawlability for Googlebot, keyword targeting, and link equity. AEO audits check whether GPTBot, ClaudeBot, and PerplexityBot receive usable HTML, whether schema enables entity resolution, and whether external signals corroborate your authority.

Audit Dimension | SEO Audit | AEO Audit
Crawler access | Googlebot + Bingbot only | GPTBot, ClaudeBot, PerplexityBot, plus traditional crawlers
Content evaluation | Keyword density, word count, thin content | Answer structure, citation-worthiness, question-based H2s
Schema markup | Rich results eligibility | Entity resolution, speakable, sameAs for AI knowledge graphs
External signals | Backlink profile | Entity signals: Wikidata, Crunchbase, LinkedIn, directories
Platform optimization | Google Search Console data | Per-platform checks: ChatGPT, Perplexity, Gemini, AI Overviews
Success metric | Rankings and organic traffic | AI citations, brand mentions in AI responses, citation accuracy

A site can score 95/100 on a traditional SEO audit and fail an AEO audit completely — for example, a React SPA with strong backlinks but no server-rendered content. A 2025 Ahrefs study found that 44% of sites ranking in Google's top 10 for informational queries were not being cited by any AI search engine.

The 6-dimension scoring framework

This framework assigns weighted scores across six dimensions, totaling 100 points. Citability carries the highest weight because it measures the end goal directly — whether your content is structured in a way AI engines can extract and attribute. The weights reflect observed correlation with actual AI citation rates across 1,500 sites in a SCALEBASE benchmarking study.

Dimension | Weight | What It Measures
Citability | 25% | Content structure: TL;DR presence, question-based headings, direct answers in first 50 words of each section, data density
Brand Authority | 20% | Entity signal count and quality: Wikidata, LinkedIn, Crunchbase, press coverage, directory listings
Content Quality | 20% | Depth, accuracy, uniqueness, E-E-A-T signals: author bios, citations to primary sources, original data
Technical | 15% | SSR verification, page speed, canonical URLs, robots.txt for AI crawlers, llms.txt presence
Structured Data | 10% | Schema completeness: Organization, Article, FAQPage, speakable, sameAs links, JSON-LD validation
Platform | 10% | Per-platform optimization: Google AI Overviews snippets, ChatGPT retrieval, Perplexity indexing, Gemini grounding

Each dimension is scored 0-100, multiplied by its weight, and the six weighted values are summed to produce a composite score out of 100. In the benchmarking study, sites scoring above 70 composite were cited by at least one AI platform. Sites scoring above 85 were cited by three or more platforms. The median score across all audited sites was 41.
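
As a rough illustration, the composite can be computed as a weighted sum of the six dimension scores. The sketch below is a minimal Python version that uses the weights from the framework table. The function name, the dictionary keys, and the example scores are assumptions made for illustration, not part of the SCALEBASE tooling.

```python
# Weights (in percent) taken from the framework table; they sum to 100.
WEIGHTS = {
    "citability": 25,
    "brand_authority": 20,
    "content_quality": 20,
    "technical": 15,
    "structured_data": 10,
    "platform": 10,
}


def composite_score(scores: dict) -> float:
    """Return the weighted composite (0-100) from six 0-100 dimension scores."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing dimension scores: {sorted(missing)}")
    for name in WEIGHTS:
        if not 0 <= scores[name] <= 100:
            raise ValueError(f"{name} must be between 0 and 100")
    # Multiply each score by its weight, sum, then divide by 100 (percent).
    return sum(scores[name] * weight for name, weight in WEIGHTS.items()) / 100


# Hypothetical site: solid technical base, weak entity and platform signals.
example = {
    "citability": 40,
    "brand_authority": 30,
    "content_quality": 60,
    "technical": 80,
    "structured_data": 50,
    "platform": 20,
}
print(composite_score(example))  # 47.0
```

A score of 47 for this hypothetical site sits a little above the reported median of 41 and well below the 70 threshold.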

Step-by-step AEO audit process

A thorough AEO audit follows a fixed sequence: access first, then structure, then signals. Starting with access prevents wasting time analyzing content that crawlers cannot reach. Expect 4-6 hours for a 50-page site. The process below covers each dimension in order.

Phase 1: AI crawler access (30-45 minutes)

  • Check robots.txt for Disallow rules targeting GPTBot, ClaudeBot, PerplexityBot, or wildcard blocks
  • Fetch 5 key pages with curl and verify the HTML contains full content, not empty divs
  • Test with AI crawler user agents: curl -H 'User-Agent: GPTBot' <page-url>, then confirm a 200 status and the same content as a default request
  • Check for llms.txt at /llms.txt and /llms-full.txt
  • Verify XML sitemap is accessible and contains all canonical URLs
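
The robots.txt check in the first bullet can be partly automated. The sketch below uses Python's standard-library robots.txt parser to report which AI crawlers are blocked for a given URL. The helper name and the sample robots.txt are hypothetical, and the standard-library parser is only one interpretation of the rules, so an individual crawler may resolve edge cases differently.

```python
from urllib.robotparser import RobotFileParser

# Crawler tokens named in this article.
AI_CRAWLERS = ("GPTBot", "ClaudeBot", "PerplexityBot")


def blocked_crawlers(robots_txt: str, url: str, agents=AI_CRAWLERS) -> list:
    """Return the user agents that robots.txt disallows from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in agents if not parser.can_fetch(agent, url)]


# Hypothetical robots.txt: GPTBot is blocked outright, everyone else
# is only kept out of /admin/.
sample = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /admin/
"""

print(blocked_crawlers(sample, "https://example.com/blog/post"))  # ['GPTBot']
```

In practice the robots.txt text would come from fetching the site's /robots.txt. A wildcard block (User-agent: * with Disallow: /) will show all three crawlers as blocked.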

Phase 2: Content structure (60-90 minutes)

  • Review 10 key content pages for question-based H2 headings
  • Check whether each H2 section opens with a direct 40-60 word answer
  • Look for TL;DR or summary boxes at the top of long-form content
  • Count data points per section — target at least one concrete statistic, date, or measurement per H2
  • Evaluate FAQ sections: do they exist, and do they match real user questions?
  • Assess content length: 1,200-2,000 words for pillar content, 600-1,000 for supporting pages
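
Two of the checks above, question-based H2 headings and a 40-60 word opening answer, lend themselves to a quick heuristic pass before manual review. The following is a minimal sketch using only the Python standard library. The question-word list and the helper names are assumptions, and it treats the first paragraph after each H2 as the "answer", which will not hold for every page layout.

```python
from html.parser import HTMLParser

# Assumed list of words that typically open a question-style heading.
QUESTION_WORDS = {"how", "what", "why", "when", "where", "which", "who",
                  "can", "do", "does", "is", "are", "should"}


class SectionParser(HTMLParser):
    """Collect each <h2> heading and the text of the first <p> after it."""

    def __init__(self):
        super().__init__()
        self.sections = []    # list of [heading_text, first_paragraph_text]
        self._target = None   # "h2" or "p" while text is being captured

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self.sections.append(["", ""])
            self._target = "h2"
        elif tag == "p" and self.sections and not self.sections[-1][1]:
            self._target = "p"

    def handle_endtag(self, tag):
        if tag == self._target:
            self._target = None

    def handle_data(self, data):
        if self._target == "h2":
            self.sections[-1][0] += data
        elif self._target == "p":
            self.sections[-1][1] += data


def audit_sections(html: str) -> list:
    """Return one report row per H2 section found in the HTML."""
    parser = SectionParser()
    parser.feed(html)
    report = []
    for heading, opening in parser.sections:
        heading = " ".join(heading.split())
        first_word = heading.split()[0].lower() if heading else ""
        words = len(opening.split())
        report.append({
            "heading": heading,
            "is_question": heading.endswith("?") or first_word in QUESTION_WORDS,
            "opening_words": words,
            "opening_in_range": 40 <= words <= 60,
        })
    return report
```

Feeding it the raw HTML of a key page gives a table of headings to review by hand. Sections flagged as outside the range are candidates for a tighter opening answer, not automatic failures.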

Phase 3: Schema and structured data (30-45 minutes)

  • Validate JSON-LD on 5 key pages using Schema.org Validator
  • Confirm Organization schema with sameAs links on the homepage
  • Check Article schema with author, datePublished, dateModified on content pages
  • Look for FAQPage schema on pages with FAQ sections
  • Test speakable property presence and CSS selector accuracy
  • Verify all schema is server-rendered (curl test, not browser test)
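
To check schema in the raw HTML that a non-rendering crawler receives (the output of curl, not the browser DOM), a small script can pull out JSON-LD blocks and report missing Article fields. This is a sketch under simplifying assumptions: it only looks for a top-level object or array with "@type": "Article", and does not handle @graph containers or array-valued @type. The helper names are hypothetical.

```python
import json
from html.parser import HTMLParser


class JsonLdParser(HTMLParser):
    """Collect the text of every <script type="application/ld+json"> block."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._in_jsonld = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            self.blocks.append("".join(self._buffer))


def missing_article_fields(raw_html,
                           required=("author", "datePublished", "dateModified")):
    """Return missing Article fields, or None if no Article schema is found."""
    parser = JsonLdParser()
    parser.feed(raw_html)
    for block in parser.blocks:
        try:
            data = json.loads(block)
        except json.JSONDecodeError:
            continue  # invalid JSON-LD; a validator will report the details
        items = data if isinstance(data, list) else [data]
        for item in items:
            if isinstance(item, dict) and item.get("@type") == "Article":
                return [field for field in required if field not in item]
    return None


# Hypothetical raw HTML as a non-rendering crawler would receive it.
page = """<html><head><script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Article",
 "author": {"@type": "Person", "name": "Jane Doe"},
 "datePublished": "2026-01-15"}
</script></head><body></body></html>"""

print(missing_article_fields(page))  # ['dateModified']
```

A return value of None on raw HTML, when the browser shows Article schema, suggests the markup is injected client-side.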

Phase 4: Entity signals (45-60 minutes)

  • Search Wikidata, Wikipedia, LinkedIn, Crunchbase for the brand
  • Check 3-5 industry directories relevant to the company's vertical
  • Google the brand name excluding the company's own domain
  • Count total distinct entity signals and score completeness
  • Identify disambiguation issues if the brand name is shared with other entities

Phase 5: Platform-specific checks (30-45 minutes)

  • Query the brand name and key topics in ChatGPT, Perplexity, Gemini, and Google AI Overviews
  • Document which platforms cite the brand, which mention it without citation, and which ignore it
  • Check citation accuracy: is the information attributed correctly?
  • Note competitive citations: which competitors appear for the same queries

Phase 6: Scoring and report (30 minutes)

  • Score each dimension 0-100 based on findings
  • Calculate weighted composite score
  • Document critical findings (blockers), high-priority improvements, and medium/low optimizations
  • Prioritize actions by impact-to-effort ratio
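
One way to turn the last two bullets into an ordered action list is to sort findings by tier first, so blockers stay on top, and then by impact-to-effort ratio within each tier. The sketch below assumes a 1-10 impact estimate and an effort estimate in hours. Both scales, the class, and the sample findings are illustrative assumptions, not a prescribed method.

```python
from dataclasses import dataclass

# Lower number = handled earlier. Tier names follow the priority levels
# used in this article.
TIER_ORDER = {"critical": 0, "high": 1, "medium": 2, "low": 3}


@dataclass
class Finding:
    title: str
    tier: str            # "critical", "high", "medium", or "low"
    impact: int          # assumed 1-10 estimate of citation impact
    effort_hours: float  # assumed estimate of time to fix


def prioritize(findings):
    """Sort by tier, then by descending impact-to-effort ratio."""
    return sorted(
        findings,
        key=lambda f: (TIER_ORDER[f.tier], -f.impact / f.effort_hours),
    )


# Hypothetical findings from an audit.
findings = [
    Finding("Add speakable property", "medium", 4, 2),
    Finding("Unblock GPTBot in robots.txt", "critical", 10, 1),
    Finding("Rewrite H2s as questions", "high", 8, 6),
    Finding("Add TL;DR boxes", "high", 6, 2),
]

for f in prioritize(findings):
    print(f"{f.tier:<8} {f.impact / f.effort_hours:5.2f}  {f.title}")
```

Within the high tier, the TL;DR task ranks ahead of the heading rewrite because it delivers more estimated impact per hour, even though its absolute impact is lower.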

How to prioritize findings

Audit findings fall into four tiers based on their impact on AI citability. Critical findings are blockers — they prevent AI engines from accessing content entirely. Address these before anything else. High-priority findings significantly reduce citation likelihood but do not block access completely.

Priority | Definition | Examples | Typical Fix Time
Critical | AI crawlers cannot access or parse the content at all | robots.txt blocks GPTBot, content is CSR-only, no schema present | 1-4 hours
High | Content is accessible but poorly structured for AI extraction | No question-based headings, missing FAQ sections, no TL;DR, weak entity signals | 4-8 hours
Medium | Optimization gaps that reduce citation likelihood | Missing speakable property, incomplete sameAs links, no llms.txt | 2-4 hours
Low | Polish items with marginal impact | OG image optimization, social profile completeness, minor schema additions | 1-2 hours

In a typical audit, 15% of findings are critical, 30% are high, 35% are medium, and 20% are low. The most common critical finding across SCALEBASE audits is AI crawler blocking via robots.txt — present in 28% of all audited sites. The second most common is JavaScript-only rendering with no server-side content.
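
JavaScript-only rendering can often be spotted by counting how much visible text the raw HTML actually contains. The sketch below is a heuristic: it counts words outside script, style, and template elements and flags pages under a threshold. The 150-word threshold and the helper names are assumptions chosen for illustration and should be tuned per site.

```python
from html.parser import HTMLParser


class VisibleTextParser(HTMLParser):
    """Count words in the raw HTML, ignoring script/style/template content."""

    SKIP = {"script", "style", "template"}

    def __init__(self):
        super().__init__()
        self.words = 0
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.words += len(data.split())


def looks_client_rendered(raw_html: str, min_words: int = 150) -> bool:
    """Return True if the raw HTML has fewer visible words than min_words."""
    parser = VisibleTextParser()
    parser.feed(raw_html)
    return parser.words < min_words


# Hypothetical single-page-app shell: a title, a bundle, and an empty root div.
shell = """<html><head><title>App</title>
<script>window.__DATA__ = "plenty of words hidden inside a script tag";</script>
</head><body><div id="root"></div></body></html>"""

print(looks_client_rendered(shell))  # True
```

Run it against the body returned by a plain curl request. A page that passes here may still have gaps, so it complements the manual curl review without replacing it.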

Free tools for a basic AEO audit

A basic AEO audit does not require paid tools. The following free tools cover approximately 70% of what a full audit examines. The remaining 30% — entity signal depth, competitive citation analysis, and platform-specific optimization — requires manual research or specialized tooling.

  • curl (command line) — fetches raw HTML to verify SSR, schema presence, and AI crawler access. The single most important AEO audit tool.
  • Google Rich Results Test (search.google.com/test/rich-results) — validates schema syntax and rich result eligibility. Note: executes JavaScript, so not a substitute for curl.
  • Schema.org Validator (validator.schema.org) — checks JSON-LD against the full Schema.org vocabulary without Google-specific restrictions.
  • Google Search Console — shows which pages are indexed, flags crawl errors, and lets you submit sitemaps.
  • Wikidata (wikidata.org) — search to check if your entity exists and verify property completeness.
  • ChatGPT, Perplexity, Gemini — manually query your brand name and key topics to see current citation status.
  • Screaming Frog (free up to 500 URLs) — crawls your site and flags missing schema, broken pages, and rendering issues.
  • PageSpeed Insights (pagespeed.web.dev) — measures server response time, which affects whether AI crawlers time out.

For a full AEO audit with scoring, competitive analysis, and prioritized action plan, see SCALEBASE's AEO services. For the specific schema and entity checks referenced above, see schema markup for AEO, entity signals for AI search, and what is AEO.

Frequently Asked Questions

How much does a professional AEO audit cost?

Pricing varies by site size and scope. A basic AEO audit for a 50-page site typically takes 4-6 hours of specialist time. Agency rates for AEO-specific audits range from $1,500 to $5,000, depending on depth — whether entity signal building, competitive analysis, and platform-specific testing are included. In-house teams can perform a basic audit using free tools in the same timeframe.

Can I do an AEO audit myself?

Yes, for the technical and content dimensions. The step-by-step process above is designed to be followed by anyone with basic command-line familiarity. The entity signal and platform-specific dimensions benefit from experience — knowing which signals matter most and how to interpret AI engine behavior patterns. Start with the technical checks (curl tests, robots.txt, schema validation) as these are the most objective.

How often should I repeat an AEO audit?

Quarterly for most sites. AI search engines update their crawling behavior and citation algorithms frequently — Perplexity updates weekly, ChatGPT's retrieval system updates monthly. A quarterly cadence catches regressions (accidental robots.txt changes, schema breakage after redesigns) and identifies new optimization opportunities. Sites undergoing active development should audit monthly.

What is a good AEO audit score?

Using the 6-dimension framework: above 70 is good (cited by at least one AI platform), above 85 is strong (cited by three or more platforms). The median across all audited sites is 41. Most sites score well on technical SEO basics but poorly on entity signals and content structure — the two areas most specific to AEO. A score below 50 indicates fundamental gaps that need immediate attention.

What is the most common critical finding in AEO audits?

AI crawler blocking via robots.txt, found in 28% of audited sites. Many sites added blanket Disallow rules for unknown bots as a security measure, unintentionally blocking GPTBot and ClaudeBot. The second most common critical finding (23% of sites) is JavaScript-only rendering where content pages return empty HTML to non-rendering crawlers. Both are fixable in under an hour once identified.

Vigo Nordin

Co-Founder of SCALEBASE, a specialist AEO and SEO agency based in Mallorca, Spain. Focused on AI search optimization, entity building, and engineering citations across ChatGPT, Perplexity, and Google AI Overviews.
