
What Schema Markup Do You Need to Get Cited by AI Search Engines?

By Vigo Nordin, Co-Founder at SCALEBASE | Published March 30, 2026 | 9 min read

TL;DR

Six schema types matter most for AEO: Organization, Article (with the speakable property), FAQPage, Person, Service, and HowTo. The JSON-LD must be server-rendered, because AI crawlers that do not execute JavaScript never see schema injected client-side.

Why does schema markup matter for AI citations?

AI crawlers parse structured data to resolve entities — to decide what a page is about, who published it, and whether the information is authoritative enough to cite. Without schema, crawlers must infer this from raw HTML, which introduces ambiguity and reduces citation likelihood.

A 2025 analysis of 12,000 AI-cited pages by Authoritas found that 78% included at least two schema types, compared to 34% for non-cited pages in the same SERPs. Schema does not guarantee citation, but its absence correlates strongly with being overlooked.

When GPTBot or ClaudeBot encounters a page with Organization, Article, and FAQPage schema, it can map the content directly to its knowledge graph. The publisher identity is explicit. The topic structure is labeled. The Q&A pairs match the query format that triggers citations.

For a deeper look at how AI engines decide which sources to cite, see how AI engines select and cite sources.

Which schema types have the highest AEO impact?

Organization and Article (with the speakable property) are the two most impactful schema types. Organization establishes publisher identity, and Article with speakable marks which passages are suitable for extraction. Below is a ranked breakdown based on observed citation correlation across ChatGPT, Perplexity, and Google AI Overviews.

| Schema Type | AEO Impact | Primary Function |
|---|---|---|
| Organization | Critical | Establishes the publisher entity: name, URL, logo, and sameAs links to Wikidata/LinkedIn/Crunchbase |
| Article + speakable | Critical | Marks content as citable and identifies key passages for voice and text extraction |
| FAQPage | High | Structures Q&A pairs that directly match conversational AI query patterns |
| Person | High | Links the author to external profiles, supporting E-E-A-T author entity resolution |
| Service | Medium | Describes offerings with areaServed, provider, and serviceType for local/niche queries |
| HowTo | Medium | Breaks processes into numbered steps, often extracted verbatim for how-to AI answers |

The critical tier accounts for 60% of observed schema on AI-cited pages in Authoritas data. FAQPage alone appeared on 41% of cited pages that ranked for question-based queries.
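To make the Organization and FAQPage rows concrete, here is a minimal Python sketch that builds both objects as plain dictionaries ready to serialize as JSON-LD. The function names and every example value (example.com, the LinkedIn URL) are hypothetical placeholders; the property names (sameAs, mainEntity, acceptedAnswer) are standard schema.org vocabulary.

```python
import json


def organization(name: str, url: str, logo: str, same_as: list[str]) -> dict:
    """Build an Organization object that identifies the publisher entity."""
    return {
        "@context": "https://schema.org",
        "@type": "Organization",
        "name": name,
        "url": url,
        "logo": logo,
        # sameAs ties the entity to external profiles (e.g. Wikidata, LinkedIn, Crunchbase).
        "sameAs": same_as,
    }


def faq_page(pairs: list[tuple[str, str]]) -> dict:
    """Build a FAQPage whose mainEntity holds one Question/Answer per pair."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }


if __name__ == "__main__":
    # Hypothetical example values.
    org = organization(
        "Example Co",
        "https://example.com",
        "https://example.com/logo.png",
        ["https://www.linkedin.com/company/example"],
    )
    print(json.dumps(org, indent=2))
```

Keeping each type in its own function makes it easy to emit only the types that accurately describe a given page.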

What is the speakable schema property?

Speakable is a schema property on Article or WebPage that identifies which sections of a page are suitable for text-to-speech and AI extraction. It uses CSS selectors or XPath to point crawlers to the exact paragraphs worth quoting. Google introduced it for news content, but AI engines now use it as a general citation hint.

Implementation means adding a speakable property to your Article JSON-LD whose value is a SpeakableSpecification with cssSelector (or xpath) values pointing to specific elements. For example, the cssSelector list might target ".article-summary" and ".key-takeaway", the two areas you most want AI to extract.

A practical pattern is to wrap your TL;DR or executive summary in a div with the class "speakable-summary" and each H2's opening answer in a span with the class "speakable-answer", then reference both selectors (.speakable-summary and .speakable-answer) in the schema. This tells crawlers explicitly which passages you consider quotable.
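A minimal Python sketch of the Article object this pattern produces, assuming the two class names above are used as-is. The helper name article_jsonld and the example headline, URL, and names are hypothetical; the speakable value uses schema.org's SpeakableSpecification type with a cssSelector list.

```python
import json


def article_jsonld(headline: str, url: str, author: str, publisher: str) -> dict:
    """Build a minimal Article whose speakable property points at the
    .speakable-summary and .speakable-answer elements."""
    return {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "url": url,
        "author": {"@type": "Person", "name": author},
        "publisher": {"@type": "Organization", "name": publisher},
        # SpeakableSpecification tells crawlers which elements hold quotable text.
        "speakable": {
            "@type": "SpeakableSpecification",
            "cssSelector": [".speakable-summary", ".speakable-answer"],
        },
    }


if __name__ == "__main__":
    article = article_jsonld(
        "What Schema Markup Do You Need to Get Cited by AI Search Engines?",
        "https://example.com/insights/schema-markup-for-ai",  # hypothetical URL
        "Example Author",
        "Example Publisher",
    )
    print(json.dumps(article, indent=2))
```

The selectors only help if the matching elements actually exist in the server-rendered HTML, so keep the class names in the template and the schema in sync.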

Pages with speakable markup saw a 23% higher citation rate in a 2025 SearchPilot test across 800 URLs. The control group had identical content but no speakable property. The difference was statistically significant at p < 0.01.

Server-rendered vs. client-side schema: why it matters

GPTBot, ClaudeBot, and PerplexityBot do not execute JavaScript. If your JSON-LD is injected by a React component at runtime, the HTML these crawlers receive does not contain it. The schema exists only after the browser runs your JavaScript, which never happens for AI crawlers that fetch raw HTML.

You can verify this with a single command: curl -s https://yoursite.com | grep 'application/ld+json'. If the JSON-LD appears in the curl output, it is server-rendered. If it does not, AI crawlers cannot see it regardless of what your browser DevTools show.
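The same check can be scripted. This Python sketch, using only the standard library, fetches a page the way curl does (no JavaScript runs) and returns the non-empty JSON-LD blocks found in the raw HTML. The function names and the "schema-check" User-Agent string are hypothetical choices for illustration.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class JsonLdFinder(HTMLParser):
    """Collect the text of every <script type="application/ld+json"> element."""

    def __init__(self) -> None:
        super().__init__()
        self.blocks: list[str] = []
        self._in_jsonld = False

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").strip().lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self.blocks.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_jsonld = False

    def handle_data(self, data):
        if self._in_jsonld:
            self.blocks[-1] += data


def server_rendered_jsonld(html: str) -> list[str]:
    """Return the non-empty JSON-LD blocks present in raw HTML (no JavaScript is run)."""
    finder = JsonLdFinder()
    finder.feed(html)
    finder.close()
    return [block for block in finder.blocks if block.strip()]


def fetch_raw_html(url: str) -> str:
    """Plain HTTP GET, like `curl -s`: the response is never rendered or executed."""
    request = Request(url, headers={"User-Agent": "schema-check"})
    with urlopen(request, timeout=10) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


# Example (requires network access):
# blocks = server_rendered_jsonld(fetch_raw_html("https://yoursite.com"))
# print(f"{len(blocks)} JSON-LD block(s) visible without JavaScript")
```

An empty list for a page where DevTools does show schema points to the client-side injection problem described above.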

In Next.js, this means rendering the JSON-LD script tag on the server: inside the Head component (next/head) in the Pages Router, or directly in a page or layout Server Component in the App Router. In WordPress, plugins like Yoast and Rank Math inject schema server-side by default. In Gatsby, add the tag through the Gatsby Head API or react-helmet so it is included in the server-rendered HTML.
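Whatever the framework, the goal is the same: the JSON-LD must already be in the HTML string the server sends. The following framework-agnostic Python sketch shows that output; render_page is a hypothetical stand-in for your server's templating layer, not any framework's API. It also escapes "<" inside the JSON payload, a common precaution so that a value containing "</script>" cannot close the tag early.

```python
import html
import json


def jsonld_script(data: dict) -> str:
    """Serialize one structured-data object into a <script> tag for the <head>."""
    # Escape "<" as \u003c: a value containing "</script>" then cannot end the
    # tag early, and any JSON parser decodes the escape back to "<".
    payload = json.dumps(data, ensure_ascii=False).replace("<", "\\u003c")
    return f'<script type="application/ld+json">{payload}</script>'


def render_page(title: str, body_html: str, structured_data: list[dict]) -> str:
    """Hypothetical server-side template: the JSON-LD is part of the HTML string
    the server sends, so crawlers that never execute JavaScript still receive it."""
    head_tags = "\n".join(jsonld_script(item) for item in structured_data)
    return (
        "<!DOCTYPE html>\n<html><head>\n"
        f"<title>{html.escape(title)}</title>\n{head_tags}\n"
        f"</head><body>{body_html}</body></html>"
    )


if __name__ == "__main__":
    print(render_page("Example", "<p>Hello</p>", [{"@context": "https://schema.org", "@type": "Article"}]))
```

Because the tags are produced while the response is built, a plain `curl` of the page will show them.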

A 2025 Screaming Frog crawl of 5,000 sites found that 29% of React-based sites had schema visible only after JavaScript execution. Those sites had zero confirmed AI citations in ChatGPT or Perplexity citation logs during the study period.

For framework-specific implementation details, see how to optimize Next.js for AI crawlers.

How do you validate your schema?

Google Rich Results Test and the Schema.org Validator are the two primary tools. Rich Results Test checks whether Google can parse your structured data and previews eligible rich results. The Schema.org Validator checks syntax against the full vocabulary without Google-specific rules.

For AEO-specific validation, add a manual step: fetch the page with curl and pipe the output through a JSON-LD parser. This confirms the schema is present in the raw HTML that AI crawlers receive. Browser-based tools like Rich Results Test execute JavaScript, so they may show schema that AI crawlers never see.
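For the "pipe it through a JSON-LD parser" step, here is a Python sketch that reads saved raw HTML, parses every application/ld+json block, and lists each @type it finds, including nested types such as an Article's author and items inside @graph. Blocks that are not valid JSON are reported as errors instead of being skipped silently. The script name and command-line handling are assumptions.

```python
import json
import sys
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collect the raw text of each <script type="application/ld+json"> element."""

    def __init__(self) -> None:
        super().__init__()
        self.blocks: list[str] = []
        self._inside = False

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").strip().lower()
        if tag == "script" and script_type == "application/ld+json":
            self._inside = True
            self.blocks.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._inside = False

    def handle_data(self, data):
        if self._inside:
            self.blocks[-1] += data


def _collect_types(node, found: list[str]) -> None:
    """Walk parsed JSON-LD and record every @type, including nested objects and @graph."""
    if isinstance(node, list):
        for item in node:
            _collect_types(item, found)
    elif isinstance(node, dict):
        node_type = node.get("@type")
        if isinstance(node_type, str):
            found.append(node_type)
        elif isinstance(node_type, list):
            found.extend(t for t in node_type if isinstance(t, str))
        for value in node.values():
            if isinstance(value, (dict, list)):
                _collect_types(value, found)


def jsonld_types(html: str) -> tuple[list[str], list[str]]:
    """Return (types found, parse errors) for the JSON-LD in a raw HTML string."""
    collector = JsonLdCollector()
    collector.feed(html)
    collector.close()
    types: list[str] = []
    errors: list[str] = []
    for index, block in enumerate(collector.blocks, start=1):
        try:
            _collect_types(json.loads(block), types)
        except json.JSONDecodeError as exc:
            errors.append(f"block {index}: {exc}")
    return types, errors


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8") as handle:
        found, problems = jsonld_types(handle.read())
    print("types:", ", ".join(found) or "(none)")
    for problem in problems:
        print("error:", problem)
```

Usage, with a hypothetical file name: `curl -s https://yoursite.com/page > page.html && python3 jsonld_types.py page.html`.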

  • Google Rich Results Test — validates against Google's supported types, shows rendering preview
  • Schema.org Validator (validator.schema.org) — checks syntax and vocabulary compliance for all schema types
  • curl + grep — confirms server-side rendering: curl -s [URL] | grep 'application/ld+json'
  • Screaming Frog — crawls entire site, flags pages missing schema or with JS-only schema
  • Ahrefs Site Audit — detects schema errors and missing types at scale

Validate on every deployment. Schema breaks silently: a single missing comma makes the entire JSON-LD block invalid JSON, and no browser error will alert you. Automated CI/CD checks, for example with the structured-data-testing-tool npm package, catch these errors before they reach production.
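If you also want a dependency-free gate in CI, this Python sketch is one option to run alongside a tool like structured-data-testing-tool; it does not reproduce that package's checks. It exits with code 1 when a built HTML file has no JSON-LD or contains a block that is not valid JSON, which is exactly what a missing comma causes. File paths are passed as arguments; the function names and how you wire this into your pipeline are assumptions.

```python
import json
import sys
from html.parser import HTMLParser
from pathlib import Path


class JsonLdCollector(HTMLParser):
    """Collect the raw text of each <script type="application/ld+json"> element."""

    def __init__(self) -> None:
        super().__init__()
        self.blocks: list[str] = []
        self._inside = False

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").strip().lower()
        if tag == "script" and script_type == "application/ld+json":
            self._inside = True
            self.blocks.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._inside = False

    def handle_data(self, data):
        if self._inside:
            self.blocks[-1] += data


def check_html(html: str) -> list[str]:
    """Return the problems found in one page; an empty list means it passes."""
    collector = JsonLdCollector()
    collector.feed(html)
    collector.close()
    if not any(block.strip() for block in collector.blocks):
        return ["no server-rendered JSON-LD found"]
    problems = []
    for index, block in enumerate(collector.blocks, start=1):
        try:
            json.loads(block)
        except json.JSONDecodeError as exc:
            # A single syntax error (such as a missing comma) invalidates the whole block.
            problems.append(f"block {index}: invalid JSON ({exc.msg}, line {exc.lineno}, column {exc.colno})")
    return problems


def main(paths: list[str]) -> int:
    """Check each built HTML file and return a process exit code for CI."""
    failed = False
    for path in paths:
        for problem in check_html(Path(path).read_text(encoding="utf-8")):
            print(f"{path}: {problem}")
            failed = True
    return 1 if failed else 0


if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv[1:]))
```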

SCALEBASE includes schema validation as part of every AEO engagement. For the broader context of how schema fits into AI optimization, see our technical SEO services.

Frequently Asked Questions

Does schema markup directly improve AI citations?

Schema does not force AI engines to cite a page. It removes ambiguity about what the page covers, who published it, and which passages are most relevant. Pages with structured data are cited at roughly 2x the rate of equivalent pages without it, based on available correlation studies. The effect is indirect: schema makes content easier for AI systems to parse and trust.

Should I use JSON-LD or microdata for AEO?

JSON-LD. Google officially recommends it, and because it lives in a self-contained script block, it can be generated and server-rendered independently of the visible HTML. Microdata is embedded in HTML attributes throughout the body, which makes it harder to maintain and impossible to isolate in a separate script block. Every major AEO implementation guide uses JSON-LD exclusively.

Can too much schema hurt my site?

Incorrect or misleading schema can trigger a Google manual action, which reduces overall visibility. Redundant or inflated schema, for example marking up ordinary paragraphs as FAQPage questions, dilutes the signal. Stick to schema types that accurately describe the content; for most sites, three or four types per page is the practical maximum.

How do I test if AI crawlers can actually see my schema?

Use curl to fetch the page as a plain HTTP request: curl -s https://yoursite.com/page | grep 'application/ld+json'. If the JSON-LD appears in the output, AI crawlers can see it. You can also set your user agent to GPTBot (curl -H 'User-Agent: GPTBot') to test for user-agent-specific blocking.
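To automate the user-agent comparison, this Python sketch fetches the same URL with a browser-like agent and with "GPTBot", then flags any agent whose HTTP status or JSON-LD count differs from the baseline. The bare "GPTBot" string mirrors the curl example above and is a simplification (the crawler's real User-Agent header is longer), and all function names are hypothetical.

```python
from urllib.error import HTTPError
from urllib.request import Request, urlopen

# Hypothetical agent set: a browser-like baseline first, then the crawler to test.
AGENTS = {
    "browser": "Mozilla/5.0",
    "GPTBot": "GPTBot",  # simplified token, as in the curl example; the real header is longer
}


def count_jsonld(html: str) -> int:
    """Rough equivalent of the grep check: count JSON-LD script type declarations."""
    return html.lower().count("application/ld+json")


def fetch(url: str, user_agent: str) -> tuple[int, str]:
    """GET the URL with a given User-Agent; return (HTTP status, body). No JavaScript runs."""
    request = Request(url, headers={"User-Agent": user_agent})
    try:
        with urlopen(request, timeout=10) as response:
            return response.status, response.read().decode("utf-8", errors="replace")
    except HTTPError as err:
        # Error statuses (e.g. 403) still carry a body worth inspecting.
        return err.code, err.read().decode("utf-8", errors="replace")


def compare_agents(url: str, agents: dict[str, str] = AGENTS) -> dict[str, tuple[int, int]]:
    """Map each agent label to (status, JSON-LD count) for the same URL."""
    results = {}
    for label, user_agent in agents.items():
        status, body = fetch(url, user_agent)
        results[label] = (status, count_jsonld(body))
    return results


def flag_differences(results: dict[str, tuple[int, int]]) -> list[str]:
    """Return agents whose (status, count) differs from the first (baseline) agent."""
    baseline = next(iter(results.values()))
    return [label for label, result in results.items() if result != baseline]


# Example (requires network access):
# print(flag_differences(compare_agents("https://yoursite.com/page")))
```

A flagged agent is a prompt to check your firewall, CDN bot rules, and robots.txt, not proof of blocking on its own.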

Does Google AI Overviews use schema differently than ChatGPT or Perplexity?

Google AI Overviews leverages its existing rich results infrastructure, so it recognizes more schema types (Product, Recipe, Event) and uses them for featured snippet selection. ChatGPT and Perplexity rely more heavily on Organization and Article schema for entity resolution. The practical difference: optimizing for Google AI Overviews also benefits traditional search, while ChatGPT/Perplexity optimization is purely about crawler-accessible structured data.

Vigo Nordin


Co-Founder of SCALEBASE, a specialist AEO and SEO agency based in Mallorca, Spain. Focused on AI search optimization, entity building, and engineering citations across ChatGPT, Perplexity, and Google AI Overviews.

