How AI Engines Decide What to Cite (And How to Be One of Them)

Rhasaun Campbell · 14 min read

ai-citations · rag · geo · structured-data · chatgpt

When I first ran IndexMind's probing system against getwrecked.com, I expected something. A few citations. Maybe a mention. The site had solid technical health, clean structured data, and real content. What came back was zero. Zero citations across ChatGPT, Perplexity, and Gemini. The AI models knew the site existed. They just didn't trust it enough to cite.

That result forced me to dig into the actual mechanics of how AI models select their sources. Not the theoretical version. The practical version, informed by running thousands of analyses and watching what moves citation rates versus what doesn't. What I found is that citation selection follows a specific, predictable process. And once you understand that process, you can work backward from it to make your content citable.

This article is the practical guide. I'm going to walk through exactly how AI engines choose what to cite, then give you a step-by-step playbook you can execute this week to start earning citations for your own content.

How Does ChatGPT Decide What to Cite?

When you ask ChatGPT (or Perplexity, or Gemini) a question, the model doesn't just pull an answer from its training data. For queries that benefit from current, sourced information, it uses a process called retrieval-augmented generation, or RAG. Understanding RAG is the foundation for everything else in this guide, so let me break it down in plain terms.

The model decomposes your question into sub-queries. A user question like "what's the best CRM for small agencies?" doesn't get processed as a single lookup. The model breaks it into smaller, more specific questions: what CRM tools exist? Which ones are designed for agencies? What features matter for small teams? How do they compare on pricing? Each sub-query retrieves its own set of candidate sources.

The model retrieves candidate pages. For each sub-query, the retrieval system pulls a set of web pages that appear relevant. This is similar to how a search engine works, but the evaluation criteria are tuned for content that can be synthesized into an answer, not content that should be ranked in a list.

The model evaluates candidates for citation. This is the step that matters most for your content strategy. The model scores each candidate page on several dimensions: does it directly answer the sub-query? Is the source authoritative? Is the content structured in a way that allows clean extraction? Is the information current?

The model synthesizes an answer and attributes citations. The final response combines information from the top-scoring sources, and the model attributes specific claims to specific sources through inline citations. Your goal is to be one of those attributed sources.

The key insight: AI models don't cite pages. They cite passages. A model might pull a single paragraph from a 3,000-word article because that paragraph directly answers one of its sub-queries. The rest of the article might be excellent, but if the relevant passage is buried or poorly structured, the model will find a cleaner passage on a competitor's page instead.
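The decompose-retrieve-score loop above can be sketched as a toy model. This is an illustration of the concept, not any vendor's actual pipeline: real systems score passages with embedding similarity, and the keyword-overlap function here just stands in for that. All sub-queries and passages are made-up examples.

```python
# Toy illustration of passage-level retrieval: score each passage of a
# page against each sub-query and keep the best match per sub-query.
# Real RAG systems use embedding similarity; keyword overlap is a stand-in.

def score(passage: str, sub_query: str) -> float:
    """Fraction of sub-query words that appear in the passage."""
    p_words = set(passage.lower().split())
    q_words = set(sub_query.lower().split())
    return len(p_words & q_words) / len(q_words)

def best_passage(passages: list[str], sub_query: str) -> tuple[str, float]:
    """Return the passage that best answers one sub-query, with its score."""
    return max(((p, score(p, sub_query)) for p in passages),
               key=lambda pair: pair[1])

sub_queries = [
    "what crm tools exist",
    "which crm is designed for agencies",
]
passages = [
    "Our story began in 2012 with a small team and a big dream.",
    "Popular CRM tools include HubSpot, Pipedrive, and Close.",
    "A CRM designed for agencies needs client portals and retainer billing.",
]

for q in sub_queries:
    passage, s = best_passage(passages, q)
    print(f"{q!r} -> {passage[:45]!r} (score {s:.2f})")
```

Notice that the narrative opener ("Our story began…") never wins a sub-query, no matter how well written it is. That is the passage-level selection problem in miniature.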

What Makes a Page Get Cited vs. Get Ignored?

I've analyzed hundreds of sites through IndexMind.ai and tracked which content earns citations and which doesn't. The patterns are consistent enough to be actionable. Here's what separates cited pages from ignored ones.

Does Your Page Answer the Question in the First 150 Words?

This is the single highest-impact factor. AI models retrieve and evaluate passages, and they weight content that appears early in a section or page. If your article starts with three paragraphs of context before stating the answer, a competitor's page that leads with the answer will win the citation.

I tested this directly with getwrecked.com. One of the pages had strong content about web development services, but the actual value proposition was buried after an introductory narrative. When I restructured the opening to lead with a clear, declarative statement of what the page covered and why it mattered, the page started appearing in AI-generated answers for related prompts within a few weeks.

The fix is editorial. Lead with the answer. Every section should start with one to two sentences that directly address the question that section answers. Contextualize after.

Does Your Structured Data Tell AI Models What to Extract?

Structured data (JSON-LD) serves as a metadata layer that helps AI models parse your content. Without it, the model has to infer what your page is about from the raw text. With it, the model has explicit signals about your content type, author, organization, and topic.

The structured data types that matter most for citation:

Article or BlogPosting schema tells models this is editorial content with a named author, publication date, and publisher. It establishes the content as a citable source rather than a product page or navigation element.

FAQPage schema explicitly marks Q&A pairs, making them trivially easy for AI models to extract and cite. If your page has a FAQ section without FAQPage schema, you're leaving citation potential on the table.

Organization schema establishes your brand as a named entity. When AI models encounter your Organization schema consistently across your site, they build a clearer association between your domain and your area of expertise. When I ran the first IndexMind analysis on rhasaun.com, missing Organization schema was one of the first two issues flagged.

Person schema on author pages connects your content to named individuals with credentials. AI models weigh authored content higher than anonymous content because authorship is an authority signal.
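To make the relationships between these types concrete, here is a sketch of the Article JSON-LD with nested Person (author) and Organization (publisher) entries, generated with Python's `json` module. Every name, URL, and date below is a placeholder, not a property of any real site.

```python
import json

# Sketch of Article + Person + Organization JSON-LD as described above.
# All names, URLs, and dates are placeholder values.
article_schema = {
    "@context": "https://schema.org",
    "@type": "Article",
    "headline": "How AI Engines Decide What to Cite",
    "datePublished": "2026-01-15",
    "dateModified": "2026-01-15",
    "author": {
        "@type": "Person",
        "name": "Jane Example",            # placeholder author
        "url": "https://example.com/about/jane",
    },
    "publisher": {
        "@type": "Organization",
        "name": "Example Co",              # placeholder organization
        "url": "https://example.com",
        "logo": {"@type": "ImageObject",
                 "url": "https://example.com/logo.png"},
    },
}

# Embed it in the page head as a script tag:
snippet = ('<script type="application/ld+json">\n'
           + json.dumps(article_schema, indent=2)
           + "\n</script>")
print(snippet[:60])
```

Generating the script tag from a dict like this, rather than hand-editing JSON in templates, is one way to avoid the broken-schema problem discussed later.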

Does Your Content Cover the Sub-Queries?

Remember that AI models decompose a single user question into multiple sub-queries. If your page answers three of five sub-queries and a competitor answers all five, the competitor's page is more likely to be cited because it provides more comprehensive coverage.

Here's how this plays out practically. For the query "how to get cited by ChatGPT," the sub-queries a model might generate include: how does ChatGPT select sources? What content formats does ChatGPT prefer? What technical requirements affect citation? How do you measure ChatGPT citations? What are common mistakes that prevent citation?

A page that covers all of these sub-queries within a single, well-structured article has a higher citation probability than a page that only addresses two of them. This is why comprehensive pillar content outperforms thin, narrowly focused pages for AI citation, even though thin pages might rank well in traditional search for a specific keyword.
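One way to operationalize the coverage check is a gap report: list the sub-queries you believe a model would generate, then flag which ones your page's headings address. The sketch below uses naive word overlap to match headings to sub-queries; the headings and sub-queries are hypothetical.

```python
# Hypothetical coverage check: which expected sub-queries does a page's
# heading set address? Matching is naive word overlap with a threshold.

def covers(heading: str, sub_query: str, threshold: float = 0.5) -> bool:
    h = set(heading.lower().split())
    q = set(sub_query.lower().split())
    return len(h & q) / len(q) >= threshold

def coverage_report(headings, sub_queries):
    return {q: any(covers(h, q) for h in headings) for q in sub_queries}

headings = [
    "How does ChatGPT select sources?",
    "What content formats does ChatGPT prefer?",
]
sub_queries = [
    "how does chatgpt select sources",
    "what content formats does chatgpt prefer",
    "how do you measure chatgpt citations",
]
report = coverage_report(headings, sub_queries)
gaps = [q for q, ok in report.items() if not ok]
print("uncovered sub-queries:", gaps)
```

A report like this turns "is my article comprehensive?" from a gut feeling into a checklist: every uncovered sub-query is a section you haven't written yet.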

Does Your Content Use Declarative, Extractable Statements?

AI models cite content they can extract as standalone statements. Compare these two approaches:

Weak (not citable): "In the ever-evolving landscape of digital marketing, many experts suggest that there might be various approaches one could consider when thinking about how to potentially improve their visibility in AI-generated responses."

Strong (citable): "AI models cite content that directly answers questions in the first one to two sentences of a section, uses structured data to signal relevance, and demonstrates topical authority through comprehensive coverage."

The second version is a declarative statement. It can be extracted from the page and inserted into an AI-generated answer as a citation without needing additional context. The first version is hedged, vague, and impossible to cite cleanly.

Write like you're stating facts, not like you're hedging against being wrong. AI models reward clarity and confidence.

Is Your Content Current?

For queries where recency matters, AI models favor recently published or updated content. A guide dated 2023 competes at a disadvantage against a 2026 guide covering the same topic, even if the 2023 content is substantively better.

Keep your publication dates current. Update data points and examples quarterly. If you have evergreen content that's still accurate, update the publication date and add a note about when it was last reviewed. This signals to AI models that the content reflects current knowledge.

How to Get Cited by ChatGPT: The Step-by-Step Playbook

Here's the practical sequence you can execute this week. Each step builds on the previous one, and the full sequence takes about five working days for a team of one.

Day 1: Identify Your Target Prompts

Before you optimize, you need to know what you're optimizing for. Make a list of 10 to 15 prompts that your audience would ask an AI model about your industry, product category, or area of expertise.

These should be real questions in natural language:

  • "What is [your topic]?"
  • "How do I [thing your audience wants to do]?"
  • "What's the best [product/service category] for [audience segment]?"
  • "How does [your topic] compare to [related topic]?"
  • "[Your brand] vs [competitor]"

Test each prompt in ChatGPT, Perplexity, and Google's AI Overview. Record which sources get cited in each response. This gives you your baseline: where you're cited, where you're absent, and who's getting cited instead.
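Recording the baseline in a structured file makes the later re-tests comparable. Here is a hypothetical sketch that logs one row per prompt per engine to CSV and flags the prompts where your own domain never appears; all prompts and domains are placeholders.

```python
import csv
import io
from datetime import date

# Hypothetical baseline log: one row per (prompt, engine) with the
# domains cited in that response. Prompts and domains are placeholders.
rows = [
    {"date": str(date.today()), "engine": "chatgpt",
     "prompt": "best crm for small agencies",
     "cited_domains": "example-competitor.com;another-rival.com"},
    {"date": str(date.today()), "engine": "perplexity",
     "prompt": "best crm for small agencies",
     "cited_domains": "example-competitor.com"},
]

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["date", "engine", "prompt",
                                         "cited_domains"])
writer.writeheader()
writer.writerows(rows)
baseline_csv = buf.getvalue()  # in practice, write this to a file

# Gap check: prompts where our own domain never appears in any response.
our_domain = "example.com"
gaps = {r["prompt"] for r in rows
        if our_domain not in r["cited_domains"].split(";")}
print("citation gaps:", gaps)
```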

Day 2: Audit Your Existing Content

For each prompt where you're not getting cited, check whether you have content that should be earning that citation. Open the page and ask:

  • Does the page answer this specific question?
  • Is the answer stated clearly in the first 150 words of the relevant section?
  • Does the page have Article/BlogPosting JSON-LD with author and publisher info?
  • Does the page have FAQPage schema if it includes Q&A content?
  • Does the page cover the sub-questions a model would generate from this prompt?

If the answer to any of these is no, you've identified the gap. Most citation failures I see when running IndexMind analyses fall into one of these categories.
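The structured-data items on that checklist can be scripted. The sketch below scans a page's HTML for JSON-LD blocks and reports which schema types are present; it uses only the standard library and a naive regex, so treat it as a starting point rather than a validator.

```python
import json
import re

def jsonld_types(html: str) -> set[str]:
    """Return the set of @type values found in JSON-LD blocks on a page.

    Naive: regex-extracts <script type="application/ld+json"> bodies.
    A real audit should use an HTML parser and a schema validator.
    """
    pattern = re.compile(
        r'<script[^>]*type="application/ld\+json"[^>]*>(.*?)</script>',
        re.DOTALL | re.IGNORECASE,
    )
    types: set[str] = set()
    for body in pattern.findall(html):
        try:
            data = json.loads(body)
        except json.JSONDecodeError:
            continue  # broken schema: flag this separately in a real audit
        items = data if isinstance(data, list) else [data]
        for item in items:
            if isinstance(item, dict) and "@type" in item:
                types.add(item["@type"])
    return types

# Placeholder page with Article schema but no FAQPage or Organization.
page = '''<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Article", "headline": "Demo"}
</script>
</head><body>...</body></html>'''

found = jsonld_types(page)
missing = {"Article", "FAQPage", "Organization"} - found
print("found:", found, "missing:", missing)
```

Run something like this across every content page and the audit becomes a spreadsheet of gaps instead of a manual page-by-page review.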

Day 3: Fix Your Structured Data

This is the technical foundation. If your structured data is missing or broken, fix it before touching your content. The changes are usually straightforward:

Add Organization JSON-LD to your site (if missing). This should include your company name, URL, logo, description, and social profiles. It goes on every page, typically in the site header or layout template.

Add Article or BlogPosting JSON-LD to every content page. Include headline, author (with Person schema reference), datePublished, dateModified, publisher, and description.

Add FAQPage JSON-LD to every page with a FAQ section. Each Q&A pair gets its own question and answer entry in the schema. Match the schema questions exactly to the heading text on the page.
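Matching the schema questions exactly to the on-page headings is easiest when both render from one source of truth. Here is a sketch that builds FAQPage JSON-LD from a list of (question, answer) pairs; the Q&A content is placeholder text.

```python
import json

# Build FAQPage JSON-LD from the same (question, answer) pairs that
# render on the page, so the schema always matches the visible headings.
# The Q&A content below is placeholder text.
faqs = [
    ("How do I get ChatGPT to cite my website?",
     "Answer questions directly in the first 150 words of each section "
     "and add Article and FAQPage JSON-LD."),
    ("Does structured data help with AI citations?",
     "Yes. JSON-LD helps models parse your content, though it does not "
     "guarantee citation."),
]

faq_schema = {
    "@context": "https://schema.org",
    "@type": "FAQPage",
    "mainEntity": [
        {
            "@type": "Question",
            "name": q,
            "acceptedAnswer": {"@type": "Answer", "text": a},
        }
        for q, a in faqs
    ],
}

print(json.dumps(faq_schema, indent=2)[:80])
```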

Verify your implementation using Google's Rich Results Test or Schema.org's validator. Broken schema is worse than no schema because it sends conflicting signals.

Day 4: Restructure Your Content for Citation

Take your top three citation gaps (prompts where competitors are cited and you're not) and restructure the corresponding pages. For each page:

Rewrite the opening. The first 150 words should contain a clear, direct answer to the primary question the page addresses. State the answer as a declarative sentence. Then contextualize.

Restructure section headings. Each H2 should map to a question your audience asks. Replace generic headings like "Our Approach" or "Key Considerations" with question-format headings like "How Does [Topic] Work?" or "What Are the Most Common [Topic] Mistakes?" Question-format headings align with how AI models decompose user queries.

Add a FAQ section. Five questions minimum, written in the exact language your audience uses. Each answer should be two to four sentences: long enough to be comprehensive, short enough to be extractable. Apply FAQPage schema.

Add a comparison table if relevant. AI models frequently cite structured comparison data. If your page covers a topic that lends itself to comparison (tools, approaches, metrics, frameworks), add a table. Clear headers, concise cells, 8 to 12 rows.

Day 5: Publish and Monitor

Push your changes live. Then set a reminder to re-run your prompt tests in two to four weeks. AI models update their retrieval indexes on different schedules, so you won't see results immediately. ChatGPT and Perplexity tend to reflect changes faster than Google AI Overviews.

Track three things:

  1. Did your citation count increase for any target prompts?
  2. Did any new prompts start citing your content?
  3. Did your competitor's citation count change for the same prompts?

This is the beginning of a measurement loop, not a one-time fix. Each cycle (optimize, measure, identify new gaps, optimize again) should move your citation rate upward.
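The loop reduces to comparing citation counts per prompt across cycles. A minimal sketch, with placeholder counts rather than real results:

```python
# Compare citation counts per prompt between two measurement cycles.
# The counts below are placeholder data, not real results.
baseline = {"best crm for small agencies": 0,
            "crm pricing comparison": 1}
cycle_2 = {"best crm for small agencies": 2,
           "crm pricing comparison": 1,
           "crm for freelancers": 1}   # a new prompt started citing us

def diff(before: dict[str, int], after: dict[str, int]) -> dict[str, int]:
    """Per-prompt change in citation count; new prompts count from zero."""
    prompts = set(before) | set(after)
    return {p: after.get(p, 0) - before.get(p, 0) for p in prompts}

changes = diff(baseline, cycle_2)
improved = sorted(p for p, d in changes.items() if d > 0)
print("improved prompts:", improved)
```

The same `diff` can run against a competitor's counts for the third tracking question: whether their citations moved while yours did.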

What to Do When You're Still Not Getting Cited

If you've done the structural work and you're still not earning citations, the issue is usually one of three things:

Topical authority gap. The competitor's domain has deeper, more comprehensive content on the topic. AI models evaluate authority at the content level: a single thorough article can outperform a thin one, but if your competitor has 15 interconnected articles on the topic and you have one, their authority signal is stronger. The fix is publishing a content cluster: a pillar article plus supporting articles that cover sub-topics, all interlinked.

Entity confusion. AI models aren't sure who you are or what you do. This happens when your Organization schema is missing, when your about page is vague, or when your brand name overlaps with other entities. The fix is entity disambiguation: clear Organization schema, consistent brand naming across all pages, an explicit about page, and author profiles with Person schema.

Content freshness gap. Your content is older than your competitor's, and the model is prioritizing recency. The fix is updating your content with current dates, current data, and a clear "last updated" signal.

A Real Example: What I Found Running getwrecked.com Through IndexMind

When I built IndexMind, getwrecked.com became the live test environment where every feature gets validated before it ships to customers. Running it through the same analysis pipeline I built for clients gave me a grounding in what actually matters versus what's theoretical.

The initial analysis showed strong technical health. Page speed was solid, mobile responsiveness was clean, heading structure was logical. But the AI visibility score was low because the content wasn't structured for citation. The pages told a story, but they didn't answer questions. The difference sounds subtle, but to an AI model scanning for citable passages, it's the difference between a source and a non-source.

The first round of fixes was all quick wins. Adding Organization schema. Restructuring opening paragraphs to lead with declarative answers. Adding FAQPage schema to pages that had Q&A content. Ensuring every H2 mapped to a question users would actually ask.

The second round went deeper. Building topical depth through supporting content. Creating comparison assets (tables, structured breakdowns) that AI models could reference. Strengthening entity signals so models clearly associated the domain with its area of expertise.

The citation rate moved. Not overnight, and not uniformly across all models. Perplexity picked up the changes fastest. ChatGPT followed. The pattern confirmed what the data had been suggesting all along: citation is earned through structure and authority, and both can be systematically improved.

Frequently Asked Questions

How do I get ChatGPT to cite my website?

Structure your content so it directly answers questions in the first 150 words of each section. Add Article/BlogPosting and FAQPage JSON-LD schema. Build topical authority through comprehensive content that covers the sub-questions AI models generate from user prompts. Ensure your Organization schema is clean so the model understands who you are and what you're authoritative about.

Why is my competitor getting cited by AI and I'm not?

The most common reasons are: their content answers the question more directly (answer-first structure), their structured data is more complete, their page covers more of the sub-queries the model generates, or their domain has stronger topical authority signals for that specific topic. Run an AI citation gap analysis to identify which specific prompts are producing the gap.

Does structured data help with AI citations?

Yes. JSON-LD (Article, FAQPage, Organization, Person) helps AI models parse your content and evaluate its relevance. It's not a guarantee of citation. It does, however, significantly increase your eligibility by making your content easier for the model to understand and extract.

How long does it take to start getting cited by AI models?

Structural changes (schema, content restructuring) can show results within two to four weeks on faster-updating platforms like Perplexity. Deeper authority-building work (content clusters, topical depth) typically takes one to three months. Plan for a 90-day window to see meaningful, sustained improvement.

Can I get cited by AI models without a tool?

You can do basic citation tracking manually by testing prompts across AI models and recording which sources get cited. This works for a small number of prompts but doesn't scale. A tool like IndexMind automates the probing, tracking, and gap identification at scale and generates actionable fix recommendations for each gap.


We ran this article through IndexMind's AI visibility scoring before publishing. If you want to see which AI models cite your content and where competitors are winning your citation slots, run a free analysis with IndexMind. getwrecked.com is our live test environment where every feature gets validated before it ships.

Ready to see how AI sees your business?

Measure your AI visibility, track citations, and get actionable recommendations.

Sign up today