AI search is reshaping how people find information. Instead of scanning ten blue links, users ask ChatGPT a question, run a Perplexity search, or read a Google AI Overview. The answer they get cites a handful of sources, and every other result is invisible.
That makes a simple question urgent for anyone who publishes content online: what do these AI engines actually weigh when choosing which sources to cite?
This guide breaks down the ranking signals across ChatGPT search, Perplexity, and Google’s AI features (AI Overviews and AI Mode), based on what each platform has publicly disclosed and what observable patterns reveal.
How AI Search Differs From Traditional Search
Traditional search engines return a list of links. AI search engines do something fundamentally different: they retrieve sources, synthesize information across those sources, and generate a single response with inline citations.
This means the “ranking” question splits into two stages:
- Retrieval: which pages make it into the engine’s consideration set?
- Citation selection: which of those pages get named in the final answer?
A page can rank well in retrieval but never get cited if its content is vague, redundant, or poorly structured. Understanding both stages is key to AI search optimization.
The Three Major AI Search Engines
Before diving into specific signals, it helps to understand how each engine’s architecture shapes its preferences.
ChatGPT Search
ChatGPT search is built on a fine-tuned version of GPT-4o, post-trained with synthetic data generation techniques that include distilling outputs from OpenAI’s o1-preview model. It uses third-party search providers, along with content supplied directly by publisher partners, to surface information.
OpenAI has established partnerships with major publishers including Associated Press, Axel Springer, Condé Nast, Dotdash Meredith, Financial Times, Hearst, Le Monde, News Corp, Reuters, The Atlantic, Time, and Vox Media. Any website or publisher can choose to appear in ChatGPT search results.
The system is designed to “highlight and attribute information from trustworthy news sources,” according to Pam Wasserstein, President of Vox Media, in OpenAI’s announcement. Responses include links to sources such as news articles and blog posts, with a Sources button that opens a sidebar with references.
For a deeper look at optimizing specifically for ChatGPT, see our ChatGPT SEO guide.
Perplexity
Perplexity operates its own in-house search, indexing, and crawling infrastructure. According to Perplexity’s technical blog, their search index “uses sophisticated ranking algorithms to ensure high quality, non-SEOed sites are prioritized.” Website excerpts, which Perplexity calls “snippets,” are provided to their models to enable responses with up-to-date information.
Perplexity describes itself as “your AI-powered answer engine for fast, trustworthy research” that “combines live web search with multiple leading AI models to give you up-to-date answers, backed by citations you can verify.”
Their models are evaluated across three core criteria: helpfulness, factuality, and freshness. Their Deep Research mode takes this further, performing “dozens of searches, reads hundreds of sources, and reasons through the material to autonomously deliver a comprehensive report.”
Google AI Overviews and AI Mode
Google’s AI features use a custom Gemini model, upgraded to Gemini 2.0 for harder questions including coding, advanced math, and multimodal queries. AI Overviews are now used by more than a billion people globally.
Both AI Overviews and AI Mode use a “query fan-out” technique, issuing multiple related searches concurrently across subtopics and multiple data sources, then bringing results together. According to Google’s developer documentation, “while responses are being generated, our advanced models identify more supporting web pages, allowing us to display a wider and more diverse set of helpful links associated with the response than with a classic web search.”
Google has reported that “the links included in AI Overviews get more clicks than if the page had appeared as a traditional web listing for that query.”
For optimization strategies specific to AI Overviews, see our AI Overview optimization guide.
The Core AI Ranking Factors
Based on what these platforms have disclosed and the patterns that emerge consistently across all three, here are the signals that matter most.
1. Authority and Trust
Every AI engine needs to determine whether a source is credible enough to cite. The signals differ by platform, but the principle is universal: AI engines prefer sources that established audiences already trust.
What this looks like in practice:
- Domain reputation built through consistent, accurate content over time
- Backlink profiles from other authoritative sites (these feed the traditional search indexes that AI engines query)
- Publisher partnerships (ChatGPT has direct content agreements with major publishers)
- Brand mentions across the web that establish topical authority
Perplexity’s ranking algorithms specifically prioritize “high quality, non-SEOed sites,” which suggests that manipulative link building or keyword stuffing is more likely to hurt than help.
2. Content Structure and Clarity
AI engines extract “snippets” from your pages (Perplexity uses this term explicitly). The easier your content is to parse, the more likely it is to be accurately extracted and cited.
Structural signals that help:
- Clear heading hierarchy that maps to the questions users ask
- Concise paragraphs that make distinct, attributable claims
- Direct answers near the top of sections, with supporting detail below
- Lists and tables that present comparative or categorical information cleanly
This is especially important for Google’s AI features, where the query fan-out technique searches across subtopics. A well-structured page with clear sections on distinct subtopics can satisfy multiple branches of a single fan-out query.
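As an illustration, a section built for extraction leads with the direct answer and keeps each subtopic under its own heading. This is a hypothetical sketch; the heading and paragraph text are invented for the example:

```html
<h2>What is query fan-out?</h2>
<!-- Direct, attributable answer in the first paragraph -->
<p>Query fan-out is Google's technique of issuing multiple related
   searches concurrently across subtopics, then merging the results.</p>

<h3>Why it matters for citations</h3>
<!-- Supporting detail sits below the answer, one subtopic per section -->
<p>Each subtopic section can satisfy a different branch of the fan-out,
   so one well-structured page can earn multiple supporting links.</p>
```

The pattern is answer-first, detail-second: a model extracting a snippet from this section gets a complete, citable claim without needing the surrounding page.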
Learn more about structuring content for AI in our AI content optimization guide.
3. Factual Accuracy
Perplexity explicitly evaluates its models on “factuality,” asking whether responses provide “accurate answers without hallucinations, even for questions that require very precise or niche knowledge.” Their Deep Research achieved 93.9% accuracy on the SimpleQA benchmark, a test bank of several thousand questions designed to evaluate factuality.
This has a direct implication for source selection: AI engines prefer sources that contain verifiable, precise claims rather than vague generalizations. If your page says “many experts agree that…” without specifics, it provides less citation-worthy material than a page that names the research, states the numbers, and links to primary sources.
How to strengthen factual signals:
- Cite primary sources and link to original research
- Include specific data points, dates, and named entities
- Attribute claims to identifiable experts or organizations
- Update content when facts change
4. Recency and Freshness
All three engines treat recency as a ranking signal, though the weight varies by query type.
Perplexity built its entire model architecture around freshness. Their online LLMs were specifically designed to address the limitation that “LLMs often struggle to share up-to-date information.” By providing models with “knowledge from the web,” Perplexity’s models can “accurately respond to time sensitive queries, unlocking knowledge beyond its training corpus.”
ChatGPT search was designed so users “can get fast, timely answers with links to relevant web sources,” with specific mention of “up-to-date sports scores, news, stock quotes, and more.”
Google’s AI Mode uses “fresh, real-time sources like the Knowledge Graph, info about the real world, and shopping data for billions of products.”
How freshness applies:
- Time-sensitive queries (news, scores, pricing) heavily favor recent content
- Evergreen queries still benefit from recently updated pages
- Publication dates and “last updated” timestamps signal freshness to crawlers
- Regular content updates tell AI crawlers your site is actively maintained
5. Indexability and Technical Access
This is the most straightforward factor, and the one most often overlooked. If an AI engine can’t crawl and index your page, it can’t cite you.
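One quick sanity check is whether your robots.txt actually permits a given crawler. Here is a minimal sketch using Python’s standard library; the robots.txt content, user-agent tokens, and URL are illustrative, not a recommendation of what to allow or block:

```python
from urllib import robotparser

# Hypothetical robots.txt that blocks one AI crawler but allows everything else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

def can_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if `user_agent` may fetch `url` under this robots.txt."""
    parser = robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

print(can_crawl(ROBOTS_TXT, "GPTBot", "https://example.com/post"))        # False
print(can_crawl(ROBOTS_TXT, "PerplexityBot", "https://example.com/post")) # True
```

Running this against your live robots.txt (via `RobotFileParser.set_url` and `read`) tells you whether a crawler you care about is being turned away before it ever sees your content.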
Google’s developer documentation states clearly: “To be eligible to be shown as a supporting link in AI Overviews or AI Mode, a page must be indexed and eligible to be shown in Google Search with a snippet.”
Google’s technical requirements for AI features include:
- Ensuring crawling is allowed in robots.txt and by any CDN or hosting infrastructure
- Making content easily findable through internal links
- Providing a great page experience
- Making sure important content is available in textual form
- Supporting textual content with high-quality images and videos
- Making sure structured data matches visible text on the page
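As a concrete sketch of the first requirement, a robots.txt that explicitly allows the main AI search crawlers might look like this. The user-agent tokens shown are the ones each vendor documents; verify them against current vendor documentation before relying on them:

```
# OpenAI's crawler for ChatGPT search results
User-agent: OAI-SearchBot
Allow: /

# Perplexity's search crawler
User-agent: PerplexityBot
Allow: /

# Google's AI features crawl with standard Googlebot;
# Google-Extended controls Gemini training, not search eligibility.
User-agent: Googlebot
Allow: /
```

Remember that a CDN or firewall can block these crawlers even when robots.txt allows them, so check both layers.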
For ChatGPT search, OpenAI states that “any website or publisher can choose to appear” in results, which implies that inclusion is governed by crawler access: sites appear unless they block OpenAI’s crawlers, for example via robots.txt.
For detailed guidance on managing AI crawler access, see our llms.txt guide.
6. Structured Data and Schema Markup
Google explicitly lists structured data as part of its SEO best practices for AI features, noting that sites should ensure “structured data matches the visible text on the page.” While structured data alone won’t get you cited, it helps AI engines understand what your content is about and verify that your markup reflects reality.
Schema markup matters because:
- It provides machine-readable context about your content type, author, dates, and topics
- Product, FAQ, and How-To schemas give AI engines structured facts to pull from
- Accurate schema increases the chance that your content satisfies a specific subtopic in a fan-out query
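For instance, a minimal Article markup in JSON-LD might look like the sketch below. The headline, author, and dates are placeholders, and per Google’s guidance the values must match what the page visibly shows:

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "Example headline matching the visible h1",
  "author": { "@type": "Person", "name": "Jane Doe" },
  "datePublished": "2025-01-15",
  "dateModified": "2025-06-01"
}
</script>
```

Note that `dateModified` doubles as a freshness signal: updating it (honestly, alongside real content changes) supports the recency factors discussed above.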
7. Brand Mentions and Entity Recognition
AI engines don’t just retrieve documents. They build internal representations of entities (brands, people, products, concepts) and associate those entities with topics. When a user asks about a topic your brand is strongly associated with, you’re more likely to be cited.
Building entity association:
- Consistent NAP (name, address, phone) information across the web
- Mentions in authoritative publications and industry resources
- Active presence on platforms that AI engines reference (news sites, professional directories, review platforms)
- Content that clearly associates your brand with specific topics and expertise
This connects directly to answer engine optimization, where the goal is to become the default source for specific queries.
Engine-Specific Patterns
While the core factors above apply universally, each engine has distinct preferences worth noting.
ChatGPT Search Preferences
ChatGPT search leans heavily on its publisher partnerships. Content from partnered publishers like Reuters, the Financial Times, and the Associated Press appears to receive preferential treatment in citations. The system was built in collaboration with the news industry, with OpenAI stating they “collaborated extensively with the news industry and carefully listened to feedback from our global publisher partners.”
For non-partner sites, the path to citation runs through the third-party search providers that ChatGPT queries. Strong traditional SEO (backlinks, domain authority, keyword relevance) still matters because those search indexes are the retrieval layer.
Perplexity Preferences
Perplexity’s emphasis on “non-SEOed sites” signals a preference for content that reads naturally and provides genuine value over content engineered primarily for search rankings. Their fine-tuning process uses “carefully curated high quality, diverse, and large training sets” evaluated on helpfulness, factuality, and freshness.
Perplexity also tends to cite a wider range of sources per response than ChatGPT, which creates more opportunities for niche and specialized content to earn citations.
Google AI Features Preferences
Google’s AI features build on decades of search infrastructure. According to their developer documentation, “you can apply the same foundational SEO best practices for AI features as you do for Google Search overall.” There are “no additional technical requirements” beyond standard search eligibility.
The query fan-out technique means Google’s AI features pull from more sources per response than a traditional search result. Google notes this approach “helps you access more breadth and depth of information than a traditional search on Google,” which means well-structured content covering specific subtopics has a strong chance of being pulled in as a supporting link.
What This Means for Your Strategy
The common thread across all three engines: they reward content that is genuinely useful, factually accurate, well-structured, and easy to access. There is no secret trick. The signals favor the same qualities that make content valuable to human readers.
Here’s where to focus:
Get the basics right first. Ensure your pages are crawlable, indexable, and technically sound. This is the foundation everything else depends on.
Structure for extraction. Write content that AI engines can easily parse into distinct, citable claims. Use clear headings, direct answers, and specific data.
Build real authority. Earn mentions, links, and recognition in your space through genuine expertise and original insights. All three engines prioritize trustworthy sources.
Stay fresh. Update your content regularly, especially for topics where recency matters. Publish dates and update timestamps signal maintenance.
Think in entities. Build your brand’s association with specific topics through consistent, focused content across your site and the broader web.
For a complete framework on building your AI SEO strategy, start with an AI visibility audit to see where you stand today across all three engines. From there, you can prioritize the factors that will move the needle fastest for your specific situation.
Understanding what AI engines weigh is the first step. The next is building a systematic approach to generative engine optimization that covers all three platforms consistently.