Technical SEO is the foundation every other SEO effort builds on. Without it, great content stays invisible because search engines either can’t crawl your pages, can’t interpret them correctly, or penalise slow load times before a user ever reads a word. This checklist covers the items that actually move the needle: crawlability, indexation, site performance, structured data, and the emerging layer of AI-citation readiness that most technical checklists still ignore.

Work through it in order. The earlier sections address blockers that prevent any ranking gains. The later sections address advantages that compound over time. If you’re auditing an existing site rather than building fresh, the SEO audit services page covers how to prioritise fixes by traffic impact.

For small businesses doing this for the first time, the small business SEO guide has useful context on which of these items to tackle when you’re resource-constrained.

Crawlability: Can Google Find Your Pages?

Crawlability is the prerequisite for everything else on this checklist. If Googlebot can’t reach a page, that page will never rank. The two files that control crawler access are robots.txt and your XML sitemap, and getting them right is straightforward once you understand what each one actually does.

robots.txt

According to Google’s own documentation, robots.txt is used primarily to prevent crawler overload, not to hide pages from search results. This is a common misunderstanding. Blocking a page in robots.txt does not reliably remove it from Google’s index. If another site links to that page, Google can still index a URL stub with no description. For true exclusion, use a noindex meta tag or remove the page. A noindex tag only works if the page stays crawlable, because Google has to fetch the page to see the tag, so do not block a noindexed page in robots.txt as well.

What robots.txt should do: block low-value internal paths that don’t need to be crawled (admin sections, duplicate parameter URLs, staging paths that leaked into production). What it should not do: block JavaScript, CSS, or image files that Google needs to render and understand your pages.

Verify your robots.txt at yourdomain.com/robots.txt and check how Google fetched and parsed it in the robots.txt report in Google Search Console, which replaced the older standalone robots.txt tester.
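
A quick local check can complement Search Console. The sketch below uses Python’s standard-library urllib.robotparser against a hypothetical rule set; the paths and example.com are assumptions for illustration. This parser applies rules in file order and does not implement Google’s wildcard or longest-match handling, so treat it as a sanity check rather than a replica of Googlebot.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: block low-value paths, leave assets crawlable.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /staging/
"""


def is_crawlable(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the given rules allow user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for path in ("/admin/login", "/assets/app.css", "/blog/post"):
        print(path, is_crawlable("Googlebot", f"https://example.com{path}"))
```

If a CSS, JavaScript, or image path comes back as blocked, that is the kind of rule worth removing.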

XML Sitemap

Google’s sitemap documentation states sitemaps are most useful for large sites (think 500+ pages), new sites with few inbound links, and sites with rich media. For a small, well-linked site with under 500 pages and solid internal linking, a sitemap is still good practice but less critical.

What matters: your sitemap should only list canonical, indexable URLs. Do not include pages with noindex tags, redirect chains, or URLs blocked in robots.txt. Submit your sitemap via Google Search Console and monitor the “Indexed vs Submitted” ratio. A big gap signals crawl issues.
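
As a rough illustration of that filtering rule, the sketch below builds a sitemap from a hypothetical crawl export. The PAGES records and their field names are assumptions, not the output format of any particular crawler.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

# Hypothetical crawl export: each record carries the signals that decide inclusion.
PAGES = [
    {"url": "https://example.com/", "noindex": False, "redirects": False, "blocked": False},
    {"url": "https://example.com/thank-you", "noindex": True, "redirects": False, "blocked": False},
    {"url": "https://example.com/old-page", "noindex": False, "redirects": True, "blocked": False},
    {"url": "https://example.com/blog/technical-seo", "noindex": False, "redirects": False, "blocked": False},
]


def build_sitemap(pages) -> str:
    """Return sitemap XML listing only indexable, non-redirecting, unblocked URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for page in pages:
        if page["noindex"] or page["redirects"] or page["blocked"]:
            continue  # these URLs do not belong in a sitemap
        url_el = ET.SubElement(urlset, "url")
        ET.SubElement(url_el, "loc").text = page["url"]
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    print(build_sitemap(PAGES))
```

The returned string has no XML declaration; add one when writing the file to disk.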

Indexation: Are the Right Pages in Google’s Index?

Getting indexed is not the same as ranking. But pages that aren’t indexed can’t rank at all, so indexation hygiene comes before everything else.

Canonical Tags

Google describes canonicalization as “selecting the most representative URL when duplicate content exists.” Canonicalization hints (rel=“canonical”) tell Google which version of a page you consider authoritative. Common sources of duplication include HTTP vs HTTPS, trailing slash vs no trailing slash, www vs non-www, and session ID parameters appended to URLs.

Implement self-referencing canonical tags on every page, not just pages with obvious duplicates. This gives you a consistent signal and protects against duplicate content introduced by CDN or CMS behaviour you didn’t anticipate.

Important caveat from Google’s documentation: canonical tags are a hint, not a directive. Google may override your preference based on signals it considers stronger. If Google consistently ignores your canonical, something else is sending conflicting signals (usually internal links pointing to the non-canonical version).
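
One way to keep canonical signals consistent is to derive every canonical URL from a single normalisation function. The sketch below assumes a hypothetical site policy (HTTPS, no www, no trailing slash, session parameters stripped); your own policy and parameter names may differ.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical policy: HTTPS, no "www.", no trailing slash, no session parameters.
SESSION_PARAMS = {"sessionid", "sid", "phpsessid"}


def canonical_url(url: str) -> str:
    """Normalise a URL variant to the single form the site treats as canonical."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    path = parts.path.rstrip("/") or "/"
    query = urlencode(
        [(k, v) for k, v in parse_qsl(parts.query) if k.lower() not in SESSION_PARAMS]
    )
    return urlunsplit(("https", host, path, query, ""))


def canonical_tag(url: str) -> str:
    """Render the self-referencing canonical tag for a page."""
    return f'<link rel="canonical" href="{escape(canonical_url(url), quote=True)}">'


if __name__ == "__main__":
    print(canonical_tag("http://www.example.com/blog/?sid=abc123"))
```

Using the same function for internal links helps avoid the conflicting signals described above.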

Noindex Control

The noindex meta tag (<meta name="robots" content="noindex">) is the reliable way to keep pages out of Google’s index. Use it on thank-you pages, internal search results, thin parameter pages, and staging environments. Do not combine noindex with a canonical tag that points to a different URL: one says “drop this page” while the other says “treat this page as a copy of that one”, so the signals conflict.

Audit for accidental noindex tags using the “Pages” report (formerly “Coverage”) in Google Search Console. Look under the reason “Excluded by ‘noindex’ tag” to see pages you might have excluded by mistake.
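
For a spot check outside Search Console, a small parser can flag noindex directives in a page’s HTML. This is a minimal sketch using only the standard library; it reads meta tags only and does not inspect the X-Robots-Tag HTTP header, which can also carry noindex.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots"> and <meta name="googlebot">."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() in ("robots", "googlebot"):
            content = attr.get("content") or ""
            self.directives.update(d.strip().lower() for d in content.split(","))


def has_noindex(html: str) -> bool:
    """Return True if the HTML carries a noindex meta directive."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


if __name__ == "__main__":
    print(has_noindex('<head><meta name="robots" content="noindex, follow"></head>'))
```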

Redirect Chains

Each redirect in a chain slows crawling and page loads, and every extra hop is another point where link signals can be lost or a crawler can give up. Keep redirects to a single hop where possible. Redirect chains longer than three hops are a common side effect of multiple CMS migrations and are easily missed without a dedicated crawl audit.
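
Chains can be found and flattened offline from a redirect map. The sketch below assumes a hypothetical dictionary of source-to-target paths exported from your server or CMS; it makes no network requests.

```python
# Hypothetical redirect map left behind by successive migrations.
REDIRECTS = {
    "/2019/post": "/blog/post",
    "/blog/post": "/articles/post",
    "/articles/post": "/articles/post/",
}


def redirect_chain(start: str, redirects: dict, max_hops: int = 10) -> list:
    """Follow redirects from start; return every URL visited, in order."""
    chain = [start]
    while chain[-1] in redirects and len(chain) <= max_hops:
        target = redirects[chain[-1]]
        chain.append(target)
        if chain.count(target) > 1:  # redirect loop: stop following
            break
    return chain


def flatten(redirects: dict) -> dict:
    """Point every source straight at its final destination (single hop)."""
    return {src: redirect_chain(src, redirects)[-1] for src in redirects}


if __name__ == "__main__":
    chain = redirect_chain("/2019/post", REDIRECTS)
    print(len(chain) - 1, "hops:", " -> ".join(chain))
    print(flatten(REDIRECTS))
```

Loops need manual review; the flattened map is only meaningful for chains that terminate.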

Core Web Vitals and Site Performance

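Core Web Vitals are Google’s performance metrics that directly influence rankings. According to web.dev’s Vitals documentation, each of the three metrics is assessed at the 75th percentile of page loads, segmented across mobile and desktop devices. In practice, a page passes a metric only when at least 75% of real visits on that device type meet the “good” threshold.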

Largest Contentful Paint (LCP)

LCP measures loading performance. Good: 2.5 seconds or less. Needs improvement: 2.5 to 4 seconds. Poor: above 4 seconds. LCP is most commonly improved by optimising the hero image (use WebP or AVIF, add fetchpriority="high" to the LCP element, avoid lazy-loading it), upgrading hosting, and removing render-blocking resources.

Interaction to Next Paint (INP)

INP replaced First Input Delay in March 2024. It measures responsiveness across the full page lifecycle. Good: 200 milliseconds or less. Needs improvement: 200 to 500 ms. Poor: above 500 ms. Heavy JavaScript, third-party scripts, and large DOM trees are the main culprits.

Cumulative Layout Shift (CLS)

CLS measures visual stability. Good: 0.1 or less. Needs improvement: 0.1 to 0.25. Poor: above 0.25. The most common cause is images without explicit width and height attributes (the browser doesn’t know how much space to reserve) and ad slots that expand after content loads.

Use Google Search Console’s Core Web Vitals report for field data and PageSpeed Insights for lab diagnostics. Field data (real users) is what Google uses for ranking.
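
The thresholds above can be turned into a small rating helper for your own field measurements. In this sketch the nearest-rank percentile is an assumption chosen for simplicity; Google’s own tooling may compute the 75th percentile differently.

```python
import math

# (good_max, needs_improvement_max) per metric; LCP in seconds, INP in ms, CLS unitless.
THRESHOLDS = {"LCP": (2.5, 4.0), "INP": (200, 500), "CLS": (0.1, 0.25)}


def p75(samples):
    """75th percentile by the nearest-rank method (an assumption, see lead-in)."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = math.ceil(0.75 * len(ordered))
    return ordered[rank - 1]


def rate(metric: str, value: float) -> str:
    """Classify a metric value as good, needs improvement, or poor."""
    good_max, needs_improvement_max = THRESHOLDS[metric]
    if value <= good_max:
        return "good"
    if value <= needs_improvement_max:
        return "needs improvement"
    return "poor"


if __name__ == "__main__":
    lcp_samples = [1.8, 2.1, 2.4, 3.9]  # hypothetical field data, in seconds
    value = p75(lcp_samples)
    print("LCP p75:", value, rate("LCP", value))
```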

Mobile-First Indexing

Google’s mobile-first indexing documentation is clear: Google primarily uses the mobile version of a site for indexing and ranking. If your mobile site has less content than your desktop site, you’re being ranked on the thinner version.

The requirement is content parity. Every piece of content, structured data, and metadata that matters for ranking must be present on the mobile version, not hidden behind tabs or omitted to save screen space.

Responsive design is Google’s recommended approach. It serves the same HTML at the same URL regardless of device, eliminating the risk of content divergence between mobile and desktop. Check your mobile rendering using the URL Inspection tool in Google Search Console.

One common failure mode: lazy-loading critical content behind user interaction (swipe or click) on mobile. This content often isn’t crawled.

HTTPS and Security

HTTPS has been a lightweight Google ranking signal since 2014, and Google has described it as a tiebreaker between otherwise comparable results. More practically, browsers mark HTTP pages as “Not Secure”, which reduces user trust and increases bounce rates.

Verify: your site serves all pages over HTTPS, the certificate is valid and not expired, HTTP URLs 301-redirect to HTTPS equivalents, and internal links and canonical tags all reference the HTTPS version. Mixed content (HTTPS pages loading HTTP assets) breaks the security indicator and should be fixed.
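
Mixed content is easy to miss by eye. The sketch below scans a page’s HTML for subresources loaded over plain HTTP; the tag-to-attribute map is a simplified assumption and does not cover every case (CSS url() references and srcset, for example). It also flags an HTTP canonical link, which is not mixed content in the browser’s sense but still needs fixing per the list above.

```python
from html.parser import HTMLParser

# Tags whose URL attribute is checked. Plain <a href> links are not mixed content.
SUBRESOURCE_ATTRS = {
    "img": "src",
    "script": "src",
    "iframe": "src",
    "source": "src",
    "link": "href",
}


class MixedContentFinder(HTMLParser):
    """Record (tag, url) pairs for checked attributes that use http://."""

    def __init__(self):
        super().__init__()
        self.insecure = []

    def handle_starttag(self, tag, attrs):
        attr_name = SUBRESOURCE_ATTRS.get(tag)
        if attr_name is None:
            return
        value = dict(attrs).get(attr_name) or ""
        if value.lower().startswith("http://"):
            self.insecure.append((tag, value))


def find_mixed_content(html: str) -> list:
    finder = MixedContentFinder()
    finder.feed(html)
    return finder.insecure


if __name__ == "__main__":
    page = '<img src="http://example.com/hero.jpg"><script src="https://example.com/app.js"></script>'
    print(find_mixed_content(page))
```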

Structured Data and Schema Markup

Structured data is how you tell search engines and AI engines exactly what your content means, not just what it says. Google’s structured data documentation recommends JSON-LD as the implementation format and notes that Google uses structured data to understand content for both rich results and “to gather information about the web and the world in general.”

Key schema types to implement:

Page Type	Schema Type	Benefit
Homepage / brand	Organization	Knowledge panel eligibility
Blog / article	Article	Rich result with date and author
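FAQ section	FAQPage	Structured Q&A for AI citation; Google now limits FAQ rich results to well-known government and health sites
Product page	Product	Price, availability, review stars
Local business	LocalBusiness	Map pack and local panels
How-to guide	HowTo	Machine-readable steps; Google no longer shows HowTo rich results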

Use Google’s Rich Results Test to validate markup before deploying. Monitor the “Enhancements” section in Search Console after launch.

The schema markup guide has a deeper breakdown of which types carry the most weight for AI-citation readiness specifically.

AI Search Visibility: The Technical Layer That Most Checklists Miss

AI engines like ChatGPT, Perplexity, and Google AI Overviews pull from the same indexed web that Google Search uses, but they weight certain technical signals differently. Getting your technical SEO right also improves your chances of being cited in AI answers.

Structured Data as Citation Infrastructure

AI engines use structured data to confidently attribute facts to sources. A page with a properly implemented FAQPage schema containing a direct, attributed answer to a question is significantly more likely to be surfaced than an identical page without schema. The FAQ schema guide covers this in detail.

The principle: write your schema to answer questions explicitly. @type: FAQPage with acceptedAnswer properties gives AI engines a structured, quotable fact. That’s what gets cited.
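
To make the principle concrete, the sketch below generates FAQPage JSON-LD from question-and-answer pairs. The schema.org types and properties (FAQPage, Question, acceptedAnswer, Answer) are real; the function name and sample content are illustrative assumptions.

```python
import json


def faq_jsonld(pairs) -> str:
    """Return a JSON-LD script tag describing the given (question, answer) pairs."""
    data = {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }
    return '<script type="application/ld+json">\n' + json.dumps(data, indent=2) + "\n</script>"


if __name__ == "__main__":
    print(faq_jsonld([
        ("What does robots.txt do?", "It tells crawlers which paths they may request."),
    ]))
```

Keep each answer short and self-contained, and make sure the same text is visible on the page itself. Validate the output with the Rich Results Test before deploying.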

llms.txt

A growing convention (not yet an official standard, but increasingly recognised) is publishing a /llms.txt file that gives AI crawlers a curated, plaintext summary of your site’s most important content. The llms.txt explainer covers what to include and which AI engines currently respect it.
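
For orientation, the sketch below generates a file in the Markdown-style layout the llms.txt proposal describes, as far as I understand it: a title, a short blockquote summary, then sections of annotated links. Because the convention is still evolving, treat the layout as an assumption and check it against the current proposal; the site name, URLs, and notes are placeholders.

```python
def build_llms_txt(site_name: str, summary: str, sections: dict) -> str:
    """Render a title, a summary, and sections of (title, url, note) links."""
    lines = [f"# {site_name}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for title, url, note in links:
            lines.append(f"- [{title}]({url}): {note}")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


if __name__ == "__main__":
    print(build_llms_txt(
        "Example Co",
        "Guides on technical SEO for small sites.",
        {"Guides": [("Technical SEO checklist", "https://example.com/technical-seo", "Full audit checklist")]},
    ))
```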

AI Crawler Access

Make sure you haven’t accidentally blocked AI crawlers in robots.txt. Perplexity’s PerplexityBot, Anthropic’s ClaudeBot, OpenAI’s GPTBot, and Google’s Google-Extended token each have a distinct name in robots.txt, so a rule aimed at one does not affect the others. Blocking them can keep your content out of the corresponding AI answers. The AI crawler access guide has the current user agent strings and recommended robots.txt rules.
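
A short script can confirm which AI crawlers a robots.txt file shuts out. The agent tokens listed here are assumptions based on commonly published names and should be verified against each vendor’s current documentation; the sample rules are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt tokens; confirm each against the vendor's documentation.
AI_AGENTS = ["GPTBot", "ClaudeBot", "PerplexityBot", "Google-Extended"]

# Hypothetical file that blocks one AI crawler site-wide.
SAMPLE_ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /admin/
"""


def blocked_ai_agents(robots_txt: str, url: str = "https://example.com/") -> list:
    """Return the AI agent tokens that the rules stop from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in AI_AGENTS if not parser.can_fetch(agent, url)]


if __name__ == "__main__":
    print(blocked_ai_agents(SAMPLE_ROBOTS_TXT))
```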

Track whether AI engines are actually citing your pages with a tool like Fokal, which monitors your brand and content across ChatGPT, Perplexity, and Google AI Overviews simultaneously.

URL Structure and Site Architecture

Clean URLs are crawled more efficiently and earn more clicks. Keep URLs lowercase, use hyphens (not underscores) to separate words, and avoid unnecessary parameters. Remove or consolidate pages that serve no indexable purpose.
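
These rules are simple enough to lint automatically. The sketch below is one possible check, assuming any query string counts as an unnecessary parameter; real sites will want an allowlist for parameters such as pagination.

```python
from urllib.parse import urlsplit


def url_issues(url: str) -> list:
    """Return the URL hygiene rules that the given URL breaks."""
    parts = urlsplit(url)
    issues = []
    if parts.path != parts.path.lower():
        issues.append("uppercase characters in path")
    if "_" in parts.path:
        issues.append("underscores instead of hyphens")
    if parts.query:
        issues.append("query parameters present")
    return issues


if __name__ == "__main__":
    for url in (
        "https://example.com/Blog/Technical_SEO?ref=nav",
        "https://example.com/blog/technical-seo",
    ):
        print(url, url_issues(url) or "ok")
```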

Breadcrumb navigation, implemented with BreadcrumbList schema as schema.org defines, helps both users and crawlers understand hierarchy. It can also produce breadcrumb rich results in Google, which may lift click-through rate by making your listing look more authoritative.

Internal linking matters more than most sites acknowledge. Every important page should be reachable within three clicks from the homepage. Google’s crawlers documentation confirms Googlebot discovers new pages primarily by following links, so an isolated page is a page that risks being missed or under-crawled.
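
Click depth can be measured with a breadth-first search over your internal link graph. The sketch below assumes a hypothetical adjacency map (page to the pages it links to), which most crawlers can export in some form.

```python
from collections import deque

# Hypothetical internal link graph: page -> pages it links to.
LINKS = {
    "/": ["/blog", "/services"],
    "/blog": ["/blog/a"],
    "/blog/a": ["/blog/a/b"],
    "/blog/a/b": ["/blog/a/b/c"],
    "/landing": ["/"],  # nothing links to this page
}


def click_depths(links: dict, home: str = "/") -> dict:
    """Breadth-first search: minimum number of clicks from home to each page."""
    depths = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


def depth_audit(links: dict, home: str = "/", max_depth: int = 3):
    """Return (pages deeper than max_depth, pages unreachable from home)."""
    depths = click_depths(links, home)
    all_pages = set(links) | {t for targets in links.values() for t in targets}
    too_deep = sorted(p for p, d in depths.items() if d > max_depth)
    orphaned = sorted(all_pages - set(depths))
    return too_deep, orphaned


if __name__ == "__main__":
    print(depth_audit(LINKS))
```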

JavaScript SEO

If your site relies heavily on JavaScript for content rendering, test how Google actually renders it using the URL Inspection tool’s “Test Live URL” feature and checking the rendered HTML. Content injected purely via client-side JavaScript after page load can still be indexed, but only after Google has rendered the page, and that extra step can delay indexation.

The practical checklist:

Server-side render (SSR) or statically generate your most important content
Do not lazy-load critical body text behind user interactions
Ensure canonical tags and meta robots tags are present in the initial HTML response, not added by JavaScript after load
Test with JavaScript disabled to identify what Googlebot would see before rendering
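
The last two checks can be approximated by inspecting the raw HTML response before any script runs. The sketch below parses a hypothetical client-rendered shell with the standard library; it ignores text inside script and style tags, which is roughly what a non-rendering fetch would see. The sample markup and key phrase are assumptions.

```python
from html.parser import HTMLParser


class InitialHtmlAudit(HTMLParser):
    """Collect canonical, meta robots, and visible text from unrendered HTML."""

    def __init__(self):
        super().__init__()
        self.canonical = None
        self.meta_robots = None
        self.text = []
        self._skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "link" and (attr.get("rel") or "").lower() == "canonical":
            self.canonical = attr.get("href")
        elif tag == "meta" and (attr.get("name") or "").lower() == "robots":
            self.meta_robots = attr.get("content")
        elif tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.text.append(data.strip())


def audit_initial_html(html: str, key_phrase: str) -> dict:
    parser = InitialHtmlAudit()
    parser.feed(html)
    return {
        "canonical": parser.canonical,
        "meta_robots": parser.meta_robots,
        "key_phrase_present": key_phrase in " ".join(parser.text),
    }


if __name__ == "__main__":
    shell = '<html><head><title>Shop</title></head><body><div id="root"></div><script>render("Blue widgets")</script></body></html>'
    print(audit_initial_html(shell, "Blue widgets"))
```

A missing canonical or an absent key phrase in this output suggests the content depends on client-side rendering.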

For a deeper treatment, the JavaScript SEO guide covers framework-specific approaches for React, Next.js, and Vue.

Technical SEO Audit: Running the Checks

The most efficient order for a technical audit:

1. Crawl the site with a tool that mimics Googlebot. Look for 4xx errors, redirect chains, pages blocked by robots.txt, and missing canonical tags.
2. Check Google Search Console for page indexing errors (the report formerly called Coverage), Core Web Vitals field data, and manual actions. The GSC SEO audit guide walks through the full GSC workflow.
3. Validate structured data using the Rich Results Test across your key page templates.
4. Run PageSpeed Insights on mobile for your top-traffic pages. Field data (CrUX) is what Google uses; lab data shows you what to fix.
5. Audit robots.txt and sitemap alignment to confirm you’re not blocking pages you want indexed or including pages you don’t.

Run this audit on a schedule, not just after launches. Technical debt accumulates quietly. A quarterly crawl catches issues before they compound into ranking drops.

The on-page SEO services guide covers what to do once the technical foundation is clean and you’re ready to focus on content.

Technical SEO Checklist: Every Fix That Moves Rankings in 2025