Introduction
How did I make this list?
Needless to say, I didn't reinvent the wheel with this. 90% of the technical requirements are good practices that can be found in Google's official resources. I was also inspired by the following resources:
- Clay Cazier's SEO requirements for a new website platform (2016-07-27 - Search Engine Land). This article is probably the best article on technical SEO I found, as it is really about technical SEO. Logically, many of the features listed by Clay Cazier are in my article. However, I was bothered by the lack of grouping between the features. I also felt that some features were too specific (feature 19 about AngularJS) and some were repetitive. The distinction between Platform SEO requirements and Platform SEO near-requirements wasn't relevant to me.
- Matthew Howells-Barby's How We Increased Organic Traffic by Over 50% Using Technical SEO Updates (2016-01-29 - HubSpot). This article is an interesting case study that shows how important technical features are for SEO. The article focuses on several features that are in my list: 404 pages, pagination, structured data, customizable text zones, hreflang and XML sitemap. The article also covers language meta tags, which, I think, are not necessary and should be replaced by a lang attribute in the HTML opening tag of the document.
Who is this list for?
The added value of this list depends on the type of website you use it for.
1. For custom-developed websites, this list is a gold mine (or at least I hope it is!). You have an updated overview of what needs to be done to make an SEO-friendly website. Web developers, this list is really for you.
2. For websites based on CMS platforms, the majority of the technical features listed below should already be integrated in the CMS core code, even though it is not always the case. And if they are not, it is not always possible to implement them without interfering with the core of the CMS, which is not recommended. However, the list can be useful to compare CMS software regarding their SEO friendliness.
If you work in SEO, this checklist is also a solid basis to perform technical SEO audits. A technical SEO audit is not about spotting pages with missing meta descriptions. Technical SEO audits are really about checking the presence of technical requirements on a website.
The list
An SEO-friendly website needs to have specific technical requirements in its core. I listed 35 features or mega-features in 6 different groups.
1. Crawlability and indexability
- The main XML sitemap is accessible at the root of the domain.
- The XML sitemap(s) include all "to be indexed" pages.
- The XML sitemap(s) exclude all "not to be indexed" pages.
- If appropriate, the XML sitemap(s) include links to images.
- The robots.txt file is accessible at the root of the domain.
- The robots.txt file lists all "disallow/not to be indexed" pages.
- The link to the main XML sitemap is in the robots.txt file.
- Each page type of your website has structured data in JSON-LD format.
- Each page type of your website has valid HTML code, validated on https://validator.w3.org/.
- Each page type of your website has valid CSS code, validated on https://jigsaw.w3.org/css-validator/.
- Your website has a 404 page with a proper 404 HTTP status code.
- All "not to be indexed" pages include a "noindex" meta tag.
- For websites translated into 2 or more languages, all pages have hreflang attributes with appropriate region/language codes.
2. URLs
- All pages of your website are in HTTPS.
- All versions of your domain have an automatic 301 redirection to your primary domain.
- When the URL of a page has changed, the old URL has a 301 redirect to the new URL.
- All URLs should be unique and clean.
- A page can only be accessed through 1 unique URL (no duplicate pages).
- URLs of generic pages should be paginated, especially generic pages with infinite scroll.
3. Content
- Each page has a customizable title tag, with an optimal length of 50 to 60 characters.
- Each page has a customizable meta description tag, with an optimal length of 50 to 160 characters.
- Each page has customizable text zones at the top of the page, with customizable headings (h1 to h4).
- Each image has customizable information: you can choose the name of the file (as it appears in the URL), the title attribute (name of the image) and the alt attribute.
4. Loading performance
- The images of your website are optimized (compressed).
- The HTML, CSS and JavaScript code used by your website is minified (code minification).
- The JavaScript code used by your website is split into bundles (code splitting).
- Your website uses premium hosting.
- You serve cached content of your website with reverse proxy servers and/or a content delivery network (CDN).
- Your website is fast on mobiles (smartphones).
5. User experience
- Your website fits all screens (desktops, tablets, smartphones), ideally with responsive web design (RWD).
- Your website has a custom (well designed) 404 page.
- Your website does not have broken links (no 404 pages).
- Your website has a high definition favicon.
6. Social
- Every public page of your website has Open Graph metadata.
- Every public page of your website has complementary Twitter Cards tags.
Further exploration
Crawlability and indexability
Crawlability and indexability are at the heart of how search engines work, so it is no surprise that this section comes first on my list.
Not to index pages
There are mainly 2 reasons to prevent a page from being indexed:
- For security and privacy reasons: the page is only intended for one or several users. For example, the private back-office pages of an e-commerce website.
- For quality reasons: the content of a page is of low quality, due to poor or duplicate content.
If you need to prevent a page from being indexed by search engines or de-index one that has already been indexed, you should read this: Block search indexing with 'noindex'.
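To audit this requirement, you can check whether a page actually carries the robots meta tag. Here is a minimal sketch using only the Python standard library. The names RobotsMetaParser and has_noindex are my own, and the sketch only looks at the meta tag, not at an X-Robots-Tag HTTP header.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives of every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() == "robots":
            content = attributes.get("content") or ""
            # The content attribute is a comma-separated list of directives.
            for directive in content.split(","):
                self.directives.add(directive.strip().lower())


def has_noindex(html):
    """Return True if the page has a robots meta tag containing noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


# Hypothetical page used for illustration.
page = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
print(has_noindex(page))
```

Run over the list of "not to be indexed" pages, a check like this quickly shows which ones are missing the tag.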
One or several XML sitemaps?
There is no preference related to having a unique or several XML sitemaps for a website. But if a website has or can have a lot of pages, multiple sitemaps are the way to go. The only requirement is to create a sitemap index file, listing all sitemaps of the website. You can learn more about this here: Split up your large sitemaps.
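As an illustration, here is a rough sketch of how a large list of URLs could be split and how the sitemap index file could be generated. The example.com URLs and the sitemap-N.xml file names are assumptions of mine. The limit of 50,000 URLs per sitemap comes from the sitemaps.org protocol.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_SITEMAP = 50000  # limit set by the sitemaps.org protocol


def chunk_urls(urls, size=MAX_URLS_PER_SITEMAP):
    """Split the full URL list into groups that each fit in one sitemap."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def build_sitemap_index(sitemap_urls):
    """Return the XML of a sitemap index listing every sitemap file."""
    root = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for url in sitemap_urls:
        sitemap = ET.SubElement(root, "sitemap")
        ET.SubElement(sitemap, "loc").text = url
    return ET.tostring(root, encoding="unicode")


# Hypothetical website with 120,000 pages.
pages = [f"https://www.example.com/page-{n}" for n in range(120000)]
chunks = chunk_urls(pages)
sitemap_urls = [
    f"https://www.example.com/sitemap-{i}.xml" for i in range(1, len(chunks) + 1)
]
index_xml = build_sitemap_index(sitemap_urls)
print(len(chunks), "sitemaps listed in the index")
```

Each chunk would then be written to its own sitemap file, and only the index needs to be referenced in the robots.txt file.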
JSON-LD structured data
For JSON-LD structured data, you should read:
- Understand how structured data works.
- Explore the search gallery.
- Full Hierarchy of structured data available. This page lists all the possible data that can be structured with JSON-LD. Depending on what information you want to put on a page, you have to find what data can be structured. The more data you structure on a page, the easier it is for search engines to understand and index it.
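To give an idea of what this looks like in practice, here is a small sketch that turns a dictionary into a JSON-LD script tag. The article data is invented for the example, and json_ld_script is a helper name of my own. Article, Person, headline, datePublished and author are existing schema.org terms.

```python
import json


def json_ld_script(data):
    """Serialize a dictionary as a JSON-LD script element."""
    payload = json.dumps(data, indent=2, ensure_ascii=False)
    # Escape "</" so that a string value cannot close the script element early.
    payload = payload.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


# Hypothetical article data.
article = {
    "@context": "https://schema.org",
    "@type": "Article",
    "headline": "Technical SEO checklist",
    "datePublished": "2020-01-15",
    "author": {"@type": "Person", "name": "Jane Doe"},
}

snippet = json_ld_script(article)
print(snippet)
```

The resulting tag goes in the HTML of the page. One template per page type is usually enough, filled with the data of each page.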
Search-engine friendly code
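Search-engine friendly code is standard HTML and CSS. To make sure your code is valid, you can use the W3C validation tools: the Markup Validation Service (https://validator.w3.org/) for HTML and the CSS Validation Service (https://jigsaw.w3.org/css-validator/) for CSS.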
For JavaScript-related topics:
- How Does Google Handle Javascript When Crawling, Rendering & Indexing Pages.
- The Ultimate Guide to JavaScript SEO.
- The Basics of JavaScript Framework SEO.
Finally, here are some guidelines from Google regarding JavaScript:
404 page
A 404 page is necessary to indicate to users and search engines that a URL is no longer valid. However, search engines hate 404 pages, as they signal broken links. The fewer, the better for SEO.
404 pages generally appear in 2 cases:
- A page is deleted or a URL is changed. This scenario can be avoided by redirecting the non-valid URL to a valid one (301 redirect).
- An external website is linking to a page that never existed. A 301 redirect can also be put in place in this case.
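The logic behind these two cases can be sketched as a simple lookup: serve the page if it exists, answer with a 301 if the URL is known to have moved, and only fall back to a 404 otherwise. The paths below are made up for the example, and resolve is a hypothetical helper, not the API of any framework.

```python
# Hypothetical data: old path -> new path.
REDIRECTS = {
    "/old-pricing": "/pricing",
    "/blog/2019/seo-tips": "/blog/seo-tips",
}
# Hypothetical set of paths that currently exist on the website.
LIVE_PAGES = {"/", "/pricing", "/blog/seo-tips"}


def resolve(path):
    """Return (status_code, location) for a requested path."""
    if path in LIVE_PAGES:
        return 200, None
    if path in REDIRECTS:
        # Permanent redirect: search engines transfer the old URL to the new one.
        return 301, REDIRECTS[path]
    # Unknown URL: answer with a real 404 status, not a 200 page saying "not found".
    return 404, None


for requested in ("/pricing", "/old-pricing", "/does-not-exist"):
    print(requested, resolve(requested))
```

In a real setup, this table usually lives in the web server or CMS configuration, and every redirect target should itself be a live page to avoid redirect chains.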
For the user, the best thing is to have a customized (and if possible original) 404 page. You can find some inspiration on these websites:
- Dribbble.
- Awwwards.
- Muz.li.
- bonjour404.fr (404 pages in English and French).
hreflang attribute
If you want to have different languages for a website, there are 3 recommended options:
- One top level domain name per language.
- One sub-domain name per language, with the same top level domain name. This is what Wikipedia is doing.
- One subdirectory per language with the same top level domain name.
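Whichever option you choose, each page has to declare its alternate versions with hreflang. Here is a minimal sketch for the subdirectory option, assuming URLs of the form https://www.example.com/{locale}{path}. The hreflang_links helper and the example domain are hypothetical. The x-default value designates the fallback version for users whose language matches none of the listed ones.

```python
from html import escape


def hreflang_links(base_url, path, locales, default_locale):
    """Build the <link> tags to place in the <head> of every language version."""
    tags = []
    for locale in locales:
        href = f"{base_url}/{locale}{path}"
        tags.append(
            f'<link rel="alternate" hreflang="{escape(locale)}" href="{escape(href)}">'
        )
    # Fallback version for languages that are not listed.
    default_href = f"{base_url}/{default_locale}{path}"
    tags.append(
        f'<link rel="alternate" hreflang="x-default" href="{escape(default_href)}">'
    )
    return tags


links = hreflang_links("https://www.example.com", "/pricing", ["en", "fr"], "en")
print("\n".join(links))
```

The same set of tags, including the self-referencing one, should appear on every language version of the page.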
You can read more about this in these articles:
Primary domain
Your primary domain should start with https and I recommend keeping the www. So, your primary domain should start with https://www.
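The redirection of all domain versions to the primary one can be sketched as follows. I assume here that the primary host is www.example.com and that example.com is the only other variant. redirect_target is a hypothetical helper that returns the URL to send in the Location header of the 301 response.

```python
from urllib.parse import urlsplit, urlunsplit

PRIMARY_HOST = "www.example.com"  # hypothetical primary domain
DOMAIN_VARIANTS = {"example.com", "www.example.com"}  # hypothetical variants


def redirect_target(url):
    """Return the primary-domain URL to 301 to, or None if no redirect is needed."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host not in DOMAIN_VARIANTS:
        return None  # not one of our domains
    if parts.scheme == "https" and host == PRIMARY_HOST:
        return None  # already on the primary domain
    return urlunsplit(
        ("https", PRIMARY_HOST, parts.path or "/", parts.query, parts.fragment)
    )


print(redirect_target("http://example.com/pricing?x=1"))
```

Note that the path and query string are preserved, so each old URL lands on its exact equivalent rather than on the home page.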
Change of URLs
Please note that you should avoid changing URLs. Search engines like stability, and especially don't like changes of URLs.
Clean URL
A clean URL is a URL made of words that humans can easily read and understand. The authorized characters are the following:
- The 26 lowercase letters from a to z.
- The 10 digits from 0 to 9.
- The hyphen (-), used to separate words.
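These rules are easy to automate. Here is a possible sketch of a function that turns any page title into a clean URL segment using only the characters listed above. slugify is a name of my own, and removing accents through Unicode normalization is one choice among others.

```python
import re
import unicodedata


def slugify(text):
    """Turn a title into a URL segment made of a-z, 0-9 and dashes only."""
    # Decompose accented letters, then drop everything outside ASCII.
    ascii_text = (
        unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    )
    # Replace every run of non-authorized characters with a single dash.
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_text.lower())
    # Remove dashes left at the start or the end.
    return slug.strip("-")


print(slugify("Crème brûlée & Co: 10 Tips!"))  # creme-brulee-co-10-tips
```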
Duplicate content
If you stick to the rule "1 page = 1 unique URL", canonical URL tags are useless within the same website. If you have pages accessible by selecting filters (facets), tags or categories, make sure that those filters, tags and categories are always ordered in the same way in the URL. If there is no systematic order set, canonical tags are needed.
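One way to enforce a systematic order, assuming the filters are passed as query string parameters, is to sort them before generating any link. normalize_facets is a hypothetical helper, and alphabetical order is an arbitrary choice: any order works as long as it is always the same.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def normalize_facets(url):
    """Return the URL with its query parameters sorted alphabetically."""
    parts = urlsplit(url)
    params = sorted(parse_qsl(parts.query, keep_blank_values=True))
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(params), parts.fragment)
    )


# Two orderings of the same filters end up on one single URL.
first = normalize_facets("https://www.example.com/shoes?size=42&color=red")
second = normalize_facets("https://www.example.com/shoes?color=red&size=42")
print(first)
print(first == second)
```

If a request arrives with the parameters in another order, a 301 redirect to the normalized URL keeps the rule "1 page = 1 unique URL" intact.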
URLs - Infinite scroll
Should you use infinite scroll or not? Even though there isn't a definitive answer to this question, we can state that:
- Infinite scroll seems to offer better user experience (except if you're trying to reach the footer).
- As stated in the list, pagination is the safest way to go in terms of SEO.
However, if you do want to use infinite scroll, please refer to these articles:
- Infinite scroll search-friendly recommendations.
- SEO Friendly Pagination: A Complete Best Practices Guide.
Optimized images
To know more about images:
- Image file formats: everything you've ever wanted to know.
- Differences Between File Formats (RAW, DNG, TIFF, GIF, PNG, JPEG).
- How to optimize images for web and performance.
- Appendix: Image format support and usage.
User experience and performance
User experience and performance are vast and complex issues. I recommend starting with these Google resources to get an overview of these subjects:
A website compatible with all devices
There are 3 technical options for cross-device compatibility:
- Responsive Web Design.
- Dynamic serving.
- Separate URLs.
Google's stance on this is clear: Responsive Web Design is the way to go.
Open graph metadata and Twitter Cards
You can refer to these sources for implementation: