Before tracking the performance of your website and its ranking on search engines, make sure that Google’s robots can read it and that it is properly indexed. This is necessary if you want to appear in search results and therefore generate visits.
In Search Engine Journal, Neil Patel, co-founder of KISSmetrics, recently published 13 reasons that could explain why Google indexes your website badly, or not at all. Indexation is your website’s front door to search engines, which is why we are offering some additional explanations on the various points.
1/ Problems that prevent indexing
1.1/ Google hasn’t found your site yet
If you published your website today or a few days ago and it still isn’t appearing on search engines, that’s completely normal. This “problem” is common for new websites: Google may take several days to list them. If it persists, submit your site to Google manually. This lets you take the lead and announce your presence to the search engine.
It is also necessary to verify that your sitemap.xml has been created and is accessible. This makes the search engines’ indexing work easier.
Quick tip: If you have a Twitter account, tweet the link to your home page. Google sees tweets, and your website may be indexed faster than by using the submission form.
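As a rough illustration of what "created and accessible" means in practice, the sketch below parses the text of a sitemap and lists the URLs it declares. The function name sitemap_urls and the example.com addresses are assumptions made for this example; downloading the file from your own server is left out.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def sitemap_urls(xml_text):
    """Return the <loc> values listed in a sitemap document.

    Raises xml.etree.ElementTree.ParseError if the file is not valid XML.
    """
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.iter(SITEMAP_NS + "loc") if loc.text]


# Hypothetical sitemap content, for illustration only.
sample = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://www.example.com/</loc></url>
  <url><loc>http://www.example.com/contact</loc></url>
</urlset>"""

print(sitemap_urls(sample))
```

If the function raises an error or returns an empty list for your real sitemap.xml, that is a hint the file is malformed or empty.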
1.2/ The site is blocked by .htaccess
Your .htaccess file is a configuration file for Apache HTTP servers. It sits on your server alongside your website and controls how its directories are served to the World Wide Web.
Sometimes it is necessary to protect a directory on a web server so that not just anyone can access it. To do this, a piece of code is placed in the .htaccess file that prevents a page from displaying unless the visitor supplies a login and a password stored in a separate .htpasswd file. The server then answers search engine robots with an HTTP 401 code (“authentication is necessary to access the resource”), so they cannot read the protected pages, and if the rule covers the whole site, none of it gets indexed.
So, to avoid indexing worries, check the contents of your .htaccess file.
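One way to see what a robot sees is to request a page without credentials and look at the status code that comes back. The sketch below is a simplified assumption of how such a check could look: the labels returned by crawl_outcome are our own wording, not Google’s, and fetch_status is a hypothetical helper built on the standard urllib module.

```python
import urllib.error
import urllib.request


def fetch_status(url, timeout=10):
    """Return the HTTP status code for url, requested without credentials."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # urllib raises for 4xx/5xx answers; the code is still available.
        return error.code


def crawl_outcome(status_code):
    """Rough reading of a status code from a crawler's point of view."""
    if 200 <= status_code < 300:
        return "readable"
    if 300 <= status_code < 400:
        return "redirect"
    if status_code in (401, 403):
        return "blocked"  # e.g. password protection set in .htaccess
    if 400 <= status_code < 500:
        return "missing or refused"
    if 500 <= status_code < 600:
        return "server error"
    return "unknown"


print(crawl_outcome(401))
```

Calling crawl_outcome(fetch_status("http://www.example.com/")) against your own address would show whether the home page is readable without a login.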
1.3/ The site is blocked by robots.txt
Another reason for non-indexation of your website is that it may be blocked under the robots exclusion protocol via the robots.txt file. This is a text file used to prevent search engine robots from accessing one or more pages of your website. It is placed at the website’s root and can block Googlebot from crawling your website.
To remedy that, you can check the state of your robots.txt file in your Google Search Console account and change it.
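You can also test a robots.txt file locally. The sketch below uses Python’s standard urllib.robotparser module; the two sample files and the example.com URLs are made up for illustration, and is_allowed is a name chosen for this example. Note that this parser follows the basic exclusion rules and may not match every extension Google supports.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt, url, user_agent="Googlebot"):
    """Tell whether user_agent may crawl url under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# A file that blocks the whole site, and one that blocks a single directory.
blocking = "User-agent: *\nDisallow: /"
partial = "User-agent: *\nDisallow: /private/"

print(is_allowed(blocking, "http://www.example.com/"))
print(is_allowed(partial, "http://www.example.com/blog/"))
print(is_allowed(partial, "http://www.example.com/private/page"))
```

A single "Disallow: /" line under "User-agent: *" is enough to keep every robot out, which is why this file deserves a look when nothing gets indexed.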
1.4/ WordPress: you’ve turned on your privacy settings
Numerous websites use the WordPress engine. When you create a website in WordPress, there is a search engine visibility option in the settings which, if checked, will effectively change the robots.txt file to keep robots out. Make sure that this box is unchecked in the back office of your website so it can be indexed.
1.5/ Your URLs are blocked by a META tag
It is possible to prevent a page from displaying in Google search results by including a noindex meta tag in its HTML code. This code is placed between the <head> </head> tags of the page. It tells search engines not to index the page and, with nofollow, not to follow the links on it.
<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
So make sure that the noindex and nofollow directives do not appear on any page of your website that you want indexed.
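To check a page without reading its source by hand, a small script can look for the robots meta tag. This is a minimal sketch using Python’s standard html.parser module; the class and function names are assumptions made for this example, and it only inspects the HTML you pass in, not HTTP headers.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collect the directives found in <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() in ("robots", "googlebot"):
            for directive in attributes.get("content", "").split(","):
                directive = directive.strip().lower()
                if directive:
                    self.directives.add(directive)


def robots_directives(html):
    """Return the set of robots directives declared in the page."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    return finder.directives


page = '<html><head><META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW"></head></html>'
print("noindex" in robots_directives(page))
```

Running it over the HTML of each important page and flagging any that contain "noindex" gives a quick audit.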
1.6/ Site indexed under a www.
The first important point is the sub-domain under which your website has been created. Many websites are hosted by default on the “www” sub-domain, and it is often forgotten that http://myposeo.com is not the same as http://www.myposeo.com.
For your website to be accessible at both addresses, you have to set up a redirect in your .htaccess file, for example from http://myposeo.com to http://www.myposeo.com.
To make sure that access to your website via these two URLs is properly configured, go into your Google Search Console account and add both domains so that they are all monitored.
In Google Search Console settings, you can then select the preferred domain for access to your website. It is this domain (sub-domain) that Google will use when displaying your website in search results.
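The redirect itself lives in your server configuration, but the logic behind it is simple: every address on the bare domain maps to the same path on the preferred one. The sketch below only illustrates that mapping; to_preferred_host is a hypothetical helper, and it ignores ports for brevity.

```python
from urllib.parse import urlsplit, urlunsplit


def to_preferred_host(url, preferred_host="www.myposeo.com"):
    """Rewrite url onto the preferred host if it uses the www or bare variant."""
    parts = urlsplit(url)
    bare = preferred_host[4:] if preferred_host.startswith("www.") else preferred_host
    if parts.hostname in (bare, "www." + bare):
        # Keep scheme, path and query; only the host changes.
        parts = parts._replace(netloc=preferred_host)
    return urlunsplit(parts)


print(to_preferred_host("http://myposeo.com/blog?page=2"))
```

Comparing the output with the address you actually land on in a browser shows whether the redirect is in place.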
2/ Problems that affect good indexation
2.1/ You don’t have a sitemap.xml
Not having a sitemap.xml may be the cause of bad indexation of your website by Google. This protocol lets you indicate which pages of the website should be explored by search engine robots. In this way, you give Google directions to the pages in your website. So think about creating a sitemap file and submitting it via Google Search Console.
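If your site does not generate a sitemap for you, a basic one can be produced with a few lines of code. This is a minimal sketch, assuming you already have the list of page URLs; build_sitemap is a name chosen for this example, and optional fields such as lastmod are left out.

```python
import xml.etree.ElementTree as ET


def build_sitemap(urls):
    """Return a minimal sitemap.xml document listing the given URLs."""
    urlset = ET.Element("urlset", xmlns="http://www.sitemaps.org/schemas/sitemap/0.9")
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Hypothetical page list, for illustration only.
xml_text = build_sitemap(["http://www.example.com/", "http://www.example.com/contact"])
print(xml_text)
```

Save the result as sitemap.xml at the root of your website before submitting it.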
2.2/ You have crawl errors
Search engine robots read your website and “jump” from link to link to continue exploring it. Sometimes they hit a dead end or follow links that point to non-existent or moved pages. Your website then has “crawl” errors, which are errors in URLs. You can identify them in your Google Search Console account: select your website, then go to the crawl errors report. This page details the URLs that Google cannot crawl or for which an HTTP error code is returned.
It is then important to remove the links on your website that point to non-existent pages and/or to redirect pages that no longer exist to alternative pages. To do this, do not hesitate to download a free crawl tool such as Xenu (PC) or Integrity (Mac), which will help you detect errors deeper in the site.
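To give an idea of what such crawl tools do, the sketch below collects the links of a page and reports those that answer with an error code. It is a simplified assumption, not how Xenu or Integrity work internally: get_status is any function you supply that returns an HTTP status for a URL, and a fake one is used here so the example runs offline.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect absolute URLs from the <a href="..."> tags of a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(urljoin(self.base_url, href))


def broken_links(html, base_url, get_status):
    """Return the links on the page whose status code is 400 or above."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return [link for link in collector.links if get_status(link) >= 400]


# Fake statuses standing in for real HTTP requests.
statuses = {"http://www.example.com/about": 200, "http://www.example.com/old-page": 404}
page = '<a href="/about">About</a> <a href="/old-page">Old</a>'
print(broken_links(page, "http://www.example.com/", statuses.get))
```

Each URL reported this way is a candidate for removal or for a redirect to an alternative page.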
2.3/ You have duplicate content
Sometimes multiple pages of the same website offer the same content: this is duplicate content. It generates repetitions in the search results that Google tries to eliminate. If Google encounters similar content, it may slow down the crawl frequency.
To correct this problem, pick the page you want to keep and set up a 301 (permanent) redirect from the others to it. Internet users and search engines will then be redirected to the right page.
Note: There are “canonical” tags which combat the phenomenon of “duplicate content”. If you want to set a preferred URL for accessing a piece of content, you can inform search engines of this by adding a “link” element with rel="canonical" that points to the preferred URL.
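Finding duplicates by hand is tedious on a large site. The sketch below groups pages whose text is identical once case and spacing are ignored. It is only an approximation for illustration: it catches exact copies, not the "similar" content Google may also detect, and duplicate_groups and the sample pages are assumptions made for this example.

```python
import hashlib
from collections import defaultdict


def duplicate_groups(pages):
    """Group URLs whose text is identical after normalizing case and spaces.

    pages maps each URL to the text content of that page.
    """
    by_digest = defaultdict(list)
    for url, text in pages.items():
        normalized = " ".join(text.lower().split())
        digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        by_digest[digest].append(url)
    return [sorted(urls) for urls in by_digest.values() if len(urls) > 1]


# Hypothetical pages, for illustration only.
pages = {
    "http://www.example.com/shoes": "Our best  shoes",
    "http://www.example.com/shoes?ref=home": "our best shoes",
    "http://www.example.com/hats": "Our best hats",
}
print(duplicate_groups(pages))
```

For each group found, keep one URL and redirect or canonicalize the others.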
2.5/ Your site takes forever to load
Your pages’ load time can have a major impact on robots, as we mentioned in the previous point. If Google cannot access the various pages of your website quickly, it will not put itself “on break” and wait. Instead, it may leave your page and continue on its way, crawling rival websites. A page that loads quickly can be indexed much more quickly and can contribute to a better ranking.
Many tools allow you to test the loading time of your pages, such as GTmetrix or YSlow. Check out this post from CloudLiving.com on how Tung Tran sped up his WordPress site and increased organic traffic by 39.1%.
Note: A reasonable loading time is considered to be 2-3 seconds.
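For a quick first measurement before turning to a dedicated tool, you can time how long a page takes to download. The sketch below is a rough assumption of such a check: it measures only the download of the HTML by whatever fetch function you pass in, not images, scripts or rendering, and the 3 second threshold simply reuses the figure from the note above.

```python
import time


def timed_fetch(fetch, url):
    """Call fetch(url) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    body = fetch(url)
    return body, time.perf_counter() - start


def slow_pages(timings, threshold_seconds=3.0):
    """Return the URLs whose measured time exceeds the threshold."""
    return sorted(url for url, seconds in timings.items() if seconds > threshold_seconds)


# Made-up timings, for illustration only.
timings = {"http://www.example.com/": 1.2, "http://www.example.com/gallery": 4.8}
print(slow_pages(timings))
```

In real use, fetch could be a small function that downloads the page with urllib.request.urlopen and returns its content.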
2.6/ Problems with your server
Google can have difficulty reading the pages of your website if your server cannot answer the requests the robot makes to access information. To correct this, ensure that your server is robust and can support the traffic. To make sure your server is always accessible you can use, for example, the excellent free Pingdom service, which alerts you when your website is inaccessible.
2.7/ Your site has been deindexed or penalized
Google sends a message to announce that a website has been deindexed or penalized. This can happen if your website does not meet the criteria of the Google Panda and Google Penguin filters. To understand and analyze the situation of your website, we recommend you visit your webmaster tools and consult Google’s quality guidelines. A case like this is rare, but any website that sets up a link-building strategy or works on search engine optimization can end up in Google’s sights, so be careful not to overdo it!
So now you should understand that for a website to appear in search engine results, it must be in the search engine’s index. First of all, Google must be able to find it, and then be able to read it. If your website or some of its pages are not indexed, you should now understand why. You now have all the elements you need to analyze your website and make sure it is well ranked.
Marketing manager @myposeo, community manager and writer.