Strategic Crawl Management | Lesson 11/34 | SEMrush Academy

You will get a deeper understanding of how to manage crawling and indexation.
Watch the full course for free: https://bit.ly/3gNNZdu

0:09 General problem
0:19 Using Robots.txt file
0:38 Using Robot Meta tag
1:29 Goal
2:10 Examples of URLs that should not be allowed for indexation
4:20 SeeRobots plug-in

✹ ✹ ✹ ✹ ✹ ✹ ✹ ✹ ✹ ✹ ✹ ✹ ✹ ✹ ✹ ✹ ✹ ✹ ✹ ✹ ✹ ✹ ✹ ✹ ✹
You might find it useful:
Tune up your website’s internal linking with the Site Audit tool:
https://bit.ly/2XVxCmL
Understand how Google bots interact with your website by using the Log File Analyzer:
https://bit.ly/3cs0rfC

Learn how to use SEMrush Site Audit in our free course:
https://bit.ly/2Xsb3XT
✹ ✹ ✹ ✹ ✹ ✹ ✹ ✹ ✹ ✹ ✹ ✹ ✹ ✹ ✹ ✹ ✹ ✹ ✹ ✹ ✹ ✹ ✹ ✹ ✹

The general problem with indexation directives is that they work very differently from each other and usually produce different results. If you block something in the robots.txt file, those URLs will not be crawled, but they can still end up partly indexed and therefore be shown in Google’s search results – usually Google will just show the title and the URL, without a description. If you use the robots meta tag instead, the URLs will be crawled but not indexed – provided you apply a noindex – so they won’t be shown in the search results.
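To make that concrete, here is what the two directives look like side by side (the /internal-search/ path is just a made-up example):

# robots.txt – the URL won’t be crawled, but it can still show up in results with title and URL only
User-agent: *
Disallow: /internal-search/

<!-- meta robots – the page is crawled, but the noindex keeps it out of the results -->
<meta name="robots" content="noindex, follow">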

Some people try to do both: they use robots.txt to block a specific file or directory and also add a meta robots tag so it can’t be found in the search results. That won’t work. Googlebot and other crawlers can’t even access the URL because it is blocked in robots.txt, so they can’t read any indexation directive – they simply never see your “noindex”, even if it is present.
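As a sketch of that conflict (again with a placeholder path), this combination will not do what people hope, because the noindex on the blocked pages is never fetched and therefore never read:

# robots.txt
User-agent: *
Disallow: /filter/

<!-- on any page under /filter/ – Googlebot never gets this far -->
<meta name="robots" content="noindex">

If you want a URL dropped from the index, it has to remain crawlable so the noindex can actually be seen.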

First and foremost, it’s important to define and understand what you want to achieve. Is it about crawling (resources) or about indexation? If you want to reduce the number of URLs being indexed, then the meta robots tag and a noindex on those pages would be the right approach.

Generally speaking, I am a big fan of a very minimalistic robots.txt file. I try to avoid using it for crawler control unless I really have to.
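In practice, a minimalistic robots.txt along those lines can be as simple as this (the sitemap URL is of course just an example):

User-agent: *
Disallow:

Sitemap: https://www.example.com/sitemap.xml

An empty Disallow line means nothing is blocked; crawler control then happens on the page level via meta robots.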

I’d recommend you try to answer this question for every URL you have: “Do I really need this URL to be indexed? Does it add any significant value to someone who ends up on it?”

If that’s not the case, then I would not index it. Here are some examples of URLs I think you should not allow to be indexed:

I would never index empty or almost empty category or tag pages.
Nor would I index different versions of the same URL created by filtering or other ways of rearranging content, e.g. sorting a list in ascending or descending order – they all target the same ranking (see the sketch after this list).
The same applies to dynamically generated pages such as search results. “SERP in SERP” is something Google does not like. Because the page content is dynamic, the page may initially have ranked for something that isn’t even on it anymore; when someone lands on it, it might have changed, and they can’t find what they’re looking for – a really bad experience for the user.
That’s also true for nearly all types of “no result” pages; you don’t want those to be indexed.
Also make sure you do not index several versions of one page (e.g. index.php vs. “/”, non-www vs. www, HTTPS vs. non-HTTPS) or the same content on different domains or subdomains. None of these add value; they just bloat Google’s index.
Make sure to integrate analytics into the process, especially for big domains where you might have suggestions to discuss with product teams or even site owners. Very often people suspect they need a URL for a reason, but if you take a closer look, you’ll very often find that the above-mentioned types of URLs drive literally zero organic traffic and the discussion is actually irrelevant.
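As a rough sketch of what that looks like in practice (the URLs are hypothetical), the filtered/sorted variants and “no result” pages from the list above simply carry a noindex, while the canonical listing page stays indexable:

<!-- on /category/shoes?sort=price_asc – a filtered/sorted variant -->
<meta name="robots" content="noindex, follow">

<!-- on /search?q=term when the result set is empty -->
<meta name="robots" content="noindex, follow">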

No-indexing URLs doesn’t have anything to do with crawl budget: these URLs will still be crawled, and they will also still pass link equity to internal pages. That said, URLs that stay on noindex for a long time will eventually get crawled less frequently. If you want to see indexation directives directly in the browser, there is a great and very simple plug-in called SeeRobots, which visualizes indexation directives – whether they are set in a meta tag or in a server header – right in the browser, so you can stop searching for them in the source code.
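For non-HTML files like PDFs, where there is no place for a meta tag, the same directive is typically sent as an X-Robots-Tag response header – which is exactly the kind of server header SeeRobots picks up:

HTTP/1.1 200 OK
Content-Type: application/pdf
X-Robots-Tag: noindex, nofollow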

#TechnicalSEO #TechnicalSEOcourse #CrawlBudgetOptimization #SEMrushAcademy
