Robots.txt and Sitemap.xml | Lesson 7/31 | SEMrush Academy

Get to know how search engines interact with pages and how to diagnose problems and improve websites.
Watch the full free course at SEMrush Academy:

0:05 Sitemaps
1:09 Robots.txt

In section two, we talked about how the search engines send spiders to crawl and index the content on your site. The spiders follow the internal links on your pages to move from page to page and discover your content – but there’s another way that the spiders discover content.

Your XML sitemap is like a behind-the-scenes list of all the URLs on your site. The search engine spiders access the file and follow the listed URLs to crawl and index your content. Think of your XML sitemap as a backup for your site architecture. The primary route for content discovery is still your internal linking structure, so make sure your site is well organized and easy to navigate.
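To make the "list of URLs" idea concrete, here is a minimal Python sketch of how a crawler might read the URLs out of a sitemap. The sample sitemap and the example.com URLs are made up for illustration, and the helper name urls_in_sitemap is hypothetical; real spiders do a lot more than this.

```python
import xml.etree.ElementTree as ET

# Namespace used by the sitemaps.org protocol.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical sitemap content, for illustration only.
SAMPLE_SITEMAP = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/services</loc></url>
  <url><loc>https://example.com/contact</loc></url>
</urlset>"""


def urls_in_sitemap(xml_text):
    """Return every <loc> URL listed in a sitemap, in document order."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", NS)]


for url in urls_in_sitemap(SAMPLE_SITEMAP):
    print(url)
```

Counting the URLs this returns gives you a rough baseline for how many pages you would expect to see indexed.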

Whenever you start SEO work on a new site, you should log into Search Console and check that the sitemap has been submitted and processed. If you see that the sitemap has been processed and crawled, but the indexed page count is lower than it should be, that shows that something’s not right.

You’ll then want to head over and check another behind-the-scenes file – robots.txt. This is a text file stored in the root directory of your site that tells the search engines which pages they can and can’t crawl.

If you don’t want the search engines to crawl a page, you can put a Disallow directive in the robots.txt file. While this can help keep a page out of Google’s index, it’s not a guarantee, since a blocked URL can still get indexed if other pages link to it, and it’s not going to remove a page that’s already in the index. You can also use robots.txt to block entire sections of your site – a folder containing admin assets, for example.
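As a rough sketch of how those rules get applied, the example below uses Python's standard urllib.robotparser module to test a few URLs against a robots.txt file. The rules, the /admin/ folder, and the example.com URLs are assumptions made up for illustration, and this parser is not guaranteed to match Googlebot's behavior in every edge case.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content: block one folder and one single page.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /admin/
Disallow: /thank-you.html
"""


def make_parser(robots_text):
    """Build a parser from robots.txt text without fetching anything."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


parser = make_parser(SAMPLE_ROBOTS)

# Check whether a given crawler is allowed to fetch each URL.
for url in (
    "https://example.com/admin/login",
    "https://example.com/thank-you.html",
    "https://example.com/blog/",
):
    print(url, parser.can_fetch("Googlebot", url))
```

Running the same check against the URLs that are missing from the index is a quick way to see whether a Disallow rule is the cause.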

If you want to remove a page from Google’s index, you’ll want to use the meta robots tag. You could use a noindex, follow directive to keep a page out of the index, but still have the spiders crawl the page and follow its links. If you use a noindex, nofollow directive, the page won’t be indexed and the spiders won’t follow any of the links on that page. Keep in mind that the spiders have to be able to crawl the page to see the tag, so don’t block that same page in robots.txt.
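To check which directives a page actually carries, a small script can read the meta robots tag out of the HTML. This is a minimal sketch using Python's standard html.parser module; the class name MetaRobotsParser, the helper robots_directives, and the sample page are hypothetical, and it only looks at the generic "robots" meta tag, not crawler-specific tags or HTTP headers.

```python
from html.parser import HTMLParser


class MetaRobotsParser(HTMLParser):
    """Collect the directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() != "robots":
            return
        content = attr.get("content") or ""
        # Directives are comma separated, e.g. "noindex, follow".
        for part in content.split(","):
            if part.strip():
                self.directives.add(part.strip().lower())


def robots_directives(html):
    """Return the set of meta robots directives found in an HTML string."""
    parser = MetaRobotsParser()
    parser.feed(html)
    return parser.directives


# Hypothetical page, for illustration only.
SAMPLE_PAGE = (
    '<html><head><meta name="robots" content="noindex, follow"></head>'
    "<body><a href='/next'>Next</a></body></html>"
)

found = robots_directives(SAMPLE_PAGE)
print("indexable:", "noindex" not in found)
print("links followed:", "nofollow" not in found)
```

An empty result means the page has no meta robots tag, so this tag is not what is keeping it out of the index.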

Most of the time, if you’ve got pages missing from the index, they’re blocked by either robots.txt or the meta robots tag – so check both to make sure that neither one is causing the problem.

#SitemapXML #RobotsTXT #SEOcourse #SEOtutorial #SEMrushAcademy
