Overlaying Crawl & Log File Data for Better Results | Lesson 22/34 | SEMrush Academy

Log files contain the history of every person or crawler that has accessed your website. You will learn how to deal with log files.
Watch the full course for free: https://bit.ly/3gNNZdu

0:28 Try and combine various data sources with each other
1:17 Overlay your sitemap with your log files
1:47 Overlay web crawl data with your log file data
2:26 Look at your indexable pages and see if they are being crawled
3:07 See if your non-indexable pages are being crawled


One of the really exciting things is gaining insights you would not have had before, right? Something I do a lot, and which I think can generate very valuable insights, is combining various data sources with each other.

Certainly, you can use log files on their own, as shown previously – but that is a limited view. It gets far more exciting when you combine your log file data with input from other data sources. The most obvious way to do this is to take data from a web crawl and combine it with your log files, so you can compare simulated behaviour with, for example, Googlebot's actual behaviour.

You can also take data from Google Analytics or Google Search Console – or all of those sources at once. Another easy option is to take your XML sitemap and overlay it with the data from your log files.

So let me walk you through a couple of things that you could actually do:

One of the easiest things you could do is overlay your sitemap with your log files. You may find that the data indicates a lack of internal links within the site architecture: if your site architecture is working properly, all URLs included in the sitemap should also have been crawled. If they have not, something is wrong.
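As a sketch of that overlay, here is one way to do it in Python. It assumes a standard XML sitemap and a combined-format access log; the file layouts, the `Googlebot` substring filter, and all function names are illustrative assumptions, not something shown in the lesson:

```python
import re
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

def sitemap_urls(sitemap_path):
    """Extract all <loc> URLs from an XML sitemap."""
    ns = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
    tree = ET.parse(sitemap_path)
    return {loc.text.strip() for loc in tree.getroot().findall(".//sm:loc", ns)}

def crawled_paths(log_path, bot="Googlebot"):
    """Collect request paths from an access log, keeping only one bot's lines."""
    pattern = re.compile(r'"(?:GET|HEAD) (\S+)')  # assumes combined log format
    paths = set()
    with open(log_path) as f:
        for line in f:
            if bot in line:  # naive user-agent filter for illustration
                m = pattern.search(line)
                if m:
                    paths.add(m.group(1))
    return paths

def sitemap_gap(sitemap_path, log_path):
    """Sitemap URLs whose path never appears in the bot's log entries."""
    crawled = crawled_paths(log_path)
    return {u for u in sitemap_urls(sitemap_path) if urlparse(u).path not in crawled}
```

Any URL returned by `sitemap_gap` is declared in the sitemap but was never requested by the bot during the logged period – a candidate for the internal-linking problems described above.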
If you have data from a web crawl, it may discover a URL that has been set to noindex – for whatever reason. If you then overlay that URL with data from your log file, you might see that this noindex URL is crawled very frequently. In that case, setting it to noindex may not have been the best idea, right? This happened to one of our clients: the team made a change in their CMS and set some very strong product pages to noindex – which, of course, should not have happened. So overlaying log file data with other sources can also reveal mistakes and act as a maintenance routine.
Another report: take a look at your indexable pages and see whether they are actually being crawled, and if so, how often. This is a great starting point for understanding whether they just need improvement or whether you should reconsider indexation altogether – maybe they are simply not good enough? You might also want to consolidate them with other content on your site and remove the URL from the index entirely. Generally speaking, if Google does not crawl them at all, there will be a reason for it; you need to figure it out and act accordingly.
You could also take all your non-indexable pages – not only those with a meta robots noindex, but also those with a canonical tag referencing an alternate URL, or pages blocked in robots.txt – and check whether they are still being crawled. This is a great way to see whether Google is honouring your hints and directives. If it is not, you need to improve the relevant URLs.
Many things can be done by overlaying different data sources, but the general approach is to build a gap analysis and try to understand the major differences. Is everything set up the way you intended? Do all crawlers behave the same way, or does Googlebot behave differently from what you were expecting altogether? Comparing crawl simulations with log file data is super powerful – and once you have identified the differences, you can immediately take action.
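The gap analysis itself boils down to set comparisons. A sketch, assuming you have already reduced each source to a set of URL paths; the bucket names below are my own labels, not terminology from the lesson:

```python
def gap_report(crawled, logged, sitemap):
    """Partition URLs by which data sources know about them.

    crawled: paths found by your own web crawl (simulated behaviour)
    logged:  paths Googlebot actually requested (log files)
    sitemap: paths declared in the XML sitemap
    """
    return {
        # the bot requests them but your crawl never finds them: weak internal linking
        "orphans": logged - crawled,
        # known to you (crawl or sitemap) but never requested by the bot
        "ignored": (crawled | sitemap) - logged,
        # found and requested, yet missing from the sitemap
        "undeclared": (crawled & logged) - sitemap,
        # present in all three sources
        "healthy": crawled & logged & sitemap,
    }
```

Each non-empty bucket other than "healthy" is a concrete difference between simulated and actual behaviour that you can investigate and act on.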

#TechnicalSEO #TechnicalSEOcourse #LogFileAnalysis #SEMrushAcademy
