Log files contain the history of every person or crawler that has accessed your website. Here you will learn how to work with them.
One of the most obvious things you can do with log file data is to spot anomalies within a time frame. For this, you need log data that goes back at least a couple of days, and ideally weeks or months. You can then see spikes in crawl behaviour, e.g., Googlebot crawling very aggressively on one or two specific days. Or, for instance, if you want to be found in China but it does not seem to be happening, and the log files show that Baidu does not crawl your site at all, that indicates you have a problem.
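As a rough illustration of spotting crawl spikes, the sketch below counts requests per day for one bot and flags days far above the median. It assumes the Apache/Nginx "combined" log format; the names `LINE_RE`, `daily_hits` and `spike_days`, and the threshold `factor=3.0`, are made up for this example and would need adjusting to your own setup.

```python
import re
from collections import Counter
from statistics import median

# Assumption: lines follow the Apache/Nginx "combined" access log format.
LINE_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<day>[^:]+):[^\]]+\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)

def daily_hits(lines, bot_token):
    """Count requests per day whose user agent contains bot_token."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if m and bot_token.lower() in m.group("agent").lower():
            counts[m.group("day")] += 1
    return counts

def spike_days(counts, factor=3.0):
    """Return the days whose hit count exceeds factor times the median day."""
    if not counts:
        return []
    typical = median(counts.values())
    return sorted(day for day, n in counts.items() if n > factor * typical)
```

Calling `spike_days(daily_hits(open_file, "Googlebot"))` would list the days worth a closer look; a bot with an empty result from `daily_hits` (for example "Baiduspider") is not crawling the site at all in that period.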

You can also break the data down to just the bots that are actually accessing the website, which gives you an idea of what other types of crawlers are coming and processing data from your site. One current use case is Google's mobile-first indexing (MFI) switch, where you can see whether the Googlebot smartphone crawler has overtaken the Googlebot desktop crawler in terms of crawl volume.
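A minimal sketch of such a breakdown, working on user-agent strings that have already been extracted from the log. The classification is a simple substring heuristic (an assumption, not an official rule): a Googlebot user agent containing "Mobile" is treated as the smartphone crawler. The function names are hypothetical, and since user agents can be spoofed, the counts are only indicative.

```python
from collections import Counter

def classify_agent(agent):
    """Map a raw user-agent string to a coarse bot label (heuristic)."""
    a = agent.lower()
    if "googlebot" in a:
        # Assumption: the smartphone crawler announces a mobile browser.
        return "googlebot-smartphone" if "mobile" in a else "googlebot-desktop"
    for token in ("bingbot", "baiduspider", "yandex"):
        if token in a:
            return token
    return "other"

def crawl_share(agents):
    """Count requests per bot label."""
    return Counter(classify_agent(a) for a in agents)
```

If `crawl_share(agents)["googlebot-smartphone"]` is clearly larger than the desktop count over a recent period, that suggests the smartphone crawler now does most of the crawling.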

You can find out which pages Googlebot crawls most, and then verify whether they coincide with your domain’s most important URLs. You can also break down the crawl requests and status codes by directory to understand how well the pages are crawled, and whether that happens regularly, with huge delays, or not at all.
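The directory breakdown could be sketched as follows, assuming the log has already been reduced to `(path, status)` pairs for one crawler. Grouping by the first path segment is just one possible choice, and `top_directory` and `status_by_directory` are illustrative names.

```python
from collections import Counter, defaultdict

def top_directory(path):
    """Return the first-level directory, e.g. '/blog/post-1' -> '/blog/'."""
    clean = path.split("?", 1)[0]  # ignore query strings
    parts = clean.split("/")       # parts[0] is '' for absolute paths
    return "/" + parts[1] + "/" if len(parts) > 2 else "/"

def status_by_directory(requests):
    """Build {directory: Counter({status: hits})} from (path, status) pairs."""
    table = defaultdict(Counter)
    for path, status in requests:
        table[top_directory(path)][status] += 1
    return table
```

A directory where most hits are 404s, or one that barely appears in the table at all, is a candidate for closer inspection.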

Going deeper into the report, let’s have a look at redirects. Log files can help you spot incorrect status codes. In terms of redirects, look in particular for temporary ones (302 and 307) and change them to 301s where the move is permanent, except for geo-redirects. A 308 is already a permanent redirect and can stay as it is, and a 304 (Not Modified) is a caching response rather than a redirect, so neither needs changing. Watch out for redirect chains as well and check whether anything needs to be tackled on that end.
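As a sketch of the redirect check, the snippet below lists URLs that answer with a temporary redirect (302 or 307) and follows redirect chains. Note the assumptions: standard access logs do not record the redirect target, so `redirect_map` (source path to target path) is a hypothetical input you would have to build from a crawl or from your server configuration, and the function names are made up for this example.

```python
from collections import Counter

TEMPORARY = {302, 307}  # temporary redirect status codes

def temporary_redirects(requests):
    """Count crawler hits per URL that returned a temporary redirect."""
    return Counter(path for path, status in requests if status in TEMPORARY)

def redirect_chain(start, redirect_map, max_hops=10):
    """Follow redirects from start; stops at a final URL, a loop, or max_hops."""
    chain = [start]
    while chain[-1] in redirect_map and len(chain) <= max_hops:
        nxt = redirect_map[chain[-1]]
        if nxt in chain:  # loop detected: record it and stop
            chain.append(nxt)
            break
        chain.append(nxt)
    return chain
```

Any chain longer than two entries means the crawler needs more than one hop to reach the final URL, which is usually worth shortening.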

Also, it’s super important to understand crawl errors. There are two categories that you should especially look out for. The first is the 4xx status code range, mainly 404s and 410s. The general approach is to check whether those 404s happen for one crawler specifically or for all of them. They might only be happening for Bingbot, say, or for every crawler, and what you do next depends on which it is.
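One way to check whether errors are crawler-specific is sketched below. It assumes the log has been reduced to `(bot, path, status)` tuples, where `bot` is a label you assigned yourself; `not_found_by_bot` and `only_seen_by` are illustrative names.

```python
from collections import defaultdict

def not_found_by_bot(requests):
    """Map each 404/410 URL to the set of bots that received that error."""
    hits = defaultdict(set)
    for bot, path, status in requests:
        if status in (404, 410):
            hits[path].add(bot)
    return hits

def only_seen_by(hits, bot):
    """Return error URLs that were requested by this bot and no other."""
    return sorted(path for path, bots in hits.items() if bots == {bot})
```

URLs returned by `only_seen_by(hits, "bingbot")` point to something specific to that crawler, such as an outdated link source it still follows, whereas URLs hit by every bot suggest a site-wide problem.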

If you want to recover those URLs, bring them back and serve a 200. If a page no longer exists but you want to keep its inbound link equity, implement a 301 to a relevant target. If these 404 URLs are never coming back, consider changing them to 410s, because a 410 says the URL is gone on purpose rather than by accident. Google will typically reduce re-crawling, and those pages tend to drop out of its index faster.

Other important issues can be found in the 5xx status code range, especially 500 and 503. These happen from time to time so it is natural enough to see them in the log file. It is more about their volume and consistency. If there is a specific crawler from one specific IP that causes the same error over and over again, this is an issue to investigate.
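To look at volume and consistency together, a small sketch like the following can help. It assumes `(day, ip, bot, status)` tuples and reports, per IP and bot, how many 5xx responses occurred and on how many distinct days; the threshold `min_errors` and the function name are arbitrary choices for illustration.

```python
from collections import defaultdict

def server_error_sources(requests, min_errors=3):
    """Summarise 5xx responses per (ip, bot) pair.

    Returns {(ip, bot): (error_count, distinct_days)} for every pair
    with at least min_errors server errors.
    """
    totals = defaultdict(int)
    days = defaultdict(set)
    for day, ip, bot, status in requests:
        if 500 <= status <= 599:
            totals[(ip, bot)] += 1
            days[(ip, bot)].add(day)
    return {
        key: (count, len(days[key]))
        for key, count in totals.items()
        if count >= min_errors
    }
```

A pair with many errors spread over many days is the recurring pattern worth investigating; a handful of errors on a single day is more likely a temporary hiccup.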

Generally, from an SEO perspective, that’s something you can’t do much about yourself. For 5xx errors in particular, the cause is usually the server or the infrastructure in general, so it is probably necessary to pass the problem to the IT team.

Another important thing is to identify the best and worst crawled URLs and folders, because crawl frequency reflects what Googlebot spends its time on. Frequently crawled pages and folders are good places for additional internal links (link hubs), for example links to new content items so they get indexed sooner rather than later. Rarely crawled areas need to be linked more prominently. Once you know which pages are crawled worst, you can prioritize them and give them more attention.
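Ranking URLs by crawl frequency can be sketched in a few lines, assuming a list of paths requested by Googlebot (one entry per hit). `rank_crawled` is a hypothetical helper, and it only sees URLs that were crawled at least once.

```python
from collections import Counter

def rank_crawled(paths, n=3):
    """Return (top n, bottom n) URLs by crawl count as (path, hits) pairs."""
    ranked = Counter(paths).most_common()  # most crawled first
    top = ranked[:n]
    bottom = ranked[-n:][::-1]             # least crawled first
    return top, bottom
```

The top list shows candidate pages for link hubs, while the bottom list shows pages that may need more prominent internal linking.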

Log files can help you see whether (new) URLs have been crawled at all. If relevant URLs haven’t been discovered or crawled, your internal linking is probably too weak, and those pages need additional internal links. You should also consider XML sitemaps, more prominent linking, etc.
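To find URLs that were never crawled, you could compare the URLs you expect to be crawled with the paths that appear in the log. The sketch below takes the expected URLs from an XML sitemap in the standard sitemaps.org format; the function names are illustrative, and the comparison is on paths only, so query strings and host names are ignored.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def sitemap_paths(xml_text):
    """Extract the URL paths listed in <loc> elements of a sitemap."""
    root = ET.fromstring(xml_text)
    return {
        urlparse(loc.text.strip()).path or "/"
        for loc in root.findall(".//sm:loc", SITEMAP_NS)
        if loc.text
    }

def never_crawled(xml_text, crawled_paths):
    """Return sitemap paths that do not appear among the crawled paths."""
    return sorted(sitemap_paths(xml_text) - set(crawled_paths))
```

Whatever `never_crawled` returns for a given bot and period is a starting list of pages that need stronger internal links.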

