Intro to Log File Auditing | Lesson 19/34 | SEMrush Academy

Log files contain the history of every person or crawler that has accessed your website. You will learn how to deal with log files.

0:10 Log File Auditing
0:24 Importance of Log File Auditing
1:43 Characteristics of a log file
2:04 What a log file contains
2:58 Log file data


Log files are generally stored on a web server and – simply speaking – contain a history of each and every access by a person or crawler to your website. They can give you some idea of how search engine crawlers are handling your site.

Why should you care?

They can help you understand crawl priorities. You can see which pages search engines prioritise and therefore consider the most important.
Secondly, they can help to prevent reduced crawling. Google may reduce its crawl frequency, and eventually rank you lower, if you constantly serve huge numbers of errors.
You also want to understand global issues. You want to identify any crawl shortcomings (such as problems with hierarchy or internal link structure) that have potential site-wide implications.
Next, you want to ensure proper crawling. You want to make sure Google is crawling everything important: primarily ranking-relevant content, but also both older and fresh items.
The last goal is to ensure proper linking. You want to make sure that any gained link equity is always passed on via proper links and/or redirects.
Only access log files can show how a search engine’s crawler is behaving on your site; all crawling tools are simply trying to simulate their behaviour.

The characteristics of a log file are relatively simple; it is just a text file. The content and structure of log files can vary depending on your web server (Apache, NGINX, IIS, etc.), your caching layer and their configuration. Make sure to identify which setup you are running and how things look from an infrastructure perspective.
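For example, a single entry in an Apache or NGINX access log using the widespread "combined" log format might look like this (the IP, URL and timestamp are made up for illustration):

```
66.249.66.1 - - [10/Oct/2023:13:55:36 +0000] "GET /blog/article HTTP/1.1" 200 5124 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
```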

Usually, a log file contains:

the server IP/hostname
the timestamp of the request
the method of the request (usually GET/POST)
the request URL
the HTTP status code (e.g. 200 for a successful request, or 301 for a redirect)
the size of the response in bytes – depending, of course, on whether the server has been set up to store this information
Also, log files store the user-agent. The user-agent helps you understand whether a request actually came from a crawler or not.
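As a sketch, the fields listed above can be extracted from a combined-format log line with a short script. The regex and sample line below are illustrative; adapt them to the actual format your server writes:

```python
import re

# Regex for the Apache/NGINX "combined" log format (illustrative; adjust to your setup)
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) \S+" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<user_agent>[^"]*)"'
)

# Made-up sample line for demonstration
line = ('66.249.66.1 - - [10/Oct/2023:13:55:36 +0000] '
        '"GET /blog/article HTTP/1.1" 200 5124 "-" '
        '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"')

match = LOG_PATTERN.match(line)
if match:
    entry = match.groupdict()
    print(entry['method'], entry['url'], entry['status'])  # GET /blog/article 200
    print('Googlebot' in entry['user_agent'])              # True
```

In practice you would run this over every line of the file and load the results into a spreadsheet, database or analysis tool.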
When you work with log file data, you need to ask the right questions. Log file data can be quite overwhelming because you can do so many different things; make sure you’ve got your questions prepared.
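One common first question is, for example, "which status codes does Googlebot see most often?". A minimal sketch of how you might answer it, assuming the log lines have already been parsed into (user-agent, status code) pairs (the sample entries are made up):

```python
from collections import Counter

# Hypothetical, pre-parsed log entries: (user_agent, status_code)
entries = [
    ('Googlebot/2.1', 200),
    ('Googlebot/2.1', 404),
    ('Mozilla/5.0 (Windows NT 10.0)', 200),
    ('Googlebot/2.1', 200),
    ('bingbot/2.0', 301),
]

# Count status codes for Googlebot requests only
googlebot_status = Counter(
    status for ua, status in entries if 'Googlebot' in ua
)
print(googlebot_status.most_common())  # [(200, 2), (404, 1)]
```

A large share of 404 or 5xx responses here would be exactly the kind of signal that can lead to the reduced crawling mentioned earlier.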

Log file data can be very different from Google Analytics data, for example. While log files are direct, server-side pieces of information, Google Analytics uses client-side code. As the data sets come from two different sources, they can differ.

When requesting access to a log file, keep in mind that you do not need any personal information, so when you talk to your IT team or to a client, there is no need to worry about privacy. It is essentially only about the crawler requests from Google or Bing. User data (operating system, browser, phone number, usernames, etc.) is not relevant.
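One caveat on those crawler requests: user-agent strings can be spoofed, so Google recommends confirming a suspected Googlebot hit with a reverse DNS lookup followed by a forward lookup. A sketch of the idea; the actual lookups need network access, and `socket.gethostbyaddr`/`socket.gethostbyname` are the standard-library calls you would use:

```python
import socket

# Official Googlebot hostnames end in these domains (per Google's documentation)
GOOGLEBOT_DOMAINS = ('.googlebot.com', '.google.com')

def looks_like_googlebot_host(hostname: str) -> bool:
    """Check whether a reverse-DNS hostname matches Google's crawler domains."""
    return hostname.endswith(GOOGLEBOT_DOMAINS)

def verify_googlebot(ip: str) -> bool:
    """Reverse lookup, domain check, then forward lookup back to the same IP."""
    try:
        hostname = socket.gethostbyaddr(ip)[0]          # reverse DNS
        return looks_like_googlebot_host(hostname) and \
               socket.gethostbyname(hostname) == ip     # forward confirmation
    except socket.error:
        return False

print(looks_like_googlebot_host('crawl-66-249-66-1.googlebot.com'))  # True
print(looks_like_googlebot_host('fake-googlebot.example.com'))       # False
```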

Also, you need to be aware of how the server infrastructure is set up. If you are running a cache server, a proxy and/or a CDN that creates logs elsewhere, you will need those logs as well to get the whole picture.

#TechnicalSEO #TechnicalSEOcourse #LogFileAnalysisSEO #SEMrushAcademy
