User-Agents | Reverse-DNS | Lesson 7/34 | SEMrush Academy

You’ll gain an understanding of search crawlers and how to optimally budget for them.
Watch the full course for free: https://bit.ly/3gNNZdu

0:19 User Agent Variable
0:48 User Agent String
1:55 User Agent Switch or User Agent Override
2:36 Reverse DNS Lookup
2:55 User Agent Based Delivery
3:09 Cloaking


The user-agent is sent as a header in the HTTP request, which typically goes from a browser (or a crawler) to a web application.

The user-agent field is filled in by the browser or the crawler, and different browsers and crawlers fill it with different values. Crawlers often include a URL or email address in their user-agent string so that the website owner can contact the operator of the crawler. The user-agent string is also one of the criteria by which web crawlers can be excluded from certain parts of a website, e.g. using robots.txt or, more generally, the robots exclusion standard.
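As a minimal sketch of user-agent based exclusion, the snippet below checks a robots.txt file with Python's standard urllib.robotparser module. The crawler name "ExampleBot", the robots.txt content and the example.com URLs are hypothetical and only there for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one crawler is kept out of /private/,
# every other user-agent may fetch everything.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /private/

User-agent: *
Disallow:
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt lets this user-agent fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

Calling is_allowed("ExampleBot/1.0", "https://example.com/private/page") is refused by the rules above, while the same URL is allowed for a regular browser user-agent. Keep in mind that robots.txt only asks crawlers to stay out; it does not enforce anything.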

As with many other HTTP headers, the information in the user-agent string has not been standardized. It is part of the information that the client sends, and its content can vary considerably depending on who is actually filling it in.

Let’s assume that Google is visiting a website, so the user-agent string for the crawler will contain something like “Googlebot”, or, if it’s the Google News crawler, more precisely “Googlebot-News”. Google and other search engines have various types of crawlers, which can mainly be differentiated by capability. For example, Google has a desktop crawler and a smartphone crawler. But they also have different crawlers for verticals, such as images, video or news. If you want to simulate being Googlebot or another bot, you can simply use a plug-in such as User Agent Switcher in Chrome or User Agent Overrider in Firefox. It allows you to set any user-agent string and to see whether the website, or the web offering in general, reacts differently based on the user-agent sent with the request.
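The same simulation can be scripted instead of using a browser plug-in. The sketch below builds a request that carries a Googlebot-style user-agent string using Python's standard library; the helper name build_request and the example.com URL are assumptions made for illustration.

```python
from urllib.request import Request

# A user-agent string of the form Google publishes for its desktop crawler.
GOOGLEBOT_UA = (
    "Mozilla/5.0 (compatible; Googlebot/2.1; "
    "+http://www.google.com/bot.html)"
)


def build_request(url: str, user_agent: str = GOOGLEBOT_UA) -> Request:
    """Build an HTTP request that announces itself with the given user-agent."""
    return Request(url, headers={"User-Agent": user_agent})


# Hypothetical target URL; nothing is sent until the request is opened.
request = build_request("https://example.com/")
```

Passing the request to urllib.request.urlopen would actually send it, and the response can then be compared with what a regular browser user-agent receives.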

This leads us to another problem. If you can set any user-agent, you can claim to be Googlebot when really you are not. So we also need a way to verify whether the user-agent and the respective request are real or fake. For this purpose, you can run what’s called a reverse DNS lookup on the accessing IP address from your log files using the host command. If you’re trying to validate Googlebot, verify that the returned domain name ends in googlebot.com or google.com. To be thorough, also run a forward DNS lookup on that name and check that it resolves back to the same IP address.
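The verification can be sketched in Python with the standard socket module instead of the host command. The function name is_verified_googlebot is hypothetical. The sketch also confirms the result with a forward lookup, a common extra safeguard, and the two lookup parameters exist only so the logic can be tried without real DNS queries.

```python
import socket

# Domain suffixes accepted for Googlebot, as named in the lesson.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def is_verified_googlebot(ip, reverse_lookup=None, forward_lookup=None):
    """Reverse-resolve the IP, check the domain, then forward-confirm it."""
    # Default lookups use real DNS; gethostbyname_ex covers IPv4 only.
    reverse_lookup = reverse_lookup or (lambda addr: socket.gethostbyaddr(addr)[0])
    forward_lookup = forward_lookup or (lambda host: socket.gethostbyname_ex(host)[2])
    try:
        host = reverse_lookup(ip)
    except OSError:
        return False  # no reverse DNS entry for this address
    # The name must END in an accepted domain, not merely contain it.
    if not host.lower().rstrip(".").endswith(GOOGLE_SUFFIXES):
        return False
    try:
        return ip in forward_lookup(host)
    except OSError:
        return False
```

Checking the end of the host name matters: a name such as googlebot.com.example.net contains "googlebot.com" but does not belong to Google.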

Historically, user-agent based delivery, which means delivering one thing to one user-agent (like a crawler) and something else to another (like all your users), has been used in the context of what’s called cloaking, which is a clear violation of Google’s guidelines. So make sure that what you deliver to Googlebot is the same as what you are actually delivering to your users.

From a more practical standpoint, whenever you do an audit, it really makes sense to run a test and check whether developers have accidentally implemented something that is available only to Googlebot and not to the regular user. I’d always recommend that you run this simulation, especially if it’s a website you’re not 100% familiar with.
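One possible way to run such a test in an audit is to fetch the same URL with two user-agents and compare the responses. This is only a rough sketch: the function names are hypothetical, and a byte-for-byte comparison will also flag harmless differences such as timestamps or rotating content.

```python
from urllib.request import Request, urlopen


def fetch_body(url: str, user_agent: str) -> bytes:
    """Download the URL while sending the given user-agent string."""
    request = Request(url, headers={"User-Agent": user_agent})
    with urlopen(request, timeout=10) as response:
        return response.read()


def differs_by_user_agent(url, agent_a, agent_b, fetch=fetch_body) -> bool:
    """Return True if the two user-agents receive different response bodies."""
    return fetch(url, agent_a) != fetch(url, agent_b)
```

A True result is a prompt to look closer at the page, not proof of cloaking; comparing extracted text or the rendered page gives a more reliable picture.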

Furthermore, you can serve different indexation directives to different user-agents: for example, you can allow Googlebot to index a page while setting noindex for Googlebot-News on that same page.
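One way to express such per-crawler directives is a robots meta tag whose name is the crawler's user-agent token. The helper below is a hypothetical sketch that renders those tags from a mapping; the directive values are examples only.

```python
from html import escape


def robots_meta_tags(directives: dict) -> list:
    """Render one robots meta tag per crawler token."""
    return [
        f'<meta name="{escape(agent, quote=True)}" '
        f'content="{escape(value, quote=True)}">'
        for agent, value in directives.items()
    ]


# Example: keep a page out of Google News only.
tags = robots_meta_tags({"googlebot-news": "noindex"})
```

Crawlers without a matching tag fall back to their default behavior, so the general Googlebot can still index the page.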

