The most active bots and crawlers on the web

Allowing web crawlers to scan your site is vital if you want your web pages to appear in Google, Bing and other search results.

However, unwanted traffic spikes caused by non-human visitors can be costly in terms of bandwidth and website stability, and can potentially lead to site outages. To help you understand the web crawlers, bots and spiders visiting your site, we’ve compiled a list of the most common instances of non-human traffic we see in our data, including their User Agents for reference.

For our latest dive into the data, we looked at the numbers for Q4 2018.

The data is sourced from thousands of websites built and hosted on the goMobi web publishing platform. Below is the breakdown of all the bots we saw.

[Chart: bots and crawlers]

The most active crawler is Googlebot

Given their dominance of all things search, it's no surprise to see Google topping the list, driving 28.5% of all bot hits in our data.

We spotted 91 variations of Google crawlers and bots, down from the 146 individual UAs we saw over the first half of 2018.

Below are the most common Google crawlers that appeared in our Q4 2018 data, and the User Agents associated with them.

8.5% | Googlebot | Search Engine | Mobile
Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
5.8% | AdsBot Google | Advertising Bot | Mobile
Mozilla/5.0 (iPhone; CPU iPhone OS 9_1 like Mac OS X) AppleWebKit/601.1.46 (KHTML, like Gecko) Version/9.0 Mobile/13B143 Safari/601.1 (compatible; AdsBot-Google-Mobile; +http://www.google.com/mobile/adsbot.html)
5.3% | Googlebot | Search Engine | Mobile
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; Googlebot/2.1; +http://www.google.com/bot.html) Safari/537.36
4.3% | Googlebot | Search Engine | Desktop
Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
2.3% | Googlebot | Images | n/a
Googlebot-Image/1.0
0.80% | Googlebot | Media Partners | n/a
Mediapartners-Google
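
To make the list above easier to use, here is a minimal sketch of how these User Agents could be grouped by the tokens they contain. The table GOOGLE_TOKENS and the function classify_google_ua are names we made up for illustration; they are not part of any Google or DeviceAtlas API. Keep in mind that any client can send any User Agent header, so a match is a hint rather than proof that the visitor really is Google.

```python
# Illustrative token table: (substring to look for, label to report).
# Order matters: "Googlebot-Image" also contains "Googlebot", so the
# more specific tokens are checked first.
GOOGLE_TOKENS = [
    ("AdsBot-Google", "AdsBot Google"),
    ("Mediapartners-Google", "Googlebot (Media Partners)"),
    ("Googlebot-Image", "Googlebot (Images)"),
    ("Googlebot", "Googlebot (Search Engine)"),
]


def classify_google_ua(user_agent):
    """Return a label for a Google crawler UA, or None if no token matches."""
    ua = user_agent.lower()
    for token, label in GOOGLE_TOKENS:
        if token.lower() in ua:
            return label
    return None


if __name__ == "__main__":
    print(classify_google_ua("Googlebot-Image/1.0"))
    print(classify_google_ua(
        "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
    ))
```

The matching is deliberately case-insensitive and substring-based, which is enough to separate the six User Agents listed above but would need extending for the other Google variants we saw.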

Bing almost as active as Google

Combined, the nine Bing bots we saw accounted for over 22% of all bot hits in our data, just behind Google's 28.5%.

Interestingly, three of Bing's mobile crawlers announce themselves as an iPhone – the very first version, released way back in 2007. The other two masquerade as a Nokia Lumia 520 and a "Generic Windows Phone".

You can read about all the crawlers used by Bing, and exactly what each of them does, here. Below are the User Agents for the three most active Bing crawlers we saw in Q4 2018.

8.4% | Bingbot | Search Engine | Desktop
Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)
7.9% | BingPreview | Search Engine | Mobile
Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/534+ (KHTML, like Gecko) BingPreview/1.0b
5.8% | Bingbot | Search Engine | Mobile
Mozilla/5.0 (iPhone; CPU iPhone OS 7_0 like Mac OS X) AppleWebKit/537.51.1 (KHTML, like Gecko) Version/7.0 Mobile/11A465 Safari/9537.53 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)
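
Because the mobile Bingbot User Agent above presents itself as an iPhone, simple device detection can mistake it for a real handset. The sketch below shows one way to avoid that by checking for crawler tokens before looking at the device. The names BING_TOKENS, naive_device and describe_visitor are hypothetical helpers written for this example, not an existing library.

```python
# The mobile Bingbot User Agent from the list above.
BING_MOBILE_UA = (
    "Mozilla/5.0 (iPhone; CPU iPhone OS 7_0 like Mac OS X) "
    "AppleWebKit/537.51.1 (KHTML, like Gecko) Version/7.0 "
    "Mobile/11A465 Safari/9537.53 "
    "(compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
)

# Lower-case tokens that appear in the Bing User Agents listed above.
BING_TOKENS = ("bingbot", "bingpreview")


def naive_device(user_agent):
    """Device check that ignores crawlers entirely (for comparison only)."""
    return "iPhone" if "iPhone" in user_agent else "unknown"


def describe_visitor(user_agent):
    """Check for crawler tokens first, then fall back to the device check."""
    ua = user_agent.lower()
    if any(token in ua for token in BING_TOKENS):
        return "Bing crawler"
    return naive_device(user_agent)


if __name__ == "__main__":
    print(naive_device(BING_MOBILE_UA))      # reports the device only
    print(describe_visitor(BING_MOBILE_UA))  # reports the crawler instead
```

The ordering is the whole point: once the crawler tokens are tested first, the device named in the rest of the string no longer matters.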

Facebook

Unsurprisingly, given its ubiquity, Facebook's externalhit bot is a regular visitor. We saw six different types of Facebook crawler in our data.

The presence of Facebook's various crawlers has dropped a lot since we last studied the data. They accounted for 8.7% of all bot visits in the first half of 2018, but in Q4 they didn't even reach 3%.

The main use of this crawler is page previews. When a link is pasted within the platform, Facebook crawls the target page and pulls information such as the title, description and preview/featured images.
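
To give a feel for what a preview crawler looks for, here is a rough sketch that pulls the page title and Open Graph meta tags (og:title, og:description, og:image) out of an HTML document using Python's standard html.parser module. This is our own illustration and not Facebook's implementation; PreviewParser, extract_preview and SAMPLE_PAGE are names and data invented for the example.

```python
from html.parser import HTMLParser


class PreviewParser(HTMLParser):
    """Collect the <title> text and og:* meta tags from an HTML document."""

    def __init__(self):
        super().__init__()
        self.preview = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            prop = attrs.get("property") or ""
            content = attrs.get("content")
            # Keep the first value seen for each Open Graph property.
            if prop.startswith("og:") and content:
                self.preview.setdefault(prop, content)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title and data.strip():
            self.preview.setdefault("title", data.strip())


def extract_preview(html):
    """Return a dict of preview fields found in the given HTML string."""
    parser = PreviewParser()
    parser.feed(html)
    return parser.preview


# Made-up page used only to exercise the parser.
SAMPLE_PAGE = """<html><head>
<title>Example page</title>
<meta property="og:title" content="Example page">
<meta property="og:description" content="A short summary.">
<meta property="og:image" content="https://example.com/cover.png">
</head><body></body></html>"""

if __name__ == "__main__":
    for key, value in extract_preview(SAMPLE_PAGE).items():
        print(key, "=", value)
```

If your pages are shared on social platforms, it is worth checking that these tags are present, since they are what a preview is typically built from.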

The Facebook crawler UAs seen most often are:

2.6% | Facebook | Social Media Agent | Desktop bot
facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)
0.3% | Facebook | Social Media Agent | Desktop bot
facebookexternalhit/1.1

Yahoo!

One of the earliest internet pioneers, Yahoo has undergone major changes in recent years. In 1994 they were called "Jerry and David's guide to the World Wide Web", and simply listed other websites. There was no search offered, but in the year 2000 they integrated Google's product. By 2004, they had developed their own search functionality.

There is one main crawler – Yahoo! Slurp. After that, we see Yahoo Pipes, YahooCacheSystem and Yahoo Nutch, with none of those three accounting for a significant amount of traffic.

Here is the main crawler and its UA.

0.13% | Yahoo! Slurp | Search Engine | Desktop
Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)

Others to note

So we've looked at the "Big Four" and their crawlers and bots, and provided the most common User Agents we see across our network. What else in the data is interesting?

With over 5% of the overall bot hit share, Sogou Spider is the 7th busiest bot on the list. It's the web crawler for Beijing-based search technology provider, Sogou.com.

4% | Sogou Spider | Search Engine | Desktop
Sogou web spider/4.0(+http://www.sogou.com/docs/help/webmasters.htm#07)

New to the top spots for 2019 is Comscore's Proximic, a crawler that "enables advertising partners to determine the best matching campaign for a page's content".

1.7% | Proximic | Search Engine | Desktop
Mozilla/5.0 (compatible; proximic; +https://www.comscore.com/Web-Crawler)

Next, we have Baidu, China's answer to Google and the fourth most visited site in the world (Alexa).

1.1% | Baidu Spider | Search Engine | Desktop
Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)

With a DeviceAtlas Cloud Standard, Premium or Enterprise account, you can identify non-human traffic (robots, crawlers, checkers, download agents, spam harvesters and feed readers) in real-time. You can then decide how to act on this information, whether to block all undesired bots at the door, or just treat them in a different way to legitimate human visitors.
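
As a rough illustration of acting on that information, the sketch below maps a detected bot to an action such as allowing, throttling or blocking it. It uses plain substring matching on tokens from the User Agents in this article and does not use the DeviceAtlas API; BOT_POLICY, decide_action and the actions assigned to each bot are assumptions made purely for the example.

```python
# Hypothetical policy table: (lower-case UA token, action).
# The tokens come from the User Agents listed in this article;
# the actions are made-up examples, not recommendations.
BOT_POLICY = [
    ("googlebot", "allow"),
    ("bingbot", "allow"),
    ("facebookexternalhit", "allow"),
    ("sogou web spider", "throttle"),
    ("proximic", "block"),
]


def decide_action(user_agent, default="serve_normally"):
    """Return the action for the first matching token, else the default."""
    ua = user_agent.lower()
    for token, action in BOT_POLICY:
        if token in ua:
            return action
    return default


if __name__ == "__main__":
    print(decide_action(
        "Sogou web spider/4.0(+http://www.sogou.com/docs/help/webmasters.htm#07)"
    ))
    print(decide_action("Mozilla/5.0 (Windows NT 10.0; Win64; x64)"))
```

Keeping the policy in a single table makes it easy to review which bots are treated differently, and why, without touching the matching logic.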

Read more about bot detection and how it can help your business.

We also have some handy resource lists such as a list of User Agents for the most popular smartphones and devices and the most common mobile browsers across 35 countries.

© 2024 DeviceAtlas Limited. All rights reserved.
