Allowing web crawlers to scan your site is vital if you want your web pages to appear in Google, Bing and other search results.
However, traffic spikes caused by non-human visitors can be costly in terms of bandwidth and website stability, and can even lead to site outages. To help you understand the web crawlers, bots and spiders visiting your site, we’ve compiled a list of the most common sources of non-human traffic we see in our data, including their User Agents for reference.
For our latest dive into the data, we looked at the numbers for Q4 2018.
The data is sourced from thousands of websites built and hosted on the goMobi web publishing platform. Below is the breakdown of all the bots we saw.
The most active crawler is Googlebot
Given their dominance of all things search, it's no surprise to see Google topping the list, driving 28.5% of all bot hits in our data.
We spotted 91 variations of Google crawlers and bots, down from the 146 individual UAs we saw over the first half of 2018.
Below are the most common Google crawlers that appeared in our Q4 2018 data, and the User Agents associated with them.
|8.5% | Googlebot | Search Engine | Mobile|
|Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)|
|5.8% | AdsBot Google | Advertising Bot | Mobile|
|Mozilla/5.0 (iPhone; CPU iPhone OS 9_1 like Mac OS X) AppleWebKit/601.1.46 (KHTML, like Gecko) Version/9.0 Mobile/13B143 Safari/601.1 (compatible; AdsBot-Google-Mobile; +http://www.google.com/mobile/adsbot.html)|
|5.3% | Googlebot | Search Engine | Mobile|
|Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; Googlebot/2.1; +http://www.google.com/bot.html) Safari/537.36|
|4.3% | Googlebot | Search Engine | Desktop|
|Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)|
|2.3% | Googlebot | Images | n/a|
|0.80% | Googlebot | Media Partners | n/a|
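To illustrate how the Google crawlers above could be told apart in server logs, here is a minimal Python sketch that matches substring tokens in the User Agent. The `Googlebot` and `AdsBot-Google` tokens appear in the UAs listed above; `Googlebot-Image` and `Mediapartners-Google` are the tokens we assume for the Images and Media Partners rows, and the function name `google_crawler_name` is purely illustrative.

```python
# Substring tokens for the Google crawlers listed above. More specific
# tokens come first because "Googlebot-Image" also contains "Googlebot".
# The Image and Mediapartners tokens are assumptions, not taken from the table.
GOOGLE_TOKENS = [
    ("Googlebot-Image", "Googlebot Images"),
    ("Mediapartners-Google", "Googlebot Media Partners"),
    ("AdsBot-Google", "AdsBot Google"),
    ("Googlebot", "Googlebot"),
]


def google_crawler_name(user_agent):
    """Return a label for a Google crawler, or None if no token matches."""
    lowered = user_agent.lower()
    for token, label in GOOGLE_TOKENS:
        if token.lower() in lowered:
            return label
    return None


ua = ("Mozilla/5.0 (compatible; Googlebot/2.1; "
      "+http://www.google.com/bot.html)")
print(google_crawler_name(ua))  # prints "Googlebot"
```

A User Agent is easy to spoof, so a match like this only tells you what a visitor claims to be.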
Bing almost as active as Google
Combined, the nine Bing bots we see accounted for over 22% of all bot hits in our data, just behind Google's.
Interestingly, three of Bing's mobile crawlers announce themselves as an iPhone – the very first version, released way back in 2007. The other two masquerade as a Nokia Lumia 520 and a "Generic Windows Phone".
Bing documents all of the crawlers it uses, and exactly what each of them does. Below are the user agents for the three most active Bing crawlers we saw in Q4 2018.
|8.4% | Bingbot | Search Engine | Desktop|
|Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)|
|7.9% | BingPreview | Search Engine | Mobile|
|Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/534+ (KHTML, like Gecko) BingPreview/1.0b|
|5.8% | Bingbot | Search Engine | Mobile|
|Mozilla/5.0 (iPhone; CPU iPhone OS 7_0 like Mac OS X) AppleWebKit/537.51.1 (KHTML, like Gecko) Version/7.0 Mobile/11A465 Safari/9537.53 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)|
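Because some Bing crawlers announce themselves as a phone, device detection that only looks for tokens like `iPhone` would count them as human mobile visitors. The sketch below shows one way to avoid that: check for crawler tokens before device tokens. The token lists and the `classify` function are illustrative assumptions, not a complete detection ruleset.

```python
import re

# Crawler tokens taken from the Bing UAs listed above.
BOT_TOKENS = re.compile(r"bingbot|BingPreview", re.IGNORECASE)
# A deliberately tiny, hypothetical set of device tokens.
DEVICE_TOKENS = re.compile(r"iPhone|Windows Phone|Android", re.IGNORECASE)


def classify(user_agent):
    """Return ("bot", token), ("device", token) or ("unknown", None)."""
    # Test for crawler tokens first: a bot UA may also carry a device token.
    bot = BOT_TOKENS.search(user_agent)
    if bot:
        return ("bot", bot.group(0))
    device = DEVICE_TOKENS.search(user_agent)
    if device:
        return ("device", device.group(0))
    return ("unknown", None)


bing_mobile = (
    "Mozilla/5.0 (iPhone; CPU iPhone OS 7_0 like Mac OS X) "
    "AppleWebKit/537.51.1 (KHTML, like Gecko) Version/7.0 Mobile/11A465 "
    "Safari/9537.53 (compatible; bingbot/2.0; "
    "+http://www.bing.com/bingbot.htm)"
)
print(classify(bing_mobile))  # prints ('bot', 'bingbot')
```

Reversing the order of the two checks would report this visitor as an iPhone, which is exactly the miscount the ordering is meant to prevent.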
Unsurprisingly, given its ubiquity, Facebook's externalhit bot is a regular visitor. We saw six different types of Facebook crawler in our data.
The share of Facebook's various crawlers has dropped sharply since we last studied the data. They accounted for 8.7% of all bot visits in the first half of 2018, but less than 3% in Q4.
The main use of this crawler is to generate page previews. When a link is pasted within the platform, Facebook crawls the target page and pulls information such as the title, description and preview/featured images.
The Facebook crawler UAs seen most often are:
|2.6% | Facebook | Social Media Agent | Desktop bot|
|0.3% | Facebook | Social Media Agent | Desktop bot|
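To give a feel for what a preview crawler reads from a page, here is a rough Python sketch that pulls the page title and any Open Graph `og:` meta tags out of an HTML document using only the standard library. It illustrates the general idea and is not Facebook's actual implementation; the class name `OpenGraphParser`, the helper `extract_preview` and the sample markup are all made up for this example.

```python
from html.parser import HTMLParser


class OpenGraphParser(HTMLParser):
    """Collect <meta property="og:..."> values and the <title> text."""

    def __init__(self):
        super().__init__()
        self.preview = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            prop = attrs.get("property") or ""
            if prop.startswith("og:") and attrs.get("content"):
                self.preview[prop] = attrs["content"]

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Keep only the first non-empty chunk of title text.
        if self._in_title and data.strip():
            self.preview.setdefault("title", data.strip())


def extract_preview(html):
    """Return a dict with the title and og:* properties found in html."""
    parser = OpenGraphParser()
    parser.feed(html)
    return parser.preview


sample = """<html><head>
<title>Example page</title>
<meta property="og:title" content="Example page">
<meta property="og:description" content="A short summary.">
<meta property="og:image" content="https://example.com/cover.png">
</head><body></body></html>"""
print(extract_preview(sample))
```

If your pages are shared on social platforms, checking that these tags are present is a quick way to see what a preview crawler has to work with.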
One of the earliest internet pioneers, Yahoo has undergone major changes in recent years. In 1994 it was called "Jerry and David's guide to the World Wide Web", and simply listed other websites. There was no search offered at first, but in 2000 Yahoo integrated Google's product, and by 2004 it had developed its own search functionality.
Yahoo has one main crawler – Yahoo! Slurp. After that, we see Yahoo Pipes, YahooCacheSystem and Yahoo Nutch, with none of those three accounting for a significant amount of traffic.
Here is the main crawler and its UA.
|0.13% | Yahoo! Slurp | Search Engine | Desktop|
|Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)|
Others to note
So we've looked at the "Big Four" and their crawlers and bots, and provided the most common User Agents we see across our network. What else in the data is interesting?
With over 5% of the overall bot hit share, Sogou Spider is the 7th busiest bot on the list. It's the web crawler for Beijing-based search technology provider, Sogou.com.
|4% | Sogou Spider | Search Engine | Desktop|
|Sogou web spider/4.0(+http://www.sogou.com/docs/help/webmasters.htm#07)|
New to the top spots for 2019 is Comscore's Proximic, a crawler that "enables advertising partners to determine the best matching campaign for a page's content".
|1.7% | Proximic | Search Engine | Desktop|
|Mozilla/5.0 (compatible; proximic; +https://www.comscore.com/Web-Crawler)|
Next, we have Baidu, China's answer to Google and the fourth most visited site in the world (Alexa).
|1.1% | Baidu Spider | Search Engine | Desktop|
|Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)|
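If you want to produce a similar breakdown from your own logs, the sketch below tallies each bot's share of bot hits from a list of User Agent strings. The tokens come from the UAs listed in this article, but the mapping, the `bot_share` function and the sample hits are illustrative assumptions and bear no relation to how our own figures were computed.

```python
from collections import Counter

# Lower-cased substring tokens mapped to display names (illustrative only).
BOT_TOKENS = {
    "googlebot": "Googlebot",
    "bingbot": "Bingbot",
    "yahoo! slurp": "Yahoo! Slurp",
    "sogou web spider": "Sogou Spider",
    "proximic": "Proximic",
    "baiduspider": "Baidu Spider",
}


def bot_share(user_agents):
    """Return each bot's percentage share of all recognised bot hits."""
    counts = Counter()
    for ua in user_agents:
        lowered = ua.lower()
        for token, name in BOT_TOKENS.items():
            if token in lowered:
                counts[name] += 1
                break  # count each hit once
    total = sum(counts.values())
    if total == 0:
        return {}
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


# Hypothetical log sample: four bot hits and one ordinary browser.
hits = [
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)",
    "Mozilla/5.0 (compatible; Baiduspider/2.0; "
    "+http://www.baidu.com/search/spider.html)",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/70.0.3538.77",
]
print(bot_share(hits))
```

Hits that match no token are left out of the total, so the percentages describe the share of bot traffic rather than the share of all traffic.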
With a DeviceAtlas Cloud Standard, Premium or Enterprise account, you can identify non-human traffic (robots, crawlers, checkers, download agents, spam harvesters and feed readers) in real time. You can then decide how to act on this information, whether that means blocking all undesired bots at the door or just treating them differently from legitimate human visitors.
Read more about bot detection and how it can help your business.
We also have some handy resource lists such as a list of User Agents for the most popular smartphones and devices and the most common mobile browsers across 35 countries.