As we discovered in a 2016 DeviceAtlas Mobile Web Intelligence Report, up to 50% of all website traffic can be attributed to bots, dark traffic, spammy referrals and all sorts of ne’er-do-well actors.
Removing these sources from your reporting suite not only gives you a more accurate picture of where genuine traffic is coming from, but can help you plan ahead and put your spend where it works best for you. Muddying the waters can lead to incorrect assumptions and guesses, costing you money.
But what if you want to remove useless traffic not only from your reports, but from your overall web experience too? Is it possible to stop bots at the door, so they don’t appear in your reports and don’t even get to see your content? Doing so can also reduce your infrastructure and bandwidth costs.
Let’s first examine where bots came from, why they were created, and why we’re now inundated with fake traffic, ad-fraud and click-spam as a result.
The evolution of bots
A bot, in the web sense at least, is described as a “software application that performs automated tasks by running scripts over the internet”.
We rely on them every single day, usually without noticing: almost everything we do online involves an armada of automated tasks.
They’ve been around as long as the web, and their building blocks long before. Today’s web-bots owe their existence to the groundwork laid by the likes of Babbage, Bertrand Russell and Alan Turing. The idea of creating a mechanical device to process information in an intelligent way has been the holy grail for mathematicians, engineers and lateral thinkers since Aristotle.
Today, the most common varieties of bad-bots we’ll encounter on the web exist to serve one of the following aims:
- Spambots that harvest email addresses from contact or guestbook pages
- Downloader programs that suck bandwidth by downloading entire websites
- Website scrapers that grab the content of websites and re-use it without permission on automatically generated doorway pages
- Viruses and worms
- DDoS attacks
- Botnets, zombie computers, etc.
However, not all bots are bad for you. Many are of huge benefit to your site, such as search engine bots, without which Google might not have become the monster it is. Being listed in search results is a positive thing, so blocking those bots is a definite no-no.
One of the earliest examples was WebCrawler, which launched in 1994, was bought by AOL in 1995 and passed to Excite two years later. Googlebot, the most famous of them all, was created in 1996, and went on to become the most sought-after visitor for websites seeking organic, “free” traffic.
The first signs of bots being used at scale in a somewhat commercially-shady sense occurred, predictably, as the internet entered our homes. The 1990s saw an explosion in connectivity, and the late 2000s witnessed the smartphone boom, giving bots even more room to grow into.
If there’s easy money to be made, you can guarantee some humans will find a way to take advantage, and the rise in bots is no different. HubSpot provide a good breakdown of the most common types of good and bad bots.
The Google Analytics solution
Right out of the box, Google Analytics allows you to tick a box to “Exclude all hits from known bots and spiders”.
Great, but this only excludes known bots from reports. It can’t spot new bots, or those whose behaviour doesn’t match the patterns Google uses to trigger this filter.
So how do you spot a bot?
The most common fingerprint of a bot visit (within Google Analytics) is very low quality traffic, indicated by 100% New Sessions, 100% Bounce Rate, 1.00 Pages/Session, 00:00:00 Avg Session Duration, or all of the above.
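The fingerprint above can be turned into a simple check over exported report rows. The following is a minimal sketch, not a Google Analytics API call: the `ReferrerStats` fields, the sample domains and the `min_sessions` cut-off are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class ReferrerStats:
    """One row of an exported referrer report (hypothetical field names)."""
    referrer: str
    sessions: int
    pct_new_sessions: float     # 0-100
    bounce_rate: float          # 0-100
    pages_per_session: float
    avg_session_seconds: float


def bot_signals(row: ReferrerStats) -> list[str]:
    """Return the names of the fingerprint signals this row matches."""
    signals = []
    if row.pct_new_sessions >= 100.0:
        signals.append("100% new sessions")
    if row.bounce_rate >= 100.0:
        signals.append("100% bounce rate")
    if row.pages_per_session <= 1.0:
        signals.append("1.00 pages/session")
    if row.avg_session_seconds == 0.0:
        signals.append("zero session duration")
    return signals


def looks_like_bot(row: ReferrerStats, min_sessions: int = 10,
                   min_signals: int = 4) -> bool:
    """Flag a row that matches at least `min_signals` signals.

    `min_sessions` (an assumed threshold) avoids flagging a referrer
    on the strength of a handful of visits.
    """
    if row.sessions < min_sessions:
        return False
    return len(bot_signals(row)) >= min_signals


# Made-up sample data, for illustration only.
rows = [
    ReferrerStats("spammy-referrer.example", 250, 100.0, 100.0, 1.0, 0.0),
    ReferrerStats("genuine-blog.example", 180, 62.5, 41.0, 2.7, 95.0),
]
suspects = [r.referrer for r in rows if looks_like_bot(r)]
```

Lowering `min_signals` makes the check stricter about traffic quality, at the cost of flagging more genuine referrers, so any flagged domain is best reviewed by hand before it is excluded.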
Monitoring referrers within Google Analytics can help you identify domains sending such poor traffic, which you can then exclude from your reports (Screenshot above courtesy of Freshegg.co.uk, who also provide a decent guide on how to deal with GA filters).
However, this option will not stop bots from accessing your site and content, meaning your infrastructure costs increase with zero benefit to you or your company, and possibly a negative effect should they be of the really-naughty variety.
Cutting the problem at source
Assuming you’ve managed to exclude most known malicious bots and crawlers from your Analytics reports, the next step is to look at streamlining your server by stopping them before they get to waste your precious resources.
It’s possible to prevent bots accessing your site using a few different methods, such as using robots.txt, your .htaccess file, or adding custom meta-tag directives to your site (or specific pages). However, to do this effectively, and avoid inadvertently blocking helpful bots from your site, patience and meticulous attention to detail are required.
There’s also the possibility that the nasty bot will simply ignore your instructions.
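To make the robots.txt option concrete, here is a small sketch that uses Python’s standard `urllib.robotparser` module to show how a well-behaved crawler would interpret a rule set. The bot name `BadScraper` and the paths are hypothetical, and the rules are only advisory: they describe what a compliant bot should do, not what a malicious one will do.

```python
from urllib.robotparser import RobotFileParser

# An illustrative robots.txt: shut out one named bot entirely and keep
# everyone else out of /private/. "BadScraper" is a made-up name.
ROBOTS_TXT = """\
User-agent: BadScraper
Disallow: /

User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def is_allowed(user_agent: str, path: str) -> bool:
    """Return True if the rules permit this user agent to fetch the path."""
    return parser.can_fetch(user_agent, path)


# A compliant crawler would run a check like this before each request.
scraper_ok = is_allowed("BadScraper", "/")
search_bot_ok = is_allowed("Googlebot", "/blog/post")
search_bot_private = is_allowed("Googlebot", "/private/report")
```

Testing rules this way before deploying them is one way to avoid inadvertently locking out a helpful bot.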
Wouldn’t it be great if you could employ a bouncer, of sorts, to stop visitors before they enter, checking their credentials before allowing them beyond the velvet rope? No need to waste resources, bandwidth and effort on visitors that aren’t bringing any value, and could even cause trouble for your regular patrons.
This way, only real visitors and helpful, worthy non-humans will get to see your site.
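As a rough sketch of the bouncer idea, the WSGI middleware below checks each request’s User-Agent header before the application does any work. Everything in it is an assumption made for illustration: the token list and the `BotBouncer` class are hypothetical, and because User-Agent strings are trivially spoofed, a check this simple would only ever be a first line of defence.

```python
# Hypothetical list of substrings seen in unwanted user agents.
BLOCKED_AGENT_TOKENS = ("badscraper", "emailharvester")


def is_blocked(user_agent: str) -> bool:
    """Return True for a missing user agent or one containing a blocked token."""
    ua = user_agent.strip().lower()
    if not ua:
        return True  # assumption: treat a missing User-Agent as suspicious
    return any(token in ua for token in BLOCKED_AGENT_TOKENS)


class BotBouncer:
    """WSGI middleware that turns away blocked user agents with a 403."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        user_agent = environ.get("HTTP_USER_AGENT", "")
        if is_blocked(user_agent):
            # Refuse at the door: the wrapped application never runs.
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        return self.app(environ, start_response)


def site(environ, start_response):
    """Stand-in for the real application."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Welcome\n"]


guarded_site = BotBouncer(site)
```

Any WSGI server can then be pointed at `guarded_site` instead of `site`, so unwanted requests are rejected before they consume application resources.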
Employing a dedicated bot detection solution can take the strain off you and your team, by handling all aspects in real-time. With an accurate database of known and new bots, and the ability to allow the good ones through, it’s definitely something worth considering.