You’ve probably heard of bots on the Internet by now, but you might not be sure what they are or what they do.
Internet bots, also called web robots, web crawlers, spider bots, or simply bots, are software tools that automatically perform tasks on the Internet. Depending on their creator's intentions, these tasks can be good or bad. Bots usually imitate tasks that would otherwise be done by actual humans, like indexing web pages for a search engine or answering people's questions, the latter being a common form of customer service.
Good bots are typically used to automate tasks, like website scanning or data collection, and are generally used to make our lives easier. Bad bots, however, are used for malicious purposes in the online world. Imperva Incapsula’s annual report on bots found that 25.6% of bot traffic could be attributed to bad bots, a number that has largely remained the same over the last couple of years.
Bad bots are used to hack, spam, spy, interrupt, and compromise websites of various sizes. You’ve already dealt with these annoying bots if you have an online presence. There are ways of dealing with bots. Since bots make up a quarter of the Internet traffic today, chances are you won’t be able to avoid them altogether, but with our help, you won’t be bothered by them on a regular basis.
We have some solutions to get rid of bots on your website. But first, let’s learn more about bots and why most people dislike them.
- What Is Bot Traffic?
- Common Misconceptions About Bots
- How to Disable Bad Bot Traffic on Your Website – The Hard Way
- Blocking Bad Bots on Your Website – The Easy Way
- Blocking Bots Leads to ‘Healthier’ Websites
What Is Bot Traffic?
When someone visits your website, your analytics will log that session and attribute it to a visitor with specific characteristics, such as the web browser and operating system they're using. That's regular traffic, and those recorded characteristics are called identifiers.
Now that we've covered what a single bot is, what about bot traffic? Bot traffic simply refers to the traffic generated by bots. The term often carries a negative meaning, but everything depends on the function of the bots. Some bots are essential to services like search engines and digital assistants, for example, Google Search or Siri.
The vast majority of businesses welcome such bots on their websites. But many bots are also used to stuff credentials, steal data, and launch DDoS attacks. Even when bad bots, including unauthorized web crawlers, aren't outright dangerous, they're annoying because they can skew your web analytics and generate click fraud.
Common Misconceptions About Bots
You may have assumed that every bot is malicious and must be removed from your website without exception.
One thing is certain, though: not all bots on your website are bad. Googlebot, for instance, is a bot developed by Google that crawls websites and indexes URLs for its search engine. Bots like this are called search engine bots. This bot is extremely helpful; if you produce enticing content online, it'll help you climb the SERP ladder faster.
Another misconception is that bots can be cut off by using DDoS protection and Web Application Firewalls (WAF). While both online tools can help your website survive a botnet or DDoS attack, they’ll hardly do anything to a few bots crawling your website.
Some bots, though, are malicious, designed to produce false information and pretend to use legitimate websites to steal personal data.
A study by Norton suggests that on popular social media platforms like Twitter, there's a large number of bot accounts that spread fake news and deceive people. These are classified as "emerging threats" and can also spread malware to unsuspecting users.
On the other hand, some bots also crawl your website for SEO reasons.
Some of the best SEO software includes automation and aggregation that help SEO professionals automate repetitive tasks. Amongst the bots that SEO professionals use are Semrush and Ahrefs – both of these are marketing bots that crawl thousands of web pages daily and can help marketers deploy their online strategies.
How to Disable Bad Bot Traffic on Your Website – The Hard Way
Why is this hard? This solution takes a lot of effort on your part, a lot of knowledge, and a lot of time.
If you’re having a problem with bots spamming your website, you’ll need to find out where they came from first. This will all get very technical, so try to follow along as best you can. If you get lost, don’t worry, the easy way is only a few paragraphs down.
To find out where the bots came from and block them, you'll need either the IP address from which the bots were sent or their User Agent String. An IP address is a unique string of numbers separated by periods that identifies each computer on the Internet. A User Agent String, on the other hand, is the name the program reports about itself. For example, Google's search engine bot identifies itself as Googlebot/2.1.
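To see how both identifiers appear in practice, here's a minimal Python sketch that pulls the IP address and User Agent String out of a single log line. The log entry itself is a hypothetical example in Apache's combined log format, which is what most raw access logs use:

```python
import re

# A hypothetical log entry in Apache's combined log format.
line = ('203.0.113.7 - - [10/Oct/2023:13:55:36 +0000] '
        '"GET /index.html HTTP/1.1" 200 2326 '
        '"-" "SpamRobot/3.1 (+https://www.randomsite.com/bot.html)"')

# The IP address is the first field; the User Agent String is
# the last quoted value on the line.
match = re.match(r'(\S+) .*"([^"]*)"$', line)
if match:
    ip, user_agent = match.groups()
    print(ip)          # 203.0.113.7
    print(user_agent)  # SpamRobot/3.1 (+https://www.randomsite.com/bot.html)
```

Every request a bot makes leaves a line like this, so once you know which fields to read, the log tells you exactly who visited and what software they claimed to be.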
You will need to access your raw weblog to find either of these things. At the HostPapa dashboard, you can find your raw weblogs in My cPanel under Metrics > Raw Access and by clicking an option under the Download Current Raw Access Logs.
These files are usually large and compressed, so you'll need to decompress them with a file archiver like 7-Zip. Once the file has been decompressed, open it in a plain-text (ASCII) editor such as Notepad or TextEdit, which come built into Windows and macOS.
Now you have to scan the web file to find the bot you want to block. Some helpful identifiers are:
- Knowing when the bot tried to gain access to your website
- Knowing the web page it interacted with
With either or both of those pieces of information, you should be able to track down an IP address or the User Agent String. Once you have located either or both of that information, jot them down and prepare for the next step.
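If the log is too long to scan by eye, a short script can surface the most likely suspects. This is a sketch using hypothetical log lines; in practice you'd read them from the raw access log you downloaded:

```python
import re
from collections import Counter

# Hypothetical log lines standing in for your real raw access log.
log_lines = [
    '203.0.113.7 - - [10/Oct/2023:13:55:36 +0000] "GET /a HTTP/1.1" 200 512 "-" "SpamRobot/3.1"',
    '203.0.113.7 - - [10/Oct/2023:13:55:37 +0000] "GET /b HTTP/1.1" 200 734 "-" "SpamRobot/3.1"',
    '198.51.100.4 - - [10/Oct/2023:14:02:11 +0000] "GET /a HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
]

# Count requests per IP address (first field on each line).
hits_per_ip = Counter(line.split()[0] for line in log_lines)

# Count requests per User Agent String (last quoted value on each line).
hits_per_agent = Counter(re.search(r'"([^"]*)"$', line).group(1)
                         for line in log_lines)

# The most active IP and user agent are the first things to investigate.
print(hits_per_ip.most_common(1))     # [('203.0.113.7', 2)]
print(hits_per_agent.most_common(1))  # [('SpamRobot/3.1', 2)]
```

An IP address or User Agent String that hammers your site far more often than any human visitor would is a strong candidate for the blocklist.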
Remember that this solution is more of a workaround, and you may not get the desired result. The next step is to block the IP address or User Agent String you found, which could backfire on your website. Just because a bot attacked from one IP address doesn't mean it'll come from that same address next time. Worse, by blocking the wrong IP addresses, you could prevent an entire Internet Service Provider (ISP) and all of its customers from accessing your website.
Hackers are clever and will often name their bots after browsers or software everyone uses. This becomes problematic when you try to block a bot named "Safari" and end up blocking every person using the Safari web browser in the process. The same risks come with blocking specific User Agent Strings. If you are unsure what you are doing, you might be better off using the easy solution further down the blog.
Blocking Malicious Bots Through The .htaccess File
If you still feel that this solution is worth the risk, the next step is to download your .htaccess file from your website’s root folder. You can do this using an FTP client like Filezilla or the cPanel file manager from your HostPapa dashboard. Alternatively, if you have a WordPress-based site and the YoastSEO plugins installed, you can edit your .htaccess file right from the WordPress dashboard. If you can’t find the file in your root directory, chances are it doesn’t exist, and you’ll have to create one.
WARNING! One wrong change on your .htaccess file can potentially break your website, so make sure you back up your website before making any further changes.
If you manage to find the .htaccess file, the next step is to open it in your ASCII text editor, or open a new document if you need to create a new .htaccess file. Using a word processor like Microsoft Word or WordPad to create this file can cause your website to fail when you re-upload the .htaccess file, so make sure you use an ASCII text editor.
To block an IP address, add the following lines to your .htaccess file, replacing the example IP address with the one you want to block. Note that there is no space after the comma in the Order directive; adding one is a syntax error that can take your site offline:

```apacheconf
Order Deny,Allow
Deny from 126.96.36.199
```
If you already have text in your .htaccess file, add the above code to the bottom of the file. You can add another line with the same "Deny from ___" format for each IP address you wish to block, and you can block as many as you need to. However, note that the longer your list becomes, the more sluggish your website can get.
Blocking a User Agent String is very similar to blocking IP addresses.
Let's say you found a bot you want to block named "SpamRobot/3.1 (+https://www.randomsite.com/bot.html)". You would add the following code to your .htaccess file (replacing SpamRobot with the actual bot name you found):

```apacheconf
BrowserMatchNoCase SpamRobot bad_bot
BrowserMatchNoCase OtherSpamRobot bad_bot
Order Deny,Allow
Deny from env=bad_bot
```
To block multiple User Agent Strings, add another BrowserMatchNoCase line above the "Order Deny,Allow" line. Just like with IP addresses, adding too many entries can slow down your website.
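One caveat worth knowing: the Order and Deny directives above belong to Apache 2.2 and are deprecated in Apache 2.4 and later, where access control moved to the Require directive. If your host runs Apache 2.4 without the compatibility module enabled, a rough equivalent of the rules above (using the same example IP address and bad_bot variable) would look like this sketch:

```apacheconf
# Apache 2.4+ equivalent of the Order/Deny rules above (sketch).
BrowserMatchNoCase SpamRobot bad_bot
<RequireAll>
    Require all granted
    Require not ip 126.96.36.199
    Require not env bad_bot
</RequireAll>
```

If you're unsure which Apache version your hosting plan uses, check with your host's support team before choosing a syntax.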
Once you've finished updating your file, save it as ".htaccess" WITH the quotation marks included (in Notepad, the quotation marks stop the editor from tacking on a .txt extension). Upload your updated or brand-new file to your website, and you should be safe from the IP addresses and User Agent Strings you've identified.
Remember, this fix won't protect your website from all bots; it only protects you from the specific IP addresses and User Agent Strings you have blocked. Hackers change tactics quickly and can simply switch addresses or rename their bots. Dynamic IP addresses are also reassigned periodically, meaning you could end up blocking an innocent user instead of a bot if you rely on this solution.
Blocking Bad Bots on Your Website – The Easy Way
Services like Cloudflare can protect your website and improve its performance. Since Cloudflare employs a large CDN (Content Delivery Network) in its service, your website will get noticeably faster and more secure for every user worldwide.
At the start of the blog, we mentioned that there were good and bad bots.
Services like Cloudflare, SiteLock, and Sucuri use good bots to deal with incoming bad bots automatically. As you saw in the previous section, the manual approach, despite being free, is a long, strenuous process that may not even protect your website from most bots. Services like SiteLock and Sucuri will take care of spam bots for you, among many other features.
Both of these services continually scan your website for intruders and remove them if they are found. Other fixes include:
- Eliminating backdoors
- Updating plugins and your website’s core components
- Using the CAPTCHA method to block bots more effectively
These services also keep bots out of your website with a Web Application Firewall, DDoS monitoring and prevention, backdoor mitigation, and behavioural analysis. On top of all that security, SiteLock gives users access to a global CDN to speed up your website.
The important thing to remember about this solution is that it’s easy, automated, and reliable. You don’t have to worry about editing your files or blocking IP addresses when you have a service like SiteLock or Sucuri because they detect bot traffic for you and do it much more efficiently.
SiteLock, Sucuri, and other security services will protect your website from incoming bad bots and patch up your code to ensure that nothing malicious can get on your website. Visit HostPapa to see our SiteLock plans and keep your website secure.
Blocking Bots Leads to ‘Healthier’ Websites
Unless you run one of the Internet's most popular websites, with over 100,000 views per day, chances are the majority of your traffic comes from bots rather than humans.
Bots play a big part in Internet activity today. Our smartphones, mobile apps, and connected devices fetch information and access various websites at all times. Other advanced bots are capable of automatically buying products online: they go to eCommerce stores with limited inventory and place products in their carts, preventing legitimate users from buying them and ultimately hurting the store's sales.
While a lot of that activity can have malicious intent, there are many bots out there made to make our lives easier.
Take advantage of good bots’ services and take all the necessary precautions to protect your website from any malicious bot that tries to access your website’s content. With the information provided on this blog, you have all the tools to try and prevent malicious bots and stop bot traffic from ruining your reputation. If you have any issues, contact us day or night, and our support team will do everything they can to help.