
How to Block Bad Bots, Crawlers, Scrapers, and Malware

April 26, 2012

Web hosting is a basic requirement for running a website on the internet. Webmasters store their websites on web servers such as Apache, IIS, and ColdFusion so they can be accessed worldwide. The internet is a wide and diverse place, and its open accessibility also makes it easy for bad bots to crawl a website. We choose a hosting plan based on expected bandwidth usage: an information-based website requires less bandwidth than a live-streaming or video-gaming site.

We often see unidentified bots relentlessly consuming a website's bandwidth, which slows page loading and hurts the site's ranking on search engines such as Google, Yahoo, and Bing. Bad bots, scrapers, and malware are unwanted visitors that drain server bandwidth and resources. Here are some effective methods to stop that drain.

Block by Robots.txt

The robots.txt file is what grants crawlers permission to crawl a website, and it is the first place to block bad bots. It lives at the root of the domain, at domain.com/robots.txt, and can be edited in any plain-text editor such as Notepad. Most crawlers abide by the rules in robots.txt while crawling a site, but some bad bots ignore its instructions. Below is the basic code for robots.txt.

User-agent: *
Disallow:

These two lines allow all bots to crawl all pages of the website. If, for example, we want to restrict the Scooter bot, we write:

User-agent: scooter
Disallow: /

These lines prevent the Scooter bot from crawling our website. If you want to block only particular pages, such as confidential ones, disallow them individually:

User-agent: scooter
Disallow: /wp-admin/

This denies the Scooter bot access to the wp-admin directory; apart from wp-admin, it remains free to crawl all other pages.

Some bad bots ignore the robots.txt file and do not follow its instructions. We can block such bots with the .htaccess file, which is supported by the Apache server and is very helpful for blocking scrapers and malware. The .htaccess file lets us block bots and scrapers by name or by IP address. We can find the names and IP addresses of bots in the access-log section of cPanel, the administrative control panel for Apache servers that helps set a site's permissions and conditions.
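Before relying on robots.txt rules, it is worth verifying that they say what you intend. A minimal sketch using Python's standard urllib.robotparser module to test the Scooter rules shown above (the file contents and paths here are just the example from this article):

```python
from urllib.robotparser import RobotFileParser

# The same rules as in the robots.txt example above
rules = """\
User-agent: scooter
Disallow: /wp-admin/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# Scooter is denied under /wp-admin/ but allowed everywhere else
print(parser.can_fetch("scooter", "/wp-admin/settings.php"))  # False
print(parser.can_fetch("scooter", "/blog/post.html"))         # True
```

Because no `User-agent: *` section is present, agents that are not listed (e.g. googlebot) are allowed everywhere by default.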

Below is the code to block bad bots by their names.

# Redirecting offline browsers and ‘bad bots’ to a honeypot
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^AbachoBOT [OR]
RewriteCond %{HTTP_USER_AGENT} ^anarchie [OR]
RewriteCond %{HTTP_USER_AGENT} ^antibot [OR]
RewriteCond %{HTTP_USER_AGENT} ^appie [OR]
RewriteCond %{HTTP_USER_AGENT} ^ASPSeek [OR]
RewriteCond %{HTTP_USER_AGENT} ^asterias [OR]
RewriteCond %{HTTP_USER_AGENT} ^attach [OR]
RewriteCond %{HTTP_USER_AGENT} ^autoemailspider [OR]
RewriteCond %{HTTP_USER_AGENT} ^B2w [OR]
RewriteCond %{HTTP_USER_AGENT} ^BackDoorBot [OR]
RewriteCond %{HTTP_USER_AGENT} ^BackWeb [OR]
RewriteCond %{HTTP_USER_AGENT} ^Baidu [OR]
RewriteCond %{HTTP_USER_AGENT} ^Bandit [OR]
RewriteCond %{HTTP_USER_AGENT} ^BatchFTP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Black\ Hole [OR]
RewriteCond %{HTTP_USER_AGENT} ^BlackWidow [OR]
RewriteCond %{HTTP_USER_AGENT} ^BlowFish [OR]
RewriteCond %{HTTP_USER_AGENT} ^Bot\ mailto [OR]
RewriteCond %{HTTP_USER_AGENT} ^Bot\ mailto:craftbot@yahoo.com [OR]
RewriteCond %{HTTP_USER_AGENT} ^zerxbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^Zeus
RewriteRule ^(.*)$ /public_html/honeypotdirectory/honeypot.php

Above are the names of some known bad bots; you can find more of them at Scamalert Networ.
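To decide which names belong in a block list like the one above, you first need to see which user agents are actually hitting your site. A small sketch that tallies user-agent strings from an Apache access log; the Combined Log Format is an assumption, and the log path will vary by host:

```python
import re
from collections import Counter

# Matches the Apache Combined Log Format:
#   IP ident user [time] "request" status size "referer" "user-agent"
LOG_LINE = re.compile(r'^(\S+) .*?"[^"]*" \d+ \S+ "[^"]*" "([^"]*)"')

def top_user_agents(log_path, n=10):
    """Count requests per user-agent string in an Apache combined access log."""
    counts = Counter()
    with open(log_path) as log:
        for line in log:
            match = LOG_LINE.match(line)
            if match:
                counts[match.group(2)] += 1  # group(1) is the client IP
    return counts.most_common(n)
```

Any unfamiliar agent with an unusually high request count is a candidate for the RewriteCond list or for an IP-based block, as described next.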

Block bots by IP Address

The .htaccess file is also the place to control access by address. We can block bots by their IP addresses as shown below:

deny from 95.211.21.91
deny from 94.0.0.0/8
deny from 159.226.0.0/16
deny from 202.111.175.0/24
deny from 218.7.0.0/16

You can also deny all IP addresses simply with:

Deny from all
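The suffixes in the deny list above are CIDR ranges: /8 blocks an entire first-octet block (about 16 million addresses), /16 blocks one x.y.0.0 network, and /24 a single 256-address subnet. A sketch using Python's standard ipaddress module to check whether a given visitor would fall inside any of the listed ranges:

```python
import ipaddress

# The same addresses and CIDR ranges as the deny rules above
BLOCKED = [
    ipaddress.ip_network("95.211.21.91/32"),  # a single host
    ipaddress.ip_network("94.0.0.0/8"),
    ipaddress.ip_network("159.226.0.0/16"),
    ipaddress.ip_network("202.111.175.0/24"),
    ipaddress.ip_network("218.7.0.0/16"),
]

def is_blocked(ip):
    """Return True if the address falls inside any blocked range."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in BLOCKED)

print(is_blocked("94.12.34.56"))  # True  (inside 94.0.0.0/8)
print(is_blocked("8.8.8.8"))      # False
```

This mirrors how Apache evaluates the deny rules: any match in any listed range denies the request.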

These are some solutions for blocking bad bots and conserving a website's bandwidth. Web hosting companies are responsible for regularly monitoring such activity on their websites.
