BestPrac.Org
Stop Spam : Best Practice in Email
Spam Prevention and Eradication
Spam Bots - and how to avoid them : Part 2
(Released - January, 2003)
.....Continued from Part 1
CGI-Generated Redirection Page: A little-known yet very powerful way to hide from spam bots, in those cases where putting an email address on a website is unavoidable, is to use a Perl script.
Any webmaster who uses PayPal (as just one example) to collect payments for product sales or donations will find this idea extremely useful. A PayPal link always includes your email address, which doubles as your account user name. (When will PayPal do the sensible and secure thing and issue every account with an account number that can stand in for the email address, rather than have users continue to put their email addresses at risk from the spam bots? Please pardon this little digression.)
Instead of going directly to PayPal, the sales button could point to a very simple CGI script whose sole purpose is to generate a single web page asking the customer to confirm their order. It is on this generated page that the real transaction takes place, with the proper link to PayPal and your email address in the HTML code.
It is, admittedly, one more "click" in the sales or transaction process. On the bright side, the email address is not on a static web page. It is inside a CGI script, in your cgi-bin, where spam bots cannot read the source (presuming the file permissions are properly set).
PayPal aside, there are plenty of other times this method could come in handy. We recommend that you add extra security by still encoding your email address, even on the CGI-generated page. You can never take too many precautions when it comes to deterring spammers and their spam bots.
The CGI script for this purpose is one of the simplest possible; even a beginner at Perl programming could write it. Still, we will save you the trouble. We wrote this script ourselves and are happy to allow you to use it free of charge. (Though, if you find it useful, please consider sending us a donation for an amount of your choosing.) Here it is...
- Copy and paste the following into a plain text editor, such as Notepad or Wordpad. Ignore the colours and other formatting - they are just there to make it a little easier for you to see what you are getting. Repeat - do NOT save any of the HTML formatting - only the plain text.
- Then, save the file with any file name you wish, ending with the extension ".pl". (We do not wish to recommend any one file name for this. If this becomes a commonly used technique and everyone uses the same file name, spammers may well start searching specifically for web sites using this file name. Therefore, please choose a unique and random name of your own.)
- Finally, in the HTML form, be sure that the instruction <FORM METHOD="post" etc> is used to call the script. Do not use METHOD="get". Upload the script to your cgi-bin in ASCII mode, then use chmod to set its permissions to 755.
==== Copy EXACTLY what appears below (but not including) this line ====
#!/usr/bin/perl
#Copyright 2002, Bestprac.Org.
#Freeware
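use CGI;
# Create the CGI object and send the HTTP header.
my $query = CGI->new;
print $query->header;
# Single-quoted heredoc: stops Perl treating the "@" in your email address as an array.
print <<'EOHTML';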
<HTML>
<HEAD>
<TITLE>
XXXXX Replace this with your own title XXXXX
</TITLE>
</HEAD>
<BODY>
Insert whatever HTML you wish to appear here, between the <BODY> & </BODY> tags - including your email address. For extra security, still use ASCII or Javascript encoding of your email address.
</BODY>
</HTML>
EOHTML
==== Copy EXACTLY what appears above (but not including) this line ====
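The placeholder text in the script suggests still ASCII-encoding your email address. The idea is to write each character of the address as an HTML numeric character reference, which a browser displays normally but which a simple bot scanning for a plain "name@domain" pattern may not recognise. Here is a minimal sketch of that encoding, assuming a made-up address (sales@example.com) and a helper name of our own choosing (encode_entities_decimal). Treat it as an illustration, not a finished tool.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Turn every character of a string into an HTML decimal character
# reference, so that "a" becomes "&#97;" and "@" becomes "&#64;".
# (The function name is hypothetical, chosen for this example.)
sub encode_entities_decimal {
    my ($text) = @_;
    return join '', map { '&#' . ord($_) . ';' } split //, $text;
}

# Made-up address, for illustration only. The single quotes stop
# Perl from treating "@example" as an array.
my $address = 'sales@example.com';
my $encoded = encode_entities_decimal($address);

# Print a mailto link whose address is written entirely as references.
print '<a href="mailto:' . $encoded . '">' . $encoded . "</a>\n";
```

Run it once on your own computer, then paste the printed line into the BODY section of the script in place of a plain-text address.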
Another example of using a CGI script for similar purposes can be found at the "Spambot Beware" website. Their example includes their CGI script in either C or Perl.
Blocking spam bots from visiting your web site by their USER_AGENT: The information in this section pertains mostly to web sites hosted on Apache servers. These have the largest share of the web-server software market at present.
Every time a web browser, a search engine robot, a spam bot, or any other software requests a web page, it 'lodges' its name with the web server it is requesting the page from. That 'name' is known as a "user_agent", and it is sent in the User-Agent header of the request.
It is possible, in fact relatively simple, if you know the name (user_agent) of the spam bot, to set access conditions which can deny access to your web pages by that user_agent.
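To picture what such an access condition does, here is a minimal Perl sketch of the same test made inside a CGI script. It assumes the web server passes the name in the HTTP_USER_AGENT environment variable (standard for CGI), and it borrows the three names from the .htaccess example further down. The function name is_spam_bot is our own invention. This only illustrates the matching idea and is not meant to replace the .htaccess approach.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Names borrowed from the .htaccess example in this article.
my @spam_bot_names = ('EmailCollector', 'EmailSiphon', 'EmailWolf');

# Return 1 if the user_agent contains any listed name, ignoring case.
# (Hypothetical helper, written for this illustration.)
sub is_spam_bot {
    my ($user_agent) = @_;
    return 0 unless defined $user_agent;
    for my $name (@spam_bot_names) {
        # \Q...\E makes Perl treat the name as literal text, not a pattern.
        return 1 if $user_agent =~ /\Q$name\E/i;
    }
    return 0;
}

# In a CGI script, the web server supplies the name in HTTP_USER_AGENT.
my $user_agent = $ENV{HTTP_USER_AGENT};

if (is_spam_bot($user_agent)) {
    # Refuse the request with a 403 status.
    print "Status: 403 Forbidden\r\nContent-Type: text/plain\r\n\r\n";
    print "Access denied.\n";
} else {
    print "Content-Type: text/plain\r\n\r\n";
    print "Welcome.\n";
}
```

Because the test is a case-insensitive search anywhere in the name, "emailsiphon" and "EmailSiphon/2.1" would both be refused, while an ordinary browser name would pass.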
The difficult part is in knowing the user_agent used by the spam bots. A partial list of known spam bot user_agent titles is published by SimplyTheBest.Net. (2008 update - link removed. Page no longer found.)
Another list of "unspecified" user_agents is found at Robotstxt.Org. Use this list with caution. Not all bots are spam bots. Some have legitimate purposes, such as for search engines, link verifying services, and so forth. No doubt, though, many on this list are spam bots. (You may like to copy & paste some of them into a good web search engine, and into Google Groups usenet search engine, to see if there are other reports about these user_agents.)
These lists can never be comprehensive. The evil geniuses who program spam bots have sometimes been known to deliberately forge common and respected user_agent names (such as those of the popular MSIE or Netscape web browsers, for example) into their spam bot software to prevent it from being blocked. Still, the lists provide a good start and a method of blocking many of the known spam bots in use today.
With that caveat understood, you need to create an .htaccess file in the root directory of your website on the Apache server. (Or amend it with these additional details, if you are already using an .htaccess file to allow or deny access for other reasons.) Presuming you do not already have an .htaccess file in your root directory, make one as follows:
- Copy and paste the following into a plain text editor, such as Notepad or Wordpad. Ignore the colours and other formatting - they are just there to make it a little easier for you to see what you are getting. Repeat - do NOT save any of the HTML formatting - only the plain text.
- Then, save the file with the exact name of ".htaccess" (without the quote marks, of course). Yes, indeed - the "." really is the very first character.
- Finally, upload this file using your FTP client in ASCII mode (not in binary mode) to your root directory.
==== Copy EXACTLY what appears below (but not including) this line ====
SetEnvIfNoCase User-Agent "EmailCollector/1.0" spam_bot
SetEnvIfNoCase User-Agent "EmailSiphon" spam_bot
SetEnvIfNoCase User-Agent "EmailWolf 1.00" spam_bot
Order Allow,Deny
Allow from all
Deny from env=spam_bot
==== Copy EXACTLY what appears above (but not including) this line ====
Of course, there are more than just the three known spam bots shown in the above example. Just add an extra line and copy & modify the syntax for each extra spam bot you wish to add to your .htaccess file.
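For a long list, the lines can be generated rather than typed by hand. The sketch below is one way to do it, under some assumptions of our own: the helper names (escape_pattern and build_htaccess) are hypothetical, the names contain no double-quote characters, and, because SetEnvIfNoCase treats the quoted text as a regular expression, characters such as "." are escaped so that they match only themselves.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Put a backslash in front of regular-expression metacharacters so the
# name is matched literally. (Hypothetical helper for this example.)
sub escape_pattern {
    my ($name) = @_;
    $name =~ s/([\\.+*?\[\]^\$(){}|])/\\$1/g;
    return $name;
}

# Build the full text of the .htaccess rules for a list of names.
sub build_htaccess {
    my (@names) = @_;
    my @lines;
    for my $name (@names) {
        push @lines,
            sprintf('SetEnvIfNoCase User-Agent "%s" spam_bot', escape_pattern($name));
    }
    # The same access rules as in the example above.
    push @lines, 'Order Allow,Deny', 'Allow from all', 'Deny from env=spam_bot';
    return join("\n", @lines) . "\n";
}

# The three names used in this article; add your own to the list.
print build_htaccess('EmailCollector/1.0', 'EmailSiphon', 'EmailWolf 1.00');
```

Save the printed output as your .htaccess file, then check the result against the hand-written example above before uploading it.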
Some further interesting, possibly useful information on both identifying and blocking spam bots has been written by Ralph D. Kloth of Kloth.Net. It is similar to what we have provided above, though it goes a few steps further. For example, he includes tricks for identifying the IP addresses used by spam bots, and then using the .htaccess file to block by the environment variable "Remote_Addr" in addition to "User-Agent".
We are cautious about that approach because it is most unlikely that any spammer will be working from a fixed IP address. In all probability, the spammer will use a different IP address each time they crawl your site. Nonetheless, the Kloth.Net article does have some very interesting and useful pointers on the topic of identifying and blocking spam bots using your .htaccess file.
Thus far, in Parts 1 and 2 of this article on "Spam Bots - and how to avoid them", the emphasis has been on techniques for webmasters to protect their websites. In Part 3, we take a look at techniques for non-webmasters - ordinary, everyday web-surfing individuals - to prevent their email addresses being found by spam bots.