University of Minnesota
Statistics
info@stat.umn.edu
612-625-8046


School of Statistics

Controlling Junk/Spam mail

There are two levels of junk mail control on the Statistics systems. The first is a general filter program called 'spamassassin', which adds the string of characters {Spam?} to the subject of any mail message that it considers to be bad (spam) mail. The 'spamassassin' program consults a wide database that gets updated as new spam mail is produced. But since 'spamassassin' cannot apply your own criteria about what is bad and what is good, you can also use a second level of filtering that applies your own criteria.

If you read your mail with a web browser, the browser may already be doing some of this second-level filtering, which you can help to improve over time by filing your bad mail into the "Junk" box.

If you are familiar with some basic unix commands you can apply further control to your web mail reader as well as to traditional mail agents such as 'mutt' or 'elm'. Doing so gives you better control over how your mail is handled, by using the 'bogofilter' program to do the filtering. To use 'bogofilter', however, you have to "train" the program about what bad and good mail are according to your criteria.

Filtering mail when using web mail readers

If you read mail with a program such as 'thunderbird', make sure the option to "move messages marked as Junk to the Junk folder" is selected. Initially, mark one message as Junk to create the Junk folder. The Junk folder (like other 'mozilla' folders) is created by default in the HOME directory. The HOME directory should also be the directory that 'procmailrc' uses to work with your files (see below).

Also, make sure to understand the difference between "Trash" and "Junk": you "Trash" any message that you no longer want to keep, and you "Junk" messages that are bad {Spam?} and of a type that you do not want to receive again. The browser learns from your criteria what bad mail is by looking at the "Junk" folder.

Additional controls for any type of mail agents

In addition to the basic controls applied by most web mail agents, the following applies to both web mail and traditional unix mail programs such as 'elm' or 'mutt'. It gives you a faster and more direct way to identify your bad (***SPAM*** labeled) mail before it is delivered to your mailbox.

  1. Create a forward file with instructions to execute the 'procmail' program each time new mail for you is received. In your home directory create a file named '.forward' with one line (include the quotes):
    "|exec /usr/bin/procmail"
  2. Create a '.procmailrc' file with instructions for the 'procmail' program:
    MAILDIR=$HOME
    LOGFILE=$MAILDIR/procmail.log

    :0:
    * ^Subject.*\*\*\*SPAM\*\*\*
    Junk

With this, the 'procmail' program will file all mail marked as spam by the mail scanner ('spamassassin') in the Junk folder in your home directory. The 'spamassassin' program uses general criteria to identify junk mail by consulting a database that is updated often. If a message seems to be junk mail, it adds the characters ***SPAM*** to the subject of the message. Those are the characters that 'procmail' is instructed to look for in your .procmailrc file.

  3. Inspect the contents of the Junk box now and then, until you are sure that the setup works well, or whenever you suspect that mail you should be receiving is mistakenly being marked as spam.

  4. Should you desire one more level of filtering, you can also use the 'bogofilter' program. For that you will need to learn some unix commands to maintain your own database of good vs bad mail, which 'bogofilter' consults to learn your own criteria.
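
The two files from steps 1 and 2 can also be created from the shell. The following is a minimal sketch, not a definitive setup: the variable TARGET_DIR is an assumption introduced for this example so that, by default, the files land in a scratch directory instead of overwriting anything in your home directory. The last lines use 'grep' as a rough stand-in for 'procmail' to confirm that the pattern recognizes a tagged subject line.

```shell
#!/bin/sh
# Sketch: write the .forward and .procmailrc files described in steps 1 and 2.
# TARGET_DIR is a hypothetical variable (not part of procmail); it defaults to
# a scratch directory so that nothing in your real home directory is touched.
TARGET_DIR=${TARGET_DIR:-$(mktemp -d)}

# Step 1: the .forward file holds one line, quotes included.
printf '%s\n' '"|exec /usr/bin/procmail"' > "$TARGET_DIR/.forward"

# Step 2: the .procmailrc file. The quoted 'EOF' keeps $HOME and the
# backslashes literal, so procmail (not the shell) interprets them.
cat > "$TARGET_DIR/.procmailrc" <<'EOF'
MAILDIR=$HOME
LOGFILE=$MAILDIR/procmail.log

:0:
* ^Subject.*\*\*\*SPAM\*\*\*
Junk
EOF

# Remove write permission for group and others; procmail may refuse to use
# an rc file that other users are able to modify.
chmod go-w "$TARGET_DIR/.forward" "$TARGET_DIR/.procmailrc"

# Rough check with grep (only an approximation of procmail's matching, which
# is also case-insensitive): does the pattern catch a tagged subject line?
pattern='^Subject.*\*\*\*SPAM\*\*\*'
if printf 'Subject: ***SPAM*** Cheap watches\n' | grep -Eiq "$pattern"; then
    echo "tagged subject would be filed in Junk"
fi
```

Once the result looks right, running the same script with TARGET_DIR set to your home directory writes the real files.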

Further filtering and using 'bogofilter'

Whatever mail agent you use, if you are familiar with some basic unix commands and know where your mailboxes are (or could be), you can exercise some degree of control over them. In principle, it is recommended that those mailboxes be in a subdirectory and not directly in your $HOME directory.

First, decide what directory you will use to sort and store your mail, and make sure that the directory exists in your home directory. In the example we will use My_mail. We will sort the mail so that one mailbox contains what we consider "good" mail, another holds mail generally recognized as "spam" by the 'spamassassin' program, and a third holds the mail considered "spam" by 'bogofilter'. Finally, your regular mailbox will continue to receive the messages that do not match any of the established criteria (and that are now likely to be good ones).
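
As a sketch of this first step, the directory and the mailboxes named in this section can be prepared ahead of time. MAIL_BASE is a hypothetical variable introduced for this example; it defaults to a scratch directory and would be set to $HOME for real use. Creating the mailbox files in advance is optional, since 'procmail' creates a mailbox the first time it files a message there.

```shell
#!/bin/sh
# Sketch: prepare the My_mail directory used in the examples of this section.
# MAIL_BASE is a hypothetical variable; set it to "$HOME" for real use.
MAIL_BASE=${MAIL_BASE:-$(mktemp -d)}
MAIL_DIR=$MAIL_BASE/My_mail

mkdir -p "$MAIL_DIR"
chmod 700 "$MAIL_DIR"    # mail is private: owner access only

# Optional: create empty mailboxes so that a mail reader can open them
# before any message has been filed.
for box in friends Junk spam-bogofilter; do
    touch "$MAIL_DIR/$box"
    chmod 600 "$MAIL_DIR/$box"
done

ls -l "$MAIL_DIR"
```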

Why two mailboxes for "spam"? Because while 'spamassassin' will recognize bad mail almost without error (to the point that some people never read that mail and remove it as soon as it arrives marked as spam), it is possible for 'bogofilter' to make some mistakes until it is fine tuned with the criteria that you supply.

Why two mailboxes for "good" mail? Because until you can fully trust the 'bogofilter' criteria, this separation of mail will help you to build those criteria.

And now, finally, some words on how to fine tune 'bogofilter' according to your criteria. Eventually you will have two, three or four mailboxes to "feed" to 'bogofilter'. When you do that, you have to let 'bogofilter' know whether the mail you are providing is good or bad. The -s option tells 'bogofilter' to register the mail as spam, and the -N option undoes a previous registration of the same mail as good. The -n option tells 'bogofilter' to register the mail as good, and the -S option undoes a previous registration as spam. So, once you have properly filed mail in the different mailboxes, you can reinforce the 'bogofilter' criteria by using the commands:

cat Junk | bogofilter -Ns
cat friends | bogofilter -Sn
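
The two commands above can be wrapped in a small helper so that the option pairs are not mixed up. This is only a sketch: the function name train_mailbox and the DRY_RUN variable are assumptions made for this example, and by default the helper only prints the command it would run. Depending on the 'bogofilter' version, a mailbox that holds several messages may also need the -M option; the 'bogofilter' manual page is the place to check.

```shell
#!/bin/sh
# Sketch: pick the bogofilter options that match the kind of mailbox.
# train_mailbox and DRY_RUN are hypothetical names used for this example.
#   -Ns : undo a previous "good" registration, then register as spam
#   -Sn : undo a previous "spam" registration, then register as good
train_mailbox() {
    kind=$1
    mailbox=$2
    case $kind in
        spam) flags=-Ns ;;
        good) flags=-Sn ;;
        *)    echo "usage: train_mailbox spam|good MAILBOX" >&2
              return 2 ;;
    esac
    if [ "${DRY_RUN:-yes}" = yes ]; then
        # Only show what would be run.
        echo "cat $mailbox | bogofilter $flags"
    else
        cat "$mailbox" | bogofilter "$flags"
    fi
}

train_mailbox spam Junk
train_mailbox good friends
```

Running the script with DRY_RUN=no from inside the mail directory performs the registration instead of printing it.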

1 - Create .forward file with a line as:
"|exec /usr/bin/procmail"
This .forward will have instructions to execute the 'procmail' program each time new mail for you is received.

2 - Create a .procmailrc file with instructions for the 'procmail' program such as:

VERBOSE=yes
PATH=/bin:/usr/bin:/usr/bin
MAILDIR=$HOME/My_mail
LOGFILE=$MAILDIR/procmail_log #recommended
#
#
# File what 'spamassassin' considers junk mail in the Junk folder
:0:
* ^Subject.*\{Spam\?\}
Junk
# Note: replace the 'Junk' mailbox with /dev/null if you
# fully trust that 'spamassassin' will not rate your good
# mail as bad and you will never care to see it.
#
#
# Filter mail through bogofilter, tagging it as spam and
# updating the word lists
:0fw
| bogofilter -u -e -p
# If bogofilter failed, return the mail to the queue,
# the MTA will retry to deliver it later
# ( 75 is the value for EX_TEMPFAIL in /usr/include/sysexits.h)
:0e
{ EXITCODE=75 HOST }
#
#
# File mail according with level of bogosity:
:0:
* ^X-Bogosity: Yes, tests=bogofilter
spam-bogofilter
:0:
* ^X-Bogosity: Spam, tests=bogofilter
spam-bogofilter
#
#
# Since most of the mail may come from stat.umn.edu
# or umn.edu one could have an "almost" friendly folder
:0:
* ^From.*@stat\.umn\.edu
friends
:0:
* ^From.*\.umn\.edu
friends
#
#
# One can add more criteria for filing in the friends
# folder such as if it is from a particular sender
# such as (my_friend@organization.org),
# the subject has a particular code (secret-word),
# or it is distributed by a given mailing list
# such as (tetex@dbs.uni-hannover.de).
:0:
* ^From.*my_friend@organization\.org
friends
:0:
* ^Subject.*secret-word
friends
:0:
* ^List-Post.*tetex@dbs\.uni-hannover\.de
friends
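
Before trusting a pattern with real mail, it can help to try it against a sample header line. The sketch below uses 'grep -Ei' as a rough approximation of how 'procmail' matches a condition (extended regular expressions, case-insensitive by default); it is not 'procmail' itself, and the helper name matches and the sample addresses are made up for this example. It shows why the dots in a domain name should be escaped.

```shell
#!/bin/sh
# Sketch: approximate a procmail condition with grep -Ei.
# matches is a hypothetical helper; the sample header lines are made up.
matches() {
    # $1 = regular expression (without the leading "* "), $2 = header line
    printf '%s\n' "$2" | grep -Eiq -- "$1"
}

good='From: Some One <someone@stat.umn.edu>'
bad='From: Some One <someone@statXumnYedu.example.com>'

# Escaped dots match only a literal "."
matches '^From.*@stat\.umn\.edu' "$good" && echo "escaped:   good sender matches"
matches '^From.*@stat\.umn\.edu' "$bad"  || echo "escaped:   look-alike rejected"

# Unescaped dots match any character, so the look-alike slips through
matches '^From.*@stat.umn.edu' "$bad"    && echo "unescaped: look-alike matches"
```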

So... How many mailboxes do I need to check if I use this .procmailrc file? Four mailboxes:

1 - My_mail/friends
2 - Your regular mailbox
(now it will be much cleaner and easier to sort)
3 - spam-bogofilter
(to keep tuning up 'bogofilter')
4 - Junk
(mail marked as spam by 'spamassassin')

Seems complicated? Well, depending on the amount of bad mail that you receive, this may be easier than manually separating the bad mail from the good in your regular mailbox.
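
To see at a glance how much mail has landed in each mailbox, the messages can be counted from the shell. This is a minimal sketch, assuming the mailboxes are in the traditional mbox format, where each message starts with a line beginning with "From ". The function name count_messages is made up for this example, and the path of your regular mailbox depends on the system, so it is left out here.

```shell
#!/bin/sh
# Sketch: count messages in mbox-format mailboxes.
# count_messages is a hypothetical helper. It assumes the mbox format, where
# every message begins with a separator line starting with "From ".
count_messages() {
    for box in "$@"; do
        if [ -f "$box" ]; then
            printf '%s: %s\n' "$box" "$(grep -c '^From ' "$box")"
        else
            printf '%s: no such mailbox\n' "$box"
        fi
    done
}

# MAIL_DIR is an assumption matching MAILDIR in the .procmailrc above.
MAIL_DIR=${MAIL_DIR:-$HOME/My_mail}
count_messages "$MAIL_DIR/friends" "$MAIL_DIR/Junk" "$MAIL_DIR/spam-bogofilter"
```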