The previous newsletter included an item about our email filtering
project, which you can see at
<http://www.cam.ac.uk/cs/newsletter/2003/nl215/mail.html#1>.
This project is now nearing completion.

One of the more difficult aspects has been choosing a virus filtering
policy that improves the security of email users in the University
without being obtrusive or losing legitimate email. In particular, the
most common email-borne viruses -- automatically propagating worms
that target weaknesses in Microsoft Outlook etc. -- forge email so
that it appears to come from someone other than the owner of the
infected machine. For this reason we have decided not to bounce email
that triggers the virus scanner, if only to reduce the number of
questions we get from innocent third parties whose addresses have been
forged, like the questions caused by "collateral spam" from the
existing filter on Hermes.

Instead the virus scanner will alter email to remove viruses, and in
extreme cases (when it is certain that the email is generated by a
worm and therefore cannot be legitimate) the email will be deleted.
This is safe because there are clear technical criteria for deciding
if an email carries a particular virus, unlike spam, where you need to
understand the recipient's interests and correspondents to know if an
email is unsolicited and unwanted. Although the virus scanner is
automatically updated from the vendor, the list of worms that get
deleted is maintained by the CS based on practical experience.

Email will be modified for a number of reasons. If an email contains a
virus-infected attachment that the virus scanner can disinfect, then
the attachment will be disinfected. If it cannot be disinfected then
the attachment will be replaced by some advisory text that explains
the problem, and it will be up to the recipient to inform the sender
that they need to fix their computer; we hope that this situation will
be rare for real email, and that it will most likely be caused by a
new worm that has not yet been put in the delete list. As a further
level of protection, attachments will be replaced by advisory text if
they have a dangerous filename: either one designed to fool users or
their software, or one that indicates the attachment is executable
(and therefore dangerous). In this case users should zip the file
before sending it. This is similar to the existing filter on Hermes.
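
The order of these decisions can be illustrated with a short Python
sketch. It is only an illustration: the names (Action, decide,
has_dangerous_filename) and the list of executable extensions are
hypothetical, and the real scanner's criteria are not given here.

```python
from enum import Enum


class Action(Enum):
    DELIVER = "deliver unchanged"
    DISINFECT = "disinfect attachment"
    REPLACE_WITH_ADVICE = "replace attachment with advisory text"
    DELETE_MESSAGE = "delete whole message"


# Hypothetical list: the extensions the real filter treats as
# executable are not stated in this article.
EXECUTABLE_EXTENSIONS = {".exe", ".com", ".bat", ".scr", ".pif", ".vbs"}


def has_dangerous_filename(filename):
    """Return True if the final extension looks executable."""
    # Ignore case and trailing dots or spaces, which are sometimes
    # used to disguise the real extension.
    name = filename.lower().rstrip(". ")
    dot = name.rfind(".")
    extension = name[dot:] if dot != -1 else ""
    return extension in EXECUTABLE_EXTENSIONS


def decide(filename, infected, disinfectable, known_worm):
    """Pick an action for one attachment, following the policy above."""
    if infected and known_worm:
        # Certainly generated by a worm, so it cannot be legitimate.
        return Action.DELETE_MESSAGE
    if infected and disinfectable:
        return Action.DISINFECT
    if infected:
        return Action.REPLACE_WITH_ADVICE
    if has_dangerous_filename(filename):
        return Action.REPLACE_WITH_ADVICE
    return Action.DELIVER
```

Under these assumptions a clean attachment called "photo.jpg.exe" is
replaced by advisory text because its final extension is executable,
whereas the same file sent inside a zip archive is delivered.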

The general aim of the virus filtering policy is to improve on the
current system with better virus detection, to extend it to email
systems other than Hermes, and to make it less annoying.

The other way the filtering system will alter email is to add headers
that describe the filter's spam analysis. This is based on a number of
tests that detect spammy or non-spammy features of email; a spammy
feature adds to the email's score and a non-spammy one subtracts from
it. The list
of tests can be seen at <http://spamassassin.org/tests.html> although
a few tests have been added to deal with local circumstances (such as
a CRSID test to compensate for FROM_ENDS_IN_NUMS). If you view an
email from the scanner with full headers you will see some extra
information like this:

X-Cam-SpamDetails: not spam, SpamAssassin (score=-39.7, required 10,
        EMAIL_ATTRIBUTION, IN_REP_TO, KNOWN_MAILING_LIST, PGP_SIGNATURE_2,
        QUOTED_EMAIL_TEXT, REFERENCES, REPLY_WITH_QUOTES, USER_AGENT_MUTT)

or this:

X-Cam-SpamDetails: spam, SpamAssassin (score=28.3, required 10,
        BASE64_ENC_TEXT, DATE_IN_FUTURE_24_48, FORGED_YAHOO_RCVD, HOT_NASTY,
        HTML_60_70, HTML_FONT_COLOR_GRAY, HTML_MESSAGE, HTML_SHOUTING4,
        HTTP_USERNAME_USED, MIME_HTML_ONLY, MIME_MISSING_BOUNDARY,
        MSGID_OUTLOOK_TIME, NORMAL_HTTP_TO_IP, RCVD_IN_NJABL,
        RCVD_IN_OSIRUSOFT_COM, RCVD_IN_UNCONFIRMED_DSBL, SUBJ_HAS_SPACES,
        SUBJ_HAS_UNIQ_ID, TO_MALFORMED, TRACKER_ID, USERPASS,
        X_PRIORITY_HIGH)
X-Cam-SpamScore: ssssssssssssssssssssssssssss

There are some unimportant details: the X indicates that the header is
non-standard; the Cam is to avoid clashes with other sites' filtering
systems; SpamAssassin is the software we are using. The important
thing is the score, which is the sum of the results of the tests that
succeeded, as listed in the continuation lines of the
X-Cam-SpamDetails header. (The tests are intricate and subject to
change, so if you are interested you should read the SpamAssassin
source code.) The larger the score the spammier the email; the scanner
has a required score of 10, which is where the "spam" and "not spam"
annotations come from, but this is only a suggestion. If an email has
a positive score then an extra X-Cam-SpamScore header is added,
containing a row of esses whose length reflects the score, to make
custom filtering easier.
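
For anyone writing their own filter, the following Python sketch shows
one way the X-Cam-SpamDetails header might be picked apart. The
function name parse_spam_details and the regular expression are
assumptions based only on the two examples above; they are not a
specification of the header format, which may change.

```python
import re

# Pattern inferred from the example headers above (an assumption).
DETAILS_RE = re.compile(
    r"^(?P<verdict>spam|not spam),\s*SpamAssassin\s*"
    r"\(score=(?P<score>-?\d+(?:\.\d+)?),"
    r"\s*required\s+(?P<required>-?\d+(?:\.\d+)?)"
    r"(?P<tests>[^)]*)\)\s*$"
)


def parse_spam_details(value):
    """Parse the value of an X-Cam-SpamDetails header.

    Returns a dict with the verdict, score, required score and list
    of test names, or None if the value does not look as expected.
    """
    # Unfold the header: collapse line breaks and indentation.
    unfolded = " ".join(value.split())
    match = DETAILS_RE.match(unfolded)
    if match is None:
        return None
    tests = [t.strip() for t in match.group("tests").split(",") if t.strip()]
    return {
        "verdict": match.group("verdict"),
        "score": float(match.group("score")),
        "required": float(match.group("required")),
        "tests": tests,
    }


# Abridged from the second example above.
example = (
    "spam, SpamAssassin (score=28.3, required 10,\n"
    "        BASE64_ENC_TEXT, DATE_IN_FUTURE_24_48, X_PRIORITY_HIGH)"
)
details = parse_spam_details(example)
print(details["verdict"], details["score"], len(details["tests"]))
```

Because the required score is only a suggestion, a filter built on
this could compare the parsed score with a threshold of its own
choosing rather than relying on the "spam" or "not spam" annotation.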

Most users will be able to ignore the preceding paragraphs and
configure their spam filtering with the Hermes webmail or ssh/telnet
menu system, which will be updated to provide a simple way to set it
up. By default nothing will happen until you set up filtering. When
you set it up you will have to choose a threshold score, which
determines how strict you want to be about spam: a high threshold
means that spam is more likely to get through, whereas a low threshold
means that legitimate email is more likely to be mis-classified as
spam. It's a balancing act, which is why email classified as spam will
just be delivered to a different mailbox instead of your inbox, so you
can check it for mis-classified email once a week (say).

Users on systems other than Hermes will have different ways of setting
up junk email filtering. For example, an Exim filter clause suitable
for use in a .forward file on CUS might be

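	# Exim filter
	# File email with eight or more esses in its score header as spam
	if $h_X-Cam-SpamScore contains ssssssss then
		save mail/spam
		# stop here so the email is not delivered anywhere else
		seen finish
	endif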

where the number of esses indicates your threshold score, as described
above: the more esses, the more spam will get through; the fewer
esses, the more likely it is that legitimate email will be mis-filed.
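
The same substring test can be written in other languages for systems
that do not use Exim filters. Here is a minimal Python sketch, in
which the function name choose_mailbox and the default mailbox names
are hypothetical:

```python
from email import message_from_string


def choose_mailbox(raw_message, threshold, spam_box="mail/spam", inbox="inbox"):
    """Return the mailbox an email should be filed in.

    The email is treated as spam if its X-Cam-SpamScore header
    contains at least `threshold` esses, which mirrors the
    "contains" test in the Exim filter above.
    """
    message = message_from_string(raw_message)
    esses = message.get("X-Cam-SpamScore", "")
    # A threshold of zero or less would match every email, so it is
    # treated as "no filtering".
    if threshold > 0 and "s" * threshold in esses:
        return spam_box
    return inbox


# A made-up message with a score header of 28 esses.
sample = (
    "X-Cam-SpamScore: " + "s" * 28 + "\n"
    "Subject: example\n"
    "\n"
    "body text\n"
)
print(choose_mailbox(sample, threshold=8))
```

Email without an X-Cam-SpamScore header is always filed in the inbox
by this sketch.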


$Cambridge: hermes/doc/newsletter/2003-04-filtering.txt,v 1.2 2003/04/04 16:33:26 fanf2 Exp $