The Central Email Scanner

Introduction

The majority of email in Cambridge (including email entering, leaving, and within the University) passes through a central relay known as ppsw. This relay runs software that scans email to protect the University against two kinds of undesirable message: spam and viruses. This page contains details of how the scanner works, including which email is scanned and how it is altered. The Computing Service maintains other pages with general information about junk email and advice on protecting your computer from viruses.

The email scanner is only a first line of defence. You should still run a virus scanner on your computer because there are ways of getting infected other than via email. You can get anti-virus software from the Computing Service. Serious users may also benefit from running their own spam filter, since a personalized filter can be more closely tuned to the kinds of email you receive. See FAQ E32 for information about the SpamBayes Outlook Plugin, which can do this. The Mac OS X Mail application comes with this feature built in.

Anti-spam measures

There are no clear technical criteria for identifying spam. This is partly because no one can agree on exactly what spam is. One way of characterizing it is the phrase "Unsolicited Bulk Email": this highlights the aspect of it that involves abuse of the network infrastructure, and there are technical measures that tackle it at that level. Another way of characterizing it is "Anything I Don't Like": this highlights the frequently offensive content of spam, which can in turn be tackled by technical measures that examine the content of a message.

Because of this lack of clarity the email scanner uses a mixture of techniques to reduce the amount of spam that users have to deal with:

DNS blacklists

Firstly, we use DNS blacklists to identify the IP addresses of computers on the Internet that we will not accept email from. There are a number of reasons that an IP address may be blacklisted: the computer may be misconfigured in such a way as to make it open to abuse by spammers; the address may be listed by its owner as one that should never send email; or the address may be allocated to an organization that is known to send spam.
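
As a rough illustration of how a DNS blacklist lookup generally works (this is a sketch, not the configuration used on ppsw): the octets of the connecting IPv4 address are reversed, the blacklist's zone is appended, and the resulting name is looked up in the DNS. If the name resolves, the address is listed. The zone dnsbl.example.org and both function names are placeholders invented for this example.

```python
import ipaddress
import socket


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name that a blacklist lookup for an IPv4 address queries."""
    # Validate the address, then reverse its octets: 192.0.2.99 -> 99.2.0.192
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def is_listed(ip: str, zone: str) -> bool:
    """Return True if the zone publishes an address record for this IP."""
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except OSError:
        # The name does not resolve: not listed (or the lookup itself failed).
        return False


if __name__ == "__main__":
    # 192.0.2.99 is a documentation address; the zone is hypothetical.
    print(dnsbl_query_name("192.0.2.99", "dnsbl.example.org"))
```

Note that is_listed needs a working resolver, and that a failed lookup is treated as "not listed" here, which is a design choice of the sketch rather than a statement about ppsw.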

There are a number of different DNS blacklists with varying policies about listing IP addresses, some of which are more aggressive than others. When rejecting email, the Computing Service only uses DNS blacklists that have a good reputation for not gratuitously listing legitimate IP addresses. Even so, the blacklists occasionally cause communication problems, in which case you can contact <postmaster@cam.ac.uk> for assistance (no messages to that email address are blocked). However, note that we do not configure ppsw with special exceptions to the DNS blacklists because that would be a duplication of effort; in the case of an erroneous listing you must deal with the DNS blacklist administrators via the web sites below.

At the moment ppsw uses two DNS blacklists:

Sender blacklist

For a number of years the Computing Service has maintained a blacklist of email addresses and domains. Messages from these addresses are not accepted. This blacklist is also used by some other institutions in the University that have their own email systems.

This blacklist is no longer very effective. It is now common for spammers to forge email so that it appears to come from an innocent third party, or to use addresses for very short periods of time so that by the time an address is blacklisted it is already too late. Therefore we are now discouraging this kind of filtering (including the similar BLOCK filtering option on Hermes).

To suggest changes to the sender blacklist, contact <postmaster@cam.ac.uk> including the full headers of any message that you forward.

SpamAssassin

SpamAssassin is a system that performs a large number of tests on a message to decide if it is spam. These tests look at the content of the message, various technical details in its headers, and query databases on the Internet. Many of the tests identify features of the message that are common in spam and some of them identify non-spam features. Each test has an associated score which is positive for spam and negative for non-spam. The scores of all the tests that succeed are added together to produce an aggregate score for the message as a whole. The higher the score the more likely it is to be spam.
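
The additive scoring described above can be sketched in a few lines. This is only a toy model of the idea, not SpamAssassin's own code or rule set: the first three test names and scores are borrowed from the example header later on this page, while the predicates and the negative-scoring HYPOTHETICAL_LOCAL_SENDER test are invented for illustration.

```python
from typing import Callable, List, Tuple

# (test name, predicate on the message text, score).
# Positive scores indicate spam, negative scores indicate non-spam.
Rule = Tuple[str, Callable[[str], bool], float]

RULES: List[Rule] = [
    ("DEAR_SOMETHING", lambda m: "dear sir" in m.lower(), 2.60),
    ("URGENT_BIZ", lambda m: "urgent business" in m.lower(), 0.15),
    ("US_DOLLARS", lambda m: "us dollars" in m.lower(), 1.54),
    # Invented non-spam test with a negative score.
    ("HYPOTHETICAL_LOCAL_SENDER", lambda m: "@cam.ac.uk" in m.lower(), -1.00),
]


def score_message(message: str, rules: List[Rule] = RULES) -> Tuple[float, List[Tuple[str, float]]]:
    """Run every test and add up the scores of those that match."""
    hits = [(name, score) for name, test, score in rules if test(message)]
    total = round(sum(score for _, score in hits), 2)
    return total, hits


if __name__ == "__main__":
    total, hits = score_message("Dear Sir, urgent business: 5 million US dollars")
    print(total, hits)
```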

Although SpamAssassin is reasonably effective it cannot identify spam or legitimate email 100% accurately. Therefore ppsw does not reject email based on its SpamAssassin score. Instead, some headers containing SpamAssassin's results are added to the message so that users can configure their own filters according to the kind of email that they receive. See below for details of the scanner's headers. The documentation for users of Hermes and other systems to set up filtering based on the SpamAssassin score is on another page.

The Computing Service only makes basic changes to the SpamAssassin configuration to tailor it to our local needs; for example, we have added some tests to recognize Cambridge email addresses to compensate for the fact that by default SpamAssassin thinks they are spammy. We do not make more extensive changes to the tests because that would be duplicating the work of the SpamAssassin developers and it would make it harder to keep the software up-to-date. For this reason we are not generally interested in individual messages that score unexpectedly high or low and are erroneously classified as spam or not, since there is little we can do about them.

For more information, see the SpamAssassin FAQ. If you receive a legitimate message that was classified as spam, perhaps you set your filtering threshold too low; see also the FAQ. If you receive some spam that was classified as legitimate email, perhaps you set your filtering threshold too high; see also the FAQ. Though it is a chore to have to go through your spam mailbox every few days to delete messages, SpamAssassin isn't perfect so you would risk losing real email if high-scoring messages were deleted unseen; see also the FAQ. Note that only email arriving at ppsw from outside the University is run through SpamAssassin, in order to reduce the scanner's workload. See below for more information about which email is scanned.

Anti-virus measures

Viruses and email

Unlike spam, there are clear technical criteria for identifying viruses, since viruses target computers rather than people. This means that it is possible for us to filter out infected email centrally with less risk of losing legitimate email. The scanner filters email using commercial virus scanning software, and as a further level of protection it also filters attachments based on the name and type of the file they contain. This extra protection helps when there are delays getting a virus database update from the vendor, and it reduces the ways in which malicious email can trick users.

The details of the policy implemented by the virus filter are largely determined by the way the scanner works and by weaknesses in Internet email. The scanner looks at a message after it has been accepted by ppsw, since this gives us better control over the load on the computers and makes them more resistant to attack. This means that we have two possible responses to an infected message: either return it to its sender, or make it safe before delivering it to its recipient. However there is no guarantee in Internet email that the apparent sender of a message really did send it, and email worms in particular frequently forge messages such that returning an infected message to its "sender" would incorrectly tell some innocent third party that they have a virus infection. Therefore the virus filter will alter email to remove viruses before sending the messages on to their recipients, as described below.

The most common and troublesome kind of viruses that the scanner aims to stop are "worms" that target weaknesses in Microsoft Outlook etc. and propagate automatically via email. These worms are never attached to legitimate email so it makes no sense to deliver their messages after disinfection. Therefore the Computing Service maintains a list of known email worms which the virus filter discards without informing either the (forged) sender or the recipient.

Anti-virus policy

  • If a message contains a virus-infected attachment that can be disinfected by the virus scanner then it will be. The recipient will receive everything that was sent, plus a virus warning.
  • If it cannot be disinfected then the attachment will be replaced by some advisory text that explains the problem.
  • Similarly, if the attachment has a dangerous file type or name it will also be replaced by advisory text. Dangerous file types include executable programs. Dangerous file names include those that are too long or which contain too much punctuation or white space.
  • If the message is generated by a known worm on the list maintained by the Computing Service, it will be deleted without informing anyone.

The first two cases above should be rare if you and your correspondents keep your anti-virus software up-to-date, though they may also be caused by a new worm that hasn't yet been put on the delete list. If the message is legitimate (which a human can decide in a way that software cannot) then the recipient should inform the sender that they have a virus problem.

If you want to send a message containing a dangerous file, you can avoid the file type and name restrictions by putting it in a zip file before sending. Note that the virus scanner can look for viruses inside zip files and other types of archive, but the file type and name restrictions only apply to the outer wrapping of the attachment.
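
The file name and type checks might look something like the following sketch. Every specific value here is an assumption made for illustration: the page does not state which extensions count as dangerous, nor how long a name, or how much punctuation or white space, is "too much". The check is applied only to the outer attachment name, so an archive such as tools.zip passes regardless of the names inside it.

```python
import os
import string

# All of these values are illustrative assumptions, not the scanner's settings.
DANGEROUS_EXTENSIONS = {".exe", ".scr", ".pif", ".bat", ".com", ".vbs"}
MAX_NAME_LENGTH = 150
MAX_PUNCTUATION = 10
MAX_WHITESPACE = 10


def is_dangerous_name(filename: str) -> bool:
    """Apply name and type checks to the outer attachment name only."""
    _, ext = os.path.splitext(filename.lower())
    if ext in DANGEROUS_EXTENSIONS:
        return True
    if len(filename) > MAX_NAME_LENGTH:
        return True
    # Count punctuation and white space characters anywhere in the name.
    punctuation = sum(ch in string.punctuation for ch in filename)
    whitespace = sum(ch.isspace() for ch in filename)
    return punctuation > MAX_PUNCTUATION or whitespace > MAX_WHITESPACE


if __name__ == "__main__":
    for name in ("invoice.exe", "report.pdf", "tools.zip"):
        print(name, is_dangerous_name(name))
```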

See below for more information about the way the filter alters messages. There is another page with more general information about email attachments. If the filtering is causing you problems, please contact the Computing Service help desk, <help-desk@ucs.cam.ac.uk>, in the first instance.

How the scanner alters email

Scanner headers

The email scanner adds some headers to each message that passes through, containing some information about what the scanner found. You can see them by viewing the full headers of the message. If a message is scanned more than once (e.g. because it has been re-sent) then it will have more than one set of scanner headers.

Each of the headers starts X-Cam-. The X- indicates that this is a non-standard header. The -Cam- is to distinguish the Cambridge scanner installation from other email scanners that might work on the same message.

The X-Cam-ScannerInfo: header contains the URL of this web page, so that people can find out the operational details of the scanner without needing to know anything about Cambridge University or the Computing Service.

The X-Cam-AntiVirus: header summarizes the findings of the virus scanner. It may say one of the following:

  • "Not scanned" if the message was not scanned for viruses (see below);
  • "Disinfected" if a virus was found and successfully removed, leaving the uninfected attachment intact;
  • "Found to be infected" if a virus was found and the attachment was removed because it could not be disinfected, or if an attachment had a dangerous file name or file type;
  • "No virus found" if the message passed the virus filter.

The X-Cam-SpamDetails: header contains the results of the spam scanner. It will say "Not scanned" if the message comes from within the University; otherwise it will look something like this:

X-Cam-SpamDetails: scanned, SpamAssassin (score=5.2, DEAR_SOMETHING 2.60,
        URGENT_BIZ 0.15, US_DOLLARS 1.54, US_DOLLARS_3 0.85)

The text in the brackets includes the overall score assigned to the message by SpamAssassin, and the list of tests that the message matched with the score for each test.
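
If you want to process this header in your own scripts, the value can be picked apart with a regular expression. The following is a minimal sketch that assumes the header always has the shape shown in the example above; the function name parse_spam_details is made up for this illustration.

```python
import re
from typing import Dict, Optional, Tuple


def parse_spam_details(value: str) -> Optional[Tuple[float, Dict[str, float]]]:
    """Return (overall score, {test name: score}), or None if no score is present."""
    # Header values may be folded over several lines; collapse the white space.
    text = " ".join(value.split())
    match = re.search(r"\(score=(-?\d+(?:\.\d+)?)(?:,\s*(.*))?\)", text)
    if match is None:
        return None  # e.g. the value is "Not scanned"
    overall = float(match.group(1))
    tests: Dict[str, float] = {}
    for name, score in re.findall(r"([A-Z0-9_]+) (-?\d+(?:\.\d+)?)", match.group(2) or ""):
        tests[name] = float(score)
    return overall, tests


if __name__ == "__main__":
    header = ("scanned, SpamAssassin (score=5.2, DEAR_SOMETHING 2.60,\n"
              "        URGENT_BIZ 0.15, US_DOLLARS 1.54, US_DOLLARS_3 0.85)")
    print(parse_spam_details(header))
```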

If the message has a spam score greater than one, a fourth header is added. The X-Cam-SpamScore: header contains a sequence of the letter "s" (for "spam") equal in length to the message's score rounded down to a whole number, e.g. sssss for a score of 5.2. This header is intended to make it easy for users to configure their spam filters.
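
The rule for this header, and the kind of filter it is meant to support, can be sketched as follows. The function names and the default threshold of 5 are assumptions chosen for the example; pick a threshold that suits the email you receive.

```python
import math
from typing import Optional


def spam_score_header(score: float) -> Optional[str]:
    """Value of X-Cam-SpamScore for a score, or None when no header is added."""
    if score <= 1:
        return None
    # One "s" per whole point of score, rounded down.
    return "s" * math.floor(score)


def is_probably_spam(header_value: Optional[str], threshold: int = 5) -> bool:
    """True if the header carries at least `threshold` letters "s"."""
    if header_value is None:
        return False
    return header_value.strip().startswith("s" * threshold)


if __name__ == "__main__":
    value = spam_score_header(5.2)
    print(value, is_probably_spam(value))
```

Matching on a prefix of "s" characters is what makes the header easy to use in simple filters: a plain substring or prefix test for sssss catches every message scoring 5 or more, with no numeric comparison needed.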

Bodies and attachments

When the anti-virus filter alters an email it does the following things:

  • It replaces the problematic attachment with some advisory text, and discards the original;
  • It adds a warning to the body of the message, which refers to the replacement attachment;
  • It adds a tag to the Subject: line to mark that the message has been filtered.

We do not keep the original attachment for a number of reasons, as follows. If it contains a virus it is too dangerous to keep. If it had a dangerous file name or file type, the original sender should still have a copy and can re-send it in a zip file so the Computing Service does not need to keep a copy as well. There are also questions about the legality of intercepting messages, i.e. keeping a copy of them, which do not arise when messages are mechanically altered.

The advisory text in the replacement attachment includes a link to a web page which explains more about what the recipient should do about the filtered message. There are a few different versions depending on why the message was filtered: for disinfected viruses, for deleted viruses, and for dangerous file names and file types.

Coverage of the scanner

Various institutions in the University run their own email systems independently of the Computing Service. Many of them "hub" through ppsw (i.e. send and receive email via the central relay), but some do not. The scanner cannot fully protect institutions that are not hubbed; there are also hubbed institutions that have opted out of the scanner. The question of which email will be scanned is therefore not simple to answer, and this just reflects the organizational complexity of Cambridge University.

You can find out if a message has been scanned by viewing its full headers. If the X-Cam-AntiVirus: and X-Cam-SpamDetails: fields appear and do not say "Not scanned" then the message has been scanned for viruses and spam, respectively. In some cases of complicated forwarding or re-sending of a message it may pass through the scanner more than once, in which case more than one set of headers will appear.
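
The same check can be automated. The sketch below uses Python's standard email package to read the headers of a raw message and reports whether each kind of scan took place. It assumes that an unscanned message carries a value beginning with "Not scanned", and it treats a message as scanned if any one of several sets of scanner headers says so; the function name scan_status is invented for this example.

```python
from email import message_from_string
from typing import Dict


def scan_status(raw_message: str) -> Dict[str, bool]:
    """Report whether a message was scanned for viruses and for spam."""
    msg = message_from_string(raw_message)
    result: Dict[str, bool] = {}
    for label, header in (("virus", "X-Cam-AntiVirus"), ("spam", "X-Cam-SpamDetails")):
        # A message may carry more than one set of scanner headers.
        values = msg.get_all(header, [])
        result[label] = any(
            not value.strip().lower().startswith("not scanned") for value in values
        )
    return result


if __name__ == "__main__":
    raw = (
        "X-Cam-AntiVirus: No virus found\n"
        "X-Cam-SpamDetails: Not scanned\n"
        "Subject: hello\n"
        "\n"
        "Message body.\n"
    )
    print(scan_status(raw))
```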

In general, all messages sent to domains managed by the Computing Service will be scanned; this includes @cam.ac.uk, @hermes.cam.ac.uk, and @cus.cam.ac.uk. If the message is subsequently forwarded to a non-hubbed or opted-out institution it is still scanned, because the scanner runs before ppsw works out that a message is going to be forwarded in this way. All email leaving the University via ppsw is scanned for viruses regardless of opt-outs, in order to minimize the chance of other organizations catching an infection from Cambridge. However, email to or from "system" addresses such as postmaster (at any domain, not just cam.ac.uk) is not scanned, so that problem reports do not get filtered. Scanning is done on a per-message basis, not a per-address basis, so if a message is sent to both a scanned address and an opted-out address then the stricter policy applies and the message is scanned. In most cases, email arriving at ppsw from inside the University is not scanned for spam, to reduce the scanner's workload. This means that messages from outside forwarded to ppsw via a non-hubbed institution might not have a spam score.

If you have an institution email address such as spqr9@botolph.cam.ac.uk then the decision about whether messages to that address are scanned lies with those responsible for the domain, not with the Computing Service. If you have any questions about a domain's email scanning policy, please direct your questions to postmaster@ that domain (e.g. postmaster@botolph.cam.ac.uk).

If you are a Computer Officer and you want to change the scanning arrangements for your institution's email domain, please contact <mail-scanner-support@ucs.cam.ac.uk>. An institution may "hub" via ppsw or not, and it is easy to switch between these setups. We can arrange for email from non-hubbed institutions to be scanned for spam (by default it is not). Hubbed institutions may opt out of the scanner (although note that the opt-out is limited for the reasons explained above). Similarly, an institution with a managed mail domain may also opt out.