Protecting against email forgery in Cambridge
=============================================

$Cambridge: hermes/doc/antiforgery/cam.txt,v 1.10 2004/07/22 10:09:32 fanf2 Exp $

Summary:
-------

This document describes a proposed mechanism for protecting against the forgery of Cambridge email addresses and the resulting collateral spam.

Its security does not depend on co-operation from the recipient of the message. Forged messages will be detected and rejected by ppsw in the course of usual email relaying. Other systems will be able to detect forgery by using call-back verification.

Senders must be authenticated when they submit a message, otherwise ppsw will consider it to be forged and reject it, which in rare situations may lead to a silently lost message. Therefore the mechanism must be opt-in and well-explained to users.

Users who do not opt in will not be affected, and can continue to work as they do now, the only change being that they will not be able to forge email that appears to come from a user who has opted in.

Because of the requirement for authentication, the mechanism can only be offered to Hermes users submitting via an authenticated connection to smtp.hermes.cam.ac.uk, or webmail.hermes.cam.ac.uk, or Pine on hermes.cam.ac.uk.

It will work well with the managed mail domain system, and probably with the mailing lists system (though effort in the latter direction is probably not worthwhile at this point).

Recipients that assume that a given person will always send email with the same envelope sender address will have problems. This includes a few mailing list systems and filtering based on the envelope sender address.

The proposed mechanism cannot be directly extended to other email systems in the University, including CUS. However, since it can be implemented unilaterally, the lack of direct support from us doesn't prevent other systems from doing something similar.
Background:
----------

Since mid-2003 there has been growing momentum behind a co-ordinated anti-spam effort within the Internet standards community. This started off with the Anti-Spam Research Group in March 2003.

One of the proposed schemes that quickly gained popularity is based on the idea of "designated senders", where a domain advertises the machines it uses for outgoing email in the DNS in a simple counterpart to the use of MX records for incoming email. The idea is that the system delivering a message can be checked against these records in a similar way to a DNS blacklist check to detect if the message is forged.

The leading specification for such a scheme was SPF, which was fairly stable by the end of 2003 and collecting users fast, most prominently AOL. SPF uses TXT records in the DNS containing a specification written in a fairly complicated macro language to describe the outgoing MTAs' addresses.

At the start of 2004 the ASRG spun off an engineering group called "MTA Authorization Records in the DNS" (MARID) to finalize a designated sender specification for publication as an RFC. At about this point Microsoft and Yahoo! announced their own anti-forgery schemes called "Caller-ID for email" and "DomainKeys" respectively. Caller-ID is another designated sender scheme which like SPF uses TXT records, although the language it uses is XML. After some wrangling in the MARID group it was decided to merge the SPF and Caller-ID efforts into a single standard known as Sender-ID, which uses the SPF syntax and the Caller-ID semantics. This is going to be the final output of the MARID work in time for the IETF meeting in August. (I discuss DomainKeys further in the Cryptography section.)

The common problem shared by all designated sender schemes is that they do not take into account email aliasing/forwarding, such as that provided by the /etc/aliases and .forward files on Unix or the Sieve filter redirect command.
Consider a message that has been sent from site A (which advertises Sender-ID records for its domain in the DNS) to an address at domain B. If the MX for domain B checks the Sender-ID it will pass, since the message came from site A. The problem occurs when the recipient at domain B forwards all email to a final destination at domain C; if domain C's MX then checks the Sender-ID of the message it will fail, because the message claims to come from domain A but actually comes from site B.

A number of workarounds for this problem have been suggested, depending on the semantics of the particular designated sender scheme.

SPF checks the envelope sender address, so its proponents worked out a system for rewriting sender addresses during forwarding which embeds the original sender into a domain B address so that any bounces could be forwarded back to domain A. Unless clever security mechanisms are used this risks turning the forwarding site into an open relay.

Caller-ID and Sender-ID extract a "purported responsible address" from the message header, and propose that Resent- headers should be added to a message when it is forwarded. This use of Resent- headers disagrees with the standard RFC 822 semantics, and does not work well if the message is sent to multiple recipients at domain B who forward to domain C.

In both cases this requires a third party (domain B) to change its setup in order to preserve working communication between site A and site C.

Other alternatives are to maintain a whitelist of "trusted forwarding" sites, which significantly weakens the whole system; or for each person who sets up forwarding to keep a small whitelist of their own, which is a lot of effort and probably too complicated for most users.

This is all extremely unsatisfactory for sites like Cambridge that have a lot of forwarding addresses: about 7% of Hermes users forward their email to a non-cam.ac.uk address; it's unknown how many of them receive email forwarded from elsewhere.
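The forwarding failure described above (A sends to B, B forwards to C) can be illustrated with a toy designated-sender check. This is only a sketch: the domains, host names and the DESIGNATED_SENDERS table are hypothetical stand-ins for records that a real scheme would publish in the DNS, and the check is far simpler than SPF or Sender-ID.

```python
# Hypothetical designated-sender records: domain -> hosts allowed to send
# mail claiming to come from that domain. Real schemes publish these in DNS.
DESIGNATED_SENDERS = {
    "a.example": {"mail.a.example"},
    "b.example": {"mail.b.example"},
}


def designated_sender_check(envelope_sender, connecting_host):
    """Return True if connecting_host may send mail for the sender's domain."""
    domain = envelope_sender.rsplit("@", 1)[1]
    allowed = DESIGNATED_SENDERS.get(domain)
    if allowed is None:
        return True  # the domain publishes no records, so nothing to check
    return connecting_host in allowed


# Site A delivers directly to domain B's MX: the check passes.
direct = designated_sender_check("alice@a.example", "mail.a.example")

# Domain B forwards the message unchanged to domain C: the envelope sender
# still names domain A, but the connecting host now belongs to B.
forwarded = designated_sender_check("alice@a.example", "mail.b.example")

print(direct, forwarded)
```

The second check fails even though nothing has been forged, which is exactly the situation of a Hermes user who forwards their email off-site.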
A further problem is that these plans do not directly address the problem of collateral spam. Both the purported sending site and the recipient site must co-operate for the forgery to be detected, and even then the collateral spam is only eliminated in the case of direct-to-MX forgery: if the message is ever accepted by an MTA (e.g. an outgoing smarthost) a collateral spam bounce will be created which will appear OK from the designated sender point of view.

Thus designated sender schemes are not only troublesome, they aren't even very good at what they claim to do. I have been participating in the working groups to point out these shortcomings and others; it remains to be seen whether the IETF will approve the specification - I hope not, or at least only with a whacking great caveat saying "don't use this". In any case we will not be publishing MARID records (we have a better system!), and the only checking of others' MARID records we do (if we do it at all) will be to provide hints for other checks.

Fortunately, in the course of the discussions on the SPF, MARID, and ASRG lists, other possible anti-forgery and anti-collateral-spam mechanisms have been suggested. One idea is to send out all your messages with a cryptographically unforgeable cookie in them, and check that any bounce messages you receive contain the cookie. However bounces and auto-replies might not contain any of the original message: vacation notices are a common example. In fact the only thing that is sure to be retained is the original envelope sender address, which is where this proposal puts the cookie. Another way of arriving at this idea is by considering the security mechanisms required for the sender-rewriting SPF workaround.

The main weakness of this proposal is a result of the bad behaviour of incompetently written auto-responders (usually anti-virus software) that do not construct proper bounce messages, e.g.
without an empty envelope sender address and/or sent to an address found in the header instead of the original message's envelope sender. Fortunately these messages can be filtered quite effectively with the usual anti-spam techniques. One open question is whether any legitimate bounce messages are similarly malformed; it may be possible to implement a weaker form of the forgery checking to see if we can spot such cases.

Other problems are caused by external sites that rely on a constant envelope sender address to identify a correspondent. These include greylisting systems (which delay email from unrecognized senders), some mailing list systems (e.g. ezmlm and listserv), some challenge-response anti-spam systems (which require senders to reply to a challenge in order to get onto a whitelist), and auto-filing email using Sieve or Exim filters. I discuss envelope-dependent recipients further below.

Mechanism:
---------

Firstly a new table will be created listing which users have opted in to forgery protection. Users will be able to change their settings using the Webmail "manage" page. This table will be distributed out to ppsw in the usual way. It is used by both major parts of the mechanism.

A user who has opted in MUST ensure that they submit all messages "from" their @hermes address via Hermes. Any messages "from" non-Hermes addresses that forward to Hermes MUST also be submitted via Hermes. The most common examples of such addresses are @cam and managed mail domain addresses, but they also include off-site addresses that forward to Hermes.

Of course, if the user never sends email "from" these addresses they don't have to submit anything via Hermes. They may still receive @hermes email, e.g. by forwarding it on to their active email account. Forgery protection is also useful for this kind of unused Hermes account.

Message submission:

When a user authenticates to smtp.hermes.cam.ac.uk, they are looked up in the opt-in table.
If they are not present (or if they have not authenticated) then things proceed as in the message relay situation (described below). For opted-in users, the envelope sender address on their message is replaced with a cryptographically signed version of their @hermes address. The address identifies the user and the signature guarantees that they were properly authenticated when the message was sent. The signature is just an extension to the local part of the email address, so sites that do not know about signed addresses will pass it through without a fuss. This rewriting forces all bounce messages to be directed to Hermes.

When an opted-in user sends a message via Pine on Hermes or Webmail, the same rewriting occurs. At the moment I'm undecided whether signing will happen on Hermes itself or whether Hermes will authenticate to ppsw on behalf of the user. In both cases it will be easier if the last remains of old Hermes have been removed, because it avoids an ambiguity between newly-submitted messages (which need signing) and forwarded messages (which must not be signed). On old Hermes these functions are implemented on the same system, but new Hermes separates them.

Message relay:

When ppsw receives a message via any of its identities (mx.cam.ac.uk, ppsw.cam.ac.uk, smtp.hermes.cam.ac.uk) it verifies the envelope sender address, or in the case of bounces the recipient address (which was originally the envelope sender of the message that bounced). Verification requires routing the address to work out where messages to it must be sent next, so as a result of routing ppsw will find out if the address ultimately refers to a Hermes account, which user it belongs to, and (by a table look-up) whether they have opted in to forgery protection.

If the user has not opted in, then nothing special happens: the address is OK and will be accepted.
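As an illustration of the submission-time rewriting described above, the following sketch signs an address by extending its local part. Everything specific here is an assumption made for the example rather than the proposed format: the "user+stamp-mac" layout, the HMAC-SHA256 algorithm, the 16-character truncation, the 14-day lifetime, the SECRET value and the user name spqr1. The day-granularity timestamp anticipates the replay protection discussed in the Cryptography section.

```python
import hashlib
import hmac
import time

SECRET = b"example-shared-secret"  # hypothetical; a real key would live only on ppsw
MAC_HEX_LEN = 16                   # assumed truncation to keep the local part short
SECONDS_PER_DAY = 86400


def _mac(user, domain, stamp):
    """MAC over the plain address and timestamp, truncated for compactness."""
    msg = f"{user}@{domain}/{stamp}".encode()
    return hmac.new(SECRET, msg, hashlib.sha256).hexdigest()[:MAC_HEX_LEN]


def sign_sender(user, domain, now=None):
    """Rewrite user@domain as user+<stamp>-<mac>@domain (illustrative format)."""
    seconds = time.time() if now is None else now
    stamp = format(int(seconds // SECONDS_PER_DAY), "x")
    return f"{user}+{stamp}-{_mac(user, domain, stamp)}@{domain}"


def verify_sender(address, now=None, max_age_days=14):
    """Return the plain user name if the signature is valid and fresh, else None."""
    local, _, domain = address.rpartition("@")
    user, plus, ext = local.partition("+")
    stamp, dash, mac = ext.partition("-")
    if not (plus and dash):
        return None  # no signature extension present
    try:
        day = int(stamp, 16)
    except ValueError:
        return None
    seconds = time.time() if now is None else now
    today = int(seconds // SECONDS_PER_DAY)
    if not 0 <= today - day <= max_age_days:
        return None  # expired, or stamped in the future
    expected = _mac(user, domain, stamp)
    if not hmac.compare_digest(mac.encode(), expected.encode()):
        return None
    return user


signed = sign_sender("spqr1", "hermes.cam.ac.uk")
print(signed, verify_sender(signed))
```

Because the signature is computed once, at submission, every delivery attempt for a given message carries the same envelope sender; only ppsw, which holds the secret, can produce or check it.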
If the user has opted in, then in the two situations above (envelope sender of a normal message, or recipient of a bounce) the address must also have a valid signature in order to be accepted, otherwise it is rejected as being a forgery. Recipient addresses of normal messages are not signed so don't require this check.

Signature checking comes into effect in a few situations:

If an opted-in user receives a bounce message the signature checking verifies that the bounce is legitimate and is not collateral spam or forged-bounce spam.

If a legitimate message passes through ppsw more than once, e.g. as a result of users' forwarding set-ups, then the signature will be verified each time. For example a message passes through ppsw twice if it is sent to a Hermes user who forwards their email elsewhere.

If someone attempts to pass a forged message through ppsw claiming to be from an opted-in user, it will be rejected. This might be a local user performing a message submission, in which case they will get an error message from their MUA; this is what will usually happen to a misconfigured opted-in user. If the forged message is coming from another MTA, that MTA will try to bounce the forged message to ppsw (because that's where the forged envelope sender says to send bounces). This bounce is collateral spam, so it also gets rejected. Double bounces are usually simply lost.

If an opted-in user is misconfigured in such a way that they submit a message other than via ppsw, and if that message goes through ppsw, it will appear to be forged and will be lost.

If a remote site or a department/college email server performs call-back verification of remote email addresses, this will appear to ppsw as if it is being sent a bounce, but no message data is actually sent. When ppsw receives the envelope of the "bounce" it will verify it as usual and communicate the result (accept or reject) back to the site performing the call-back.
In this way other systems can detect forgeries claiming to be from Hermes users.

Managed mail domains:

Many college and departmental email addresses, as well as the @cam domain, are handled as alias addresses on ppsw. As a result of this, ppsw can work out which user an address belongs to with the usual routing/verification process. Therefore if a user opts in to the forgery protection system then all their aliases get equivalent protection.

When ppsw is verifying a managed mail domain alias that expands to multiple addresses, it does not check any further than that alias. This means that it is not a problem if the various people that the alias expands to have different forgery protection settings. However the downside is that multiple aliases are not protected from forgery.

This could be fixed by an enhancement to the managed mail domain system that allows domain managers to turn on forgery protection for multiple aliases. It will be slightly different from personal forgery protection since there will be no way of sending email "from" such an address unless the sender is also opted in to forgery protection. This enhancement can wait until the basic system is working.

Mailing lists:

If all the people authorized to send messages to a list have turned on forgery protection, then the list is secure even in simple moderated mode. However the list management addresses are not protected against forgery, so the list managers are vulnerable to collateral spam and it is possible to forge email that appears to come "from" the list (albeit not going to all the list members).

Although it is fairly easy to provide limited forgery protection for some list addresses in a blanket manner across the system, it would not fix the problems described in the previous paragraph and it would conflict with the way some users use list addresses as role addresses. On balance it is probably best to leave this problem for solution as part of the replacement mailing list system project.
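To draw the Mechanism section together, the accept/reject rule that ppsw applies during relay can be written as one small function. This is a sketch under stated assumptions, not the Exim configuration: the ALIASES table, the opted-in set, the addresses, and in particular the toy has_valid_signature stand-in (which only looks for a literal "+signed" marker) are all hypothetical.

```python
from typing import Callable, Optional, Set

# Hypothetical managed mail domain alias table: alias -> Hermes user.
ALIASES = {"s.p.q.robinson@botolphs.cam.ac.uk": "spqr1"}


def route_to_user(address: str) -> Optional[str]:
    """Toy routing: return the Hermes user an address resolves to, if any."""
    local, _, domain = address.rpartition("@")
    base = local.partition("+")[0]  # ignore any signature extension
    if domain == "hermes.cam.ac.uk":
        return base
    return ALIASES.get(f"{base}@{domain}")


def has_valid_signature(address: str) -> bool:
    """Toy stand-in for real MAC verification: looks for a fixed marker."""
    return address.rpartition("@")[0].endswith("+signed")


def relay_decision(
    envelope_sender: str,
    recipient: str,
    opted_in: Set[str],
    route: Callable[[str], Optional[str]] = route_to_user,
    verify: Callable[[str], bool] = has_valid_signature,
) -> str:
    """Return "accept" or "reject" for one sender/recipient pair."""
    # A bounce has an empty envelope sender; its recipient was originally the
    # envelope sender of the message that bounced, so that is what we check.
    address = recipient if envelope_sender == "" else envelope_sender
    user = route(address)
    if user is None or user not in opted_in:
        return "accept"  # not a protected address: nothing special happens
    return "accept" if verify(address) else "reject"


OPTED_IN = {"spqr1"}
print(relay_decision("spqr1@hermes.cam.ac.uk", "x@elsewhere.example", OPTED_IN))
print(relay_decision("", "spqr1+signed@hermes.cam.ac.uk", OPTED_IN))
```

A call-back verification from another site exercises the same path as the bounce case: an empty envelope sender and the address under test as recipient.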
Further discussion:
------------------

Testing mode:

There is an intermediate level between no forgery protection and full forgery protection which will be useful in some situations. In testing mode the envelope sender address is rewritten as described above, but instead of rejecting messages that fail the forgery check, a warning header is added. This implies that it does not allow third parties to detect forgery using call-back verification.

The advantages of testing mode are that it allows us to answer questions about the behaviour of other sites and the effects of forgery protection on interoperability. It may be popular for users who prefer more conservative junk email filtering.

The warning headers may be useful as a global feature to raise awareness of the availability of a secure alternative. However even testing mode is probably too disruptive to be turned on globally, at least at first, because of the problems discussed in the next section.

Envelope-dependent recipients:

As outlined above, this kind of forgery protection is unfriendly to sites which assume that a given user always uses the same envelope sender address.

The mechanism is designed not to fail totally when sending to a greylisting site. The envelope sender address is replaced once when the message is submitted, so it is constant for a given message. The greylisting site will defer the first delivery of every message, but on the second attempt the sender will be recognized and the message will be accepted. A plausible but wrong implementation is to add the signature when the message leaves ppsw; if you do this the signature will be different for each delivery of the message, so the greylisting site will never recognize the sender and no email will get through.

However there remains the problem of recipients that require the envelope sender address to be persistent for longer than a single message.
We will have to maintain a list of known problematic recipient addresses, and arrange that messages to these addresses have a fixed signature (per user/recipient pair). Users who opt in will have to update their mailing list subscriptions to allow for their different envelope sender address. We can also make things easier by ensuring that filters set up on Hermes which are based on the envelope sender are compatible with the signed address format.

Satellite systems:

There will be demand for forgery protection for users of systems other than Hermes, most particularly CUS. There are two implementation strategies: unilateral, or with assistance from ppsw. The latter is only appropriate for CUS because it requires sharing of cryptographic secrets. However I favour the unilateral approach for CUS, since it will provide a reference implementation of forgery protection on a standard Unix system which can be used by departments and other sites. This will require that smtp.cus.cam.ac.uk is withdrawn along with imap.cus and pop.cus, or moved away from ppsw.

The main difference from the Hermes implementation is that CUS has a different method of message submission, using direct invocation of Exim instead of an SMTP AUTH connection. This should result in just some straightforward differences of detail in the configuration.

Of more concern is that CUS is much more open to users using the service in unusual ways, a common example being message submission by making an SMTP connection to localhost instead of invoking Exim. Are there any examples of this in the basic system software? Are there any examples in the supported software? How can a user know that it is safe to turn on forgery protection?

I think it will be useful to implement forgery protection on CUS, because users will want it and because it will provide broader experience of implementation issues.
However it should probably wait until the Hermes version has been done and we have some plausible answers to the above questions. Testing mode will probably be helpful for that.

Standardization:

A counterpart to this project is to write it up as an RFC so that others can implement it in an interoperable manner. I am collaborating with Dave Crocker and others on this project. He has the advantage of having a long track record in Internet email standards, so any work I do with him is more likely to gain general acceptance than me going ahead by myself.

At the moment I envisage a number of documents:

Best current practice for email address verification at SMTP time. The aim of this is to encourage SMTP servers to reject messages as early as possible, rather than to accept and bounce (causing collateral spam).

Callout verification of remote email addresses. This is an extension to the previous document, but kept separate because it is somewhat controversial. Call-back verification doesn't scale up well enough to be widely deployed across the public Internet. However call-forward verification is useful within an organization (e.g. for ppsw to be able to verify addresses at departmental and college email servers). It would be useful to have both properly specified.

A standard format for signed sender addresses. Although signed addresses are fairly effective when implemented unilaterally, it is better to have a standard format for them. This allows a site to be flexible about choice of software and be sure that addresses signed by one package can be verified by another. The standard will also be extensible to support multiple cryptographic algorithms, so that in the future we can use public key signatures and verification with DNS look-ups instead of call-backs.

The last of these specifications will also help with making recipient software more aware of signed envelope sender addresses, which (in the long term) will help to reduce the envelope-dependent recipient problem.
As a related but not directly relevant activity I have also been helping Dave to write a description of the Internet email architecture.

Cryptography:

The first version of this scheme will use a simple shared-secret message authentication code (MAC) to sign the addresses. This implies that only ppsw will be able to sign and verify addresses. MAC signatures have the great advantage of being compact and are already supported by Exim.

In addition to the MAC itself, a timestamp will be added to the address as a protection against replay attacks. The recipient of a forgery-protected message can re-use the envelope sender address to forge messages "from" the sender. The sender is generally safe if they don't send messages to miscreants, but when designing for security it is best not to be optimistic.

Another way that signed addresses can escape is if they find their way on to a web archive of messages. Mailing list archives are generally safe because list systems replace the original envelope sender address with their own. Bug tracking systems also have archives, and unfortunately these do often include the envelope sender address. The worst possibility is a virus on a user's machine trawling their local email store for addresses; even if it doesn't know about signed addresses it's likely to get lucky with some random combinations of addresses.

If an address is obtained by spammers, we could note it on a blacklist of compromised addresses to limit the scope of the damage. The timestamp causes addresses to expire automatically, which limits the number of them we need to store on the blacklist.

In the future we would like to be able to use public key signatures on the addresses, so that the public key can be published and recipient sites can verify an address without using a call-back. This is similar to Yahoo!'s DomainKeys idea, although they put the signature in the header of the message and, as already explained, that is not such a good place to put it.
Their signature also covers some of the message header and content which makes the signature vulnerable to common kinds of message alteration, including the X-Cam- headers added by our email scanner.

However one of the big disadvantages of public key signatures is that they are much bigger than MACs: at least twice the size by raw bit count, and they usually come with verbose framing too. This is, perhaps, a good reason for putting the signature in the message header. I hope that with some help from cryptographers we will be able to find a signature algorithm whose result will fit inside an email address. This is important to make the system more obviously competitive with SPF and Sender-ID.

Message headers:

The current proposal only protects the message envelope against forgery, but this is the least visible part of the message from the point of view of the average user. Protecting the header is the motivation behind the Yahoo! DomainKeys system and the Microsoft Caller-ID "purported responsible address" algorithm.

The signed sender address idea cannot simply be extended to secure Sender: and From: addresses in the header by signing them too, because it would be ugly and much more vulnerable to replay attacks. Instead the plan is to create additional Verify-Sender: and Verify-From: headers that parallel Sender: and From: and contain signed versions of the addresses. These signed addresses can be used for verification in the usual way, but they contain a marker which prevents them from being re-used as an envelope sender address.

There will also be an overall Verify-Message: header which contains a special signed "address" which, instead of being related to any particular normal address, is simply a container for a signature which would cover parts of the message header to make it difficult to create a forgery by picking and choosing parts of other messages.
This kind of message header protection has not been fully thought through yet, and is a direct competitor to DomainKeys, so it is likely to change a lot in the course of specification or be dropped entirely. It also requires co-operation from the writers of MUA software to be fully useful: there's talk of enhancing user interfaces to flag forged messages. Therefore it'll be a long time before this part of the work is complete, if ever.

- ends -