Protecting against email forgery in Cambridge
=============================================

$Cambridge: hermes/doc/antiforgery/cam.txt,v 1.10 2004/07/22 10:09:32 fanf2 Exp $

Summary:
-------

This document describes a proposed mechanism for protecting against
the forgery of Cambridge email addresses and the resulting collateral
spam.

The mechanism does not require co-operation from the recipient of
the message to work. Forged messages will be detected and rejected
by ppsw in the course of usual email relaying. Other systems will
be able to detect forgery by using call-back verification.

Senders must be authenticated when they submit a message, otherwise
ppsw will consider it to be forged and reject it, which in rare
situations may lead to a silently lost message. Therefore the
mechanism must be opt-in and well-explained to users.

Users who do not opt in will not be affected, and can continue to work
as they do now, the only change being that they will not be able to
forge email that appears to come from a user who has opted in.

Because of the requirement for authentication, the mechanism can only
be offered to Hermes users submitting via an authenticated connection
to smtp.hermes.cam.ac.uk, or webmail.hermes.cam.ac.uk, or Pine on
hermes.cam.ac.uk.

It will work well with the managed mail domain system, and probably
with the mailing lists system (though effort in the latter direction
is probably not worth it at this point in time).

Recipients which assume that a given person will always send email
with the same envelope sender address will have problems. This
includes a few mailing list systems and filtering based on the
envelope sender address.

The proposed mechanism cannot be directly extended to other email
systems in the University, including CUS. However since it can be
implemented unilaterally the lack of direct support from us doesn't
prevent other systems from doing something similar.


Background:
----------

Since mid-2003 momentum has been growing behind a co-ordinated
anti-spam effort within the Internet standards community. This started
off with the Anti-Spam Research Group in March 2003. One of the
proposed schemes that quickly gained popularity is based on the idea
of "designated senders", where a domain advertises the machines it
uses for outgoing email in the DNS in a simple counterpart to the use
of MX records for incoming email. The idea is that the system
delivering a message can be checked against these records in a similar
way to a DNS blacklist check to detect if the message is forged. The
leading specification for such a scheme was SPF, which was fairly
stable by the end of 2003 and collecting users fast, most prominently
AOL. SPF uses TXT records in the DNS containing a specification
written in a fairly complicated macro language to describe the
outgoing MTAs' addresses.

At the start of 2004 the ASRG spun off an engineering group called
"MTA Authorization Records in the DNS" to finalize a designated sender
specification for publication as an RFC. At about this point Microsoft
and Yahoo! announced their own anti-forgery schemes called "Caller-ID
for email" and "DomainKeys" respectively. Caller-ID is another
designated sender scheme which like SPF uses TXT records, although the
language it uses is XML. After some wrangling in the MARID group it
was decided to merge the SPF and Caller-ID efforts into a single
standard known as Sender-ID, which uses the SPF syntax and the
Caller-ID semantics. This is going to be the final output of the MARID
work in time for the IETF meeting in August. (I discuss DomainKeys
further in the Cryptography section.)

The common problem shared by all designated sender schemes is that
they do not take into account email aliasing/forwarding, such as
provided by the /etc/aliases and .forward files on Unix or the Sieve
filter redirect command. Consider a message that has been sent from
site A (which advertises Sender-ID records for its domain in the DNS)
to an address at domain B. If the MX for domain B checks the Sender-ID
it will pass, since the message came from site A. The problem occurs
when the recipient at domain B forwards all email to a final
destination at domain C; if domain C's MX then checks the Sender-ID of
the message it will fail, because the message claims to come from
domain A but actually comes from site B.
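
The forwarding failure can be illustrated with a toy model. The sketch
below is not SPF or Sender-ID syntax: the record table, the example
domains and the IP addresses are all invented for illustration, and a
real check has more outcomes than pass or fail.

```python
# Hypothetical published records: domain -> IP addresses of its outgoing MTAs.
DESIGNATED_SENDERS = {
    "a.example": {"192.0.2.1"},
    "b.example": {"198.51.100.7"},
}


def designated_sender_check(envelope_sender: str, client_ip: str) -> bool:
    """Pass if the connecting host is listed for the sender's domain."""
    domain = envelope_sender.rpartition("@")[2].lower()
    return client_ip in DESIGNATED_SENDERS.get(domain, set())


# Site A delivers directly to domain B's MX: the check passes.
direct = designated_sender_check("alice@a.example", "192.0.2.1")

# Domain B forwards the unchanged message on to domain C: the envelope
# sender still names domain A, but the connection comes from site B.
forwarded = designated_sender_check("alice@a.example", "198.51.100.7")
```

Here `direct` is true and `forwarded` is false, although both
deliveries carry the same legitimate message.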

A number of workarounds for this problem have been suggested,
depending on the semantics of the designated sender scheme being
fixed. SPF checks the envelope sender address, so they worked out a
system for rewriting sender addresses during forwarding which embeds
the original sender into a domain B address so that any bounces could
be forwarded back to domain A. Unless clever security mechanisms are
used this risks turning the forwarding site into an open relay.
Caller-ID and Sender-ID extract a "purported responsible address" from
the message header, and propose that Resent- headers should be added
to a message when it is forwarded. This use of Resent- headers
disagrees with the standard RFC 822 semantics, and does not work well
if the message is sent to multiple recipients at domain B who forward
to domain C.

In both cases this requires a third party (domain B) to change their
setup in order to preserve working communication between site A and
site C. Other alternatives are to maintain a whitelist of "trusted
forwarding" sites, which significantly weakens the whole system; or
for each person who sets up forwarding to keep a small whitelist of
their own, which is a lot of effort and probably too complicated for
most users. This is all extremely unsatisfactory for sites like
Cambridge that have a lot of forwarding addresses: about 7% of Hermes
users forward their email to a non-cam.ac.uk address; it's unknown how
many of them receive email forwarded from elsewhere.

A further problem is that these plans do not directly address the
problem of collateral spam. Both the purported sending site and the
recipient site must co-operate for the forgery to be detected, and
even then the collateral spam is only eliminated in the case of
direct-to-MX forgery: if the message is ever accepted by an MTA (e.g.
an outgoing smarthost) a collateral spam bounce will be created which
will appear OK from the designated sender point of view. Thus
designated sender schemes are not only troublesome, they aren't even
very good at what they claim to do.

I have been participating in the working groups to point out these
shortcomings and others; it remains to be seen whether the IETF will
approve the specification - I hope not, or at least only with a
whacking great caveat saying "don't use this". In any case we will not
be publishing MARID records (we have a better system!), and the only
checking of others' MARID records we do (if we do it at all) will be
to provide hints for other checks.

Fortunately, in the course of the discussions on the SPF, MARID, and
ASRG lists, other possible anti-forgery and anti-collateral-spam
mechanisms have been suggested. One idea is to send out all your
messages with a cryptographically unforgeable cookie in them, and
check that any bounce messages you receive contain the cookie. However
bounces and auto-replies might not contain any of the original
message: vacation notices are a common example. In fact the only thing
that is sure to be retained is the original envelope sender address,
which is where this proposal puts the cookie. Another way of coming up
with this idea is by consideration of the security mechanisms required
for the sender-rewriting SPF workaround.

The main weakness of this proposal is a result of the bad behaviour of
incompetently written auto-responders (usually anti-virus software)
that do not construct proper bounce messages, e.g. messages sent with
a non-empty envelope sender address and/or sent to an address found in
the header instead of the original message's envelope sender.
Fortunately these
messages can be filtered quite effectively with the usual anti-spam
techniques. One open question is whether any legitimate bounce
messages are similarly malformed; it may be possible to implement a
weaker form of the forgery checking to see if we can spot such cases.

Other problems are caused by external sites that rely on a constant
envelope sender address to identify a correspondent. These include
greylisting systems (which delay email from unrecognized senders),
some mailing list systems (e.g. ezmlm and listserv), some
challenge-response anti-spam systems (which require senders to reply
to a challenge in order to get onto a whitelist), and auto-filing
email using Sieve or Exim filters.  I discuss envelope-dependent
recipients further below.


Mechanism:
---------

Firstly a new table will be created listing which users have opted in
to forgery protection. Users will be able to change their settings
using the Webmail "manage" page. This table will be distributed out to
ppsw in the usual way. It is used by both major parts of the
mechanism.

A user that has opted in MUST ensure that they submit all messages
"from" their @hermes address via Hermes. Any messages "from"
non-Hermes addresses that forward to Hermes MUST also be submitted via
Hermes. The most common examples of such addresses are @cam and
managed mail domain addresses, but it also includes off-site addresses
that forward to Hermes.

Of course, if the user never sends email "from" these addresses they
don't have to submit anything via Hermes. They may still receive
@hermes email, e.g. by forwarding it on to their active email account.
Forgery protection is also useful for this kind of unused Hermes
account.

Message submission:

When a user authenticates to smtp.hermes.cam.ac.uk, they are looked up
in the opt-in table. If they are not present (or if they have not
authenticated) then things proceed as in the message relay situation
(described below).

For opted-in users, the envelope sender address on their message is
replaced with a cryptographically signed version of their @hermes
address. The address identifies the user and the signature guarantees
that they were properly authenticated when the message was sent. The
signature is just an extension to the local part of the email address,
so sites that do not know about signed addresses will pass it through
without a fuss. This rewriting forces all bounce messages to be
directed to Hermes.
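
As a minimal sketch of the rewriting step, assuming an HMAC-SHA1
signature truncated to ten hex digits and appended to the local part
after a "+": the separator, the truncation, the secret and the user
name spqr1 are all illustrative assumptions, not the format ppsw will
actually use. A MAC over the user name alone never changes; the
timestamp described under Cryptography addresses that.

```python
import hashlib
import hmac

SECRET = b"example-secret-known-only-to-ppsw"  # hypothetical key
DOMAIN = "hermes.cam.ac.uk"


def sign_address(user: str, secret: bytes = SECRET) -> str:
    """Return user+<mac>@DOMAIN, where <mac> is a truncated HMAC of the user name."""
    mac = hmac.new(secret, user.encode("ascii"), hashlib.sha1).hexdigest()[:10]
    return f"{user}+{mac}@{DOMAIN}"


def rewrite_envelope_sender(auth_user: str, opted_in: set, original: str) -> str:
    """Replace the envelope sender only for authenticated, opted-in users."""
    if auth_user in opted_in:
        return sign_address(auth_user)
    return original  # everyone else is handled as in the relay case
```

For example, `rewrite_envelope_sender("spqr1", {"spqr1"},
"spqr1@cam.ac.uk")` yields an address of the form
spqr1+<mac>@hermes.cam.ac.uk whatever the original sender was, which
is what forces bounces back to Hermes.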

When an opted-in user sends a message via Pine on Hermes or Webmail,
the same rewriting occurs. At the moment I'm undecided whether signing
will happen on Hermes itself or whether Hermes will authenticate to
ppsw on behalf of the user. In both cases it will be easier if the
last remains of old Hermes have been removed, because it avoids an
ambiguity between newly-submitted messages (which need signing) and
forwarded messages (which must not be signed). On old Hermes these
functions are implemented on the same system, but new Hermes separates
them.

Message relay:

When ppsw receives a message via any of its identities (mx.cam.ac.uk,
ppsw.cam.ac.uk, smtp.hermes.cam.ac.uk) it verifies the envelope sender
address, or in the case of bounces the recipient address (which was
originally the envelope sender of the message that bounced).
Verification requires routing the address to work out where messages
to it must be sent next, so as a result of routing ppsw will find out
if the address ultimately refers to a Hermes account, which user it
belongs to, and (by a table look-up) whether they have opted in to
forgery protection.

If the user has not opted in, then nothing special happens: the
address is OK and will be accepted.

If the user has opted in, then in the two situations above (envelope
sender of a normal message, or recipient of a bounce) the address must
also have a valid signature in order to be accepted, otherwise it is
rejected as being a forgery. Recipient addresses of normal messages
are not signed so don't require this check.
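
The acceptance rule can be sketched as follows. The routing table,
opt-in table, secret and the "+<mac>" address format are hypothetical
stand-ins for the real ppsw data; only the decision logic follows the
description above.

```python
import hashlib
import hmac

SECRET = b"example-secret"   # hypothetical shared secret held by ppsw
OPTED_IN = {"spqr1"}         # hypothetical opt-in table
# Hypothetical routing results: unsigned address -> Hermes user.
ROUTES = {
    "spqr1@hermes.cam.ac.uk": "spqr1",
    "spqr1@cam.ac.uk": "spqr1",
    "abc2@hermes.cam.ac.uk": "abc2",
}


def mac_for(user: str) -> str:
    return hmac.new(SECRET, user.encode(), hashlib.sha1).hexdigest()[:10]


def route(address: str):
    """Split off any +signature and return (Hermes user or None, signature)."""
    local, _, domain = address.lower().partition("@")
    base, _, signature = local.partition("+")
    return ROUTES.get(f"{base}@{domain}"), signature


def accept(envelope_sender: str, recipient: str) -> bool:
    """Decide whether this envelope passes the forgery check."""
    # A bounce has an empty envelope sender, so its recipient is checked instead.
    checked = recipient if envelope_sender == "" else envelope_sender
    user, signature = route(checked)
    if user not in OPTED_IN:
        return True  # not a Hermes user, or not opted in: nothing special
    return hmac.compare_digest(signature.encode(), mac_for(user).encode())
```

Note that the recipient of a normal message is never examined, so
unsigned mail to an opted-in user is still accepted.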

Signature checking comes into effect in a few situations:

If an opted-in user receives a bounce message the signature checking
verifies that the bounce is legitimate and is not collateral spam or
forged-bounce spam.

If a legitimate message passes through ppsw more than once, e.g. as a
result of users' forwarding set-ups, then the signature will be
verified each time. For example a message passes through ppsw twice if
it is sent to a Hermes user who forwards their email elsewhere.

If someone attempts to pass a forged message through ppsw claiming to
be from an opted-in user, it will be rejected. This might be a local
user performing a message submission, in which case they will get an
error message from their MUA; this is what will usually happen to a
misconfigured opted-in user.

If the forged message is coming from another MTA, that MTA will try to
bounce the forged message to ppsw (because that's where the forged
envelope sender says to send bounces). This bounce is collateral spam,
so it also gets rejected, leaving the other MTA with a double bounce.
Double bounces are usually simply lost. If
an opted-in user is misconfigured in such a way that they submit a
message other than via ppsw, and if that message goes through ppsw, it
will appear to be forged and will be lost.

If a remote site or a department/college email server performs call-back
verification of remote email addresses, this will appear to ppsw as if
it is being sent a bounce, but no message data is actually sent. When
ppsw is sent the envelope of the "bounce" it will verify the address as
usual and communicate the result (accept or reject) back to the site
performing the call-back. In this way other systems can detect forgeries
claiming to be from Hermes users.
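
From the verifying site's side, a call-back looks roughly like the
sketch below, written against Python's standard smtplib. The function
names are hypothetical, and treating every non-2xx reply (including
temporary failures) as "not verified" is a simplifying assumption.

```python
import smtplib


def callback_verify(session, sender_address: str) -> bool:
    """Ask the sender's MX whether it would accept a bounce to sender_address.

    `session` is a connected smtplib.SMTP object, or anything with the same
    helo/mail/rcpt/rset methods. No message data is ever sent.
    """
    session.helo()
    session.mail("")                        # empty envelope sender, as in a bounce
    code, _ = session.rcpt(sender_address)  # the MX checks the signature here
    session.rset()                          # abandon the transaction before DATA
    return 200 <= code < 300


def callback_verify_host(mx_host: str, sender_address: str) -> bool:
    """Open a connection to mx_host and run the call-back over it."""
    with smtplib.SMTP(mx_host, timeout=30) as session:
        return callback_verify(session, sender_address)
```

A site receiving a message with a Hermes envelope sender would run
this against the MX for that address and reject the message if the
call-back is refused.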

Managed mail domains:

Many college and departmental email addresses, as well as the @cam
domain, are handled as alias addresses on ppsw. As a result of this,
ppsw can work out which user an address belongs to with the usual
routing/verification process. Therefore if a user opts in to the
forgery protection system then all their aliases get equivalent
protection.

When ppsw is verifying a managed mail domain alias that expands to
multiple addresses, it does not check any further than that alias.
This means that it is not a problem if the various people that the
alias expands to have different forgery protection settings. However
the downside is that aliases which expand to multiple addresses are not
protected from forgery.
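
A sketch of the alias rule, with an invented alias table and domain
(botolphs.example) and an invented opt-in table: a signature is only
demanded when the alias resolves to exactly one user who has opted in.

```python
OPTED_IN = {"spqr1"}  # hypothetical opt-in table
# Hypothetical managed mail domain aliases: address -> Hermes users.
ALIASES = {
    "s.quirinus@botolphs.example": ["spqr1"],
    "a.cooper@botolphs.example": ["abc2"],
    "office@botolphs.example": ["spqr1", "abc2"],
}


def signature_required(address: str) -> bool:
    """True if mail "from" this alias must carry a valid signature."""
    targets = ALIASES.get(address.lower(), [])
    # An alias with several expansions is not followed any further.
    return len(targets) == 1 and targets[0] in OPTED_IN
```

With these tables only s.quirinus@botolphs.example is protected;
office@botolphs.example is not, even though one of its members has
opted in.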

This could be fixed by an enhancement to the managed mail domain
system that allows domain managers to turn on forgery protection for
aliases that expand to multiple addresses. It will be slightly
different from personal forgery
protection since there will be no way of sending email "from" such an
address unless the sender is also opted in to forgery protection. This
enhancement can wait until the basic system is working.

Mailing lists:

If all the people authorized to send messages to a list have turned on
forgery protection, then the list is secure even in simple moderated
mode. However the list management addresses are not protected against
forgery, so the list managers are vulnerable to collateral spam and it
is possible to forge email that appears to come "from" the list
(albeit not going to all the list members).

Although it is fairly easy to provide limited forgery protection for
some list addresses in a blanket manner across the system, it would
not fix the problems described in the previous paragraph and it would
conflict with the way some users use list addresses as role addresses.

On balance it is probably best to leave this problem for solution as
part of the replacement mailing list system project.


Further discussion:
------------------

Testing mode:

There is an intermediate level between no forgery protection and full
forgery protection which will be useful in some situations. In testing
mode the envelope sender address is rewritten as described above, but
instead of rejecting messages that fail the forgery check, a warning
header is added. This implies that it does not allow third parties to
detect forgery using call-back verification.
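
The difference between the two modes comes down to what happens when
the check fails. In this sketch the mode names and the header name
X-Cam-Forgery-Warning are assumptions made for illustration.

```python
from email.message import EmailMessage

WARNING_HEADER = "X-Cam-Forgery-Warning"  # hypothetical header name


def handle_failed_check(msg: EmailMessage, mode: str) -> bool:
    """Handle a message that failed the forgery check; return True if accepted."""
    if mode == "full":
        return False  # rejected at SMTP time
    if mode == "testing":
        # Accept the message but flag it for the recipient.
        msg[WARNING_HEADER] = "envelope sender signature missing or invalid"
        return True
    raise ValueError(f"unknown mode: {mode}")
```

Because a testing-mode failure is still accepted at SMTP time, a
call-back to the same address would also succeed, which is why third
parties learn nothing from it.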

The advantages of testing mode are that it allows us to answer
questions about the behaviour of other sites and the effects of
forgery protection on interoperability. It may be popular for users
who prefer more conservative junk email filtering. The warning
headers may be useful as a global feature to raise awareness of the
availability of a secure alternative. However even testing mode is
probably too disruptive to be turned on globally, at least at first,
because of the problems discussed in the next section.

Envelope-dependent recipients:

As outlined above, this kind of forgery protection is unfriendly to
sites which assume that a given user always uses the same envelope
sender address.

The mechanism is designed not to fail totally when sending to a
greylisting site. The envelope sender address is replaced once when
the message is submitted, so it is constant for a given message. The
greylisting site will defer the first delivery of every message, but
on the second attempt the sender will be recognized and the message
will be accepted. A plausible but wrong implementation is to add the
signature when the message leaves ppsw; if you do this the signature
will be different for each delivery of the message, so the greylisting
site will never recognize the sender and no email will get through.
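
The two implementations can be compared with a small simulation. The
greylist here is reduced to a set of (sender, recipient) pairs, and
the counter stands in for whatever makes each signature unique; both
are simplifying assumptions.

```python
import hashlib
import hmac
import itertools

SECRET = b"example-secret"   # hypothetical key
_counter = itertools.count()


def sign(user: str) -> str:
    """Return a signed address that is different on every call."""
    nonce = next(_counter)
    mac = hmac.new(SECRET, f"{user}:{nonce}".encode(), hashlib.sha1).hexdigest()[:8]
    return f"{user}+{nonce}-{mac}@hermes.cam.ac.uk"


class Greylist:
    """Defer the first attempt from each (sender, recipient) pair."""

    def __init__(self):
        self.seen = set()

    def offer(self, sender: str, recipient: str) -> bool:
        if (sender, recipient) in self.seen:
            return True
        self.seen.add((sender, recipient))
        return False


def deliver(user, recipient, greylist, resign_each_attempt, attempts=3):
    """Retry delivery; return True once the greylisting site accepts."""
    sender = sign(user)  # signed once, when the message is submitted
    for _ in range(attempts):
        if resign_each_attempt:
            sender = sign(user)  # the plausible but wrong implementation
        if greylist.offer(sender, recipient):
            return True
    return False
```

Signing at submission gets through on the second attempt; re-signing
on every attempt never does, however many retries are made.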

However there remains the problem of recipients that require the
envelope sender address to be persistent for longer than a single
message. We will have to maintain a list of known problematic
recipient addresses, and arrange that messages to these addresses have
a fixed signature (per user/recipient pair). Users who opt in will have
to update their mailing list subscriptions to allow for their different
envelope sender address. We can also make things easier by ensuring
that filters set up on Hermes which are based on the envelope sender
are compatible with the signed address format.
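
One way the fixed signature might be arranged is sketched below. The
list contents, the "fixed" tag and the address layout are assumptions,
as is the idea of verifying by trying each listed recipient in turn.

```python
import hashlib
import hmac

SECRET = b"example-secret"   # hypothetical key
DOMAIN = "hermes.cam.ac.uk"
# Hypothetical list of known envelope-dependent recipients.
PROBLEM_RECIPIENTS = {"some-list@lists.example.org"}


def _mac(text: str) -> str:
    return hmac.new(SECRET, text.encode(), hashlib.sha1).hexdigest()[:10]


def envelope_sender_for(user: str, recipient: str, day: int) -> str:
    """Pick a fixed signature for listed recipients, a dated one otherwise."""
    recipient = recipient.lower()
    if recipient in PROBLEM_RECIPIENTS:
        mac = _mac(f"fixed:{user}:{recipient}")  # same for every message
        return f"{user}+fixed-{mac}@{DOMAIN}"
    mac = _mac(f"{user}:{day}")                  # changes from day to day
    return f"{user}+{day}-{mac}@{DOMAIN}"


def verify_fixed(address: str) -> bool:
    """Check a fixed signature against every listed recipient."""
    local = address.partition("@")[0]
    user, _, extension = local.partition("+")
    tag, _, mac = extension.partition("-")
    if tag != "fixed":
        return False
    return any(
        hmac.compare_digest(mac.encode(), _mac(f"fixed:{user}:{r}").encode())
        for r in PROBLEM_RECIPIENTS
    )
```

A fixed signature never expires, so it gives up some replay
protection in exchange for compatibility; that is a reason to keep the
list short.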

Satellite systems:

There will be demand for forgery protection for users of systems other
than Hermes, most particularly CUS. There are two implementation
strategies: unilateral, or with assistance from ppsw. The latter is
only appropriate for CUS because it requires sharing of cryptographic
secrets. However I favour the unilateral approach for CUS, since it
will provide a reference implementation of forgery protection on a
standard Unix system which can be used by departments and other sites.
This will require that smtp.cus.cam.ac.uk is withdrawn along with
imap.cus and pop.cus, or moved away from ppsw.

The main difference from the Hermes implementation is that CUS has a
different method of message submission, using direct invocation of
Exim instead of an SMTP AUTH connection. This should result in just
some straightforward differences of detail in the configuration.

Of more concern is that CUS is much more open to users using the
service in unusual ways, a common example being message submission by
making an SMTP connection to localhost instead of invoking Exim. Are
there any examples of this in the basic system software? Are there any
examples in the supported software? How can a user know that it is
safe to turn on forgery protection?

I think it will be useful to implement forgery protection on CUS,
because users will want it and because it will provide broader
experience of implementation issues. However it should probably wait
until the Hermes version has been done and we have some plausible
answers to the above questions. Testing mode will probably be helpful
for that.

Standardization:

A counterpart to this project is to write it up as an RFC so that
others can implement it in an interoperable manner. I am collaborating
with Dave Crocker and others on this project. He has the advantage
of having a long track record in Internet email standards, so any
work I do with him is more likely to gain general acceptance than
me going ahead by myself.

At the moment I envisage a number of documents:

Best current practice for email address verification at SMTP time. The
aim of this is to encourage SMTP servers to reject messages as early
as possible, rather than to accept and bounce (causing collateral
spam).

Callout verification of remote email addresses. This is an extension
to the previous document, but kept separate because it is somewhat
controversial. Call-back verification doesn't scale up well enough to
be widely deployed across the public Internet. However call-forward
verification is useful within an organization (e.g. for ppsw to be
able to verify addresses at departmental and college email servers).
It would be useful to have them properly specified.

A standard format for signed sender addresses. Although signed
addresses are fairly effective when implemented unilaterally, it is
better to have a standard format for them. This allows a site to be
flexible about choice of software and be sure that addresses signed by
one package can be verified by another. The standard will also be
extensible to support multiple cryptographic algorithms, so that in
the future we can use public key signatures and verification with DNS
look-ups instead of call-backs.

The last specification will also help with making recipient
software more aware of signed envelope sender addresses, which (in
the long term) will help to reduce the envelope-dependent recipient
problem.

As a related but not directly relevant activity I have also been
helping Dave to write a description of the Internet email
architecture.

Cryptography:

The first version of this scheme will use a simple shared secret
message authentication code to sign the addresses. This implies that
only ppsw will be able to sign and verify addresses. MAC signatures
have the great advantage of being compact and are already supported by
Exim.

In addition to the MAC itself, a timestamp will be added to the
address as a protection against replay attacks. Without it, the
recipient of a forgery-protected message could re-use the envelope
sender address indefinitely to forge messages "from" the sender. The
sender is generally safe if they
don't send messages to miscreants, but when designing for security it
is best not to be optimistic.

Another way that signed addresses can escape is if they find their way
on to a web archive of messages. Mailing list archives are generally
safe because list systems replace the original envelope sender address
with their own. Bug tracking systems also have archives, and
unfortunately these do often include the envelope sender address. The
worst possibility is a virus on a user's machine trawling their local
email store for addresses; even if it doesn't know about signed
addresses it's likely to get lucky with some random combinations of
addresses.

If an address is obtained by spammers, we could note it on a blacklist
of compromised addresses to limit the scope of the damage. The
timestamp causes addresses to expire automatically which limits the
number of them we need to store on the blacklist.
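
The interaction between the timestamp and the blacklist can be
sketched like this. The seven-day lifetime, the day-number timestamp
(for instance days since the Unix epoch) and the address layout are
assumptions chosen for illustration.

```python
import hashlib
import hmac

SECRET = b"example-secret"   # hypothetical key
DOMAIN = "hermes.cam.ac.uk"
LIFETIME_DAYS = 7            # assumed validity period
COMPROMISED = {}             # blacklist: signed address -> day it was signed


def sign(user: str, day: int) -> str:
    """Return user+<day>-<mac>@DOMAIN with the MAC covering user and day."""
    mac = hmac.new(SECRET, f"{user}:{day}".encode(), hashlib.sha1).hexdigest()[:10]
    return f"{user}+{day}-{mac}@{DOMAIN}"


def verify(address: str, today: int) -> bool:
    """Accept only a correctly signed, unexpired, non-blacklisted address."""
    local = address.partition("@")[0]
    user, _, extension = local.partition("+")
    day_text = extension.partition("-")[0]
    if not (day_text.isascii() and day_text.isdigit()):
        return False
    day = int(day_text)
    if not hmac.compare_digest(address.encode(), sign(user, day).encode()):
        return False  # MAC does not match
    if not 0 <= today - day <= LIFETIME_DAYS:
        return False  # expired, or dated in the future
    return address not in COMPROMISED


def prune(today: int) -> None:
    """Drop blacklist entries that have expired anyway."""
    for address, day in list(COMPROMISED.items()):
        if today - day > LIFETIME_DAYS:
            del COMPROMISED[address]
```

Once an address is older than the lifetime it fails verification
regardless, so its blacklist entry can be discarded.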

In the future we would like to be able to use public key signatures on
the addresses, so that the public key can be published and recipient
sites can verify an address without using a call-back. This is similar
to Yahoo!'s DomainKeys idea, although they put the signature in the
header of the message and as already explained that is not such a good
place to put it. Their signature also covers some of the message
header and content which makes the signature vulnerable to common
kinds of message alteration, including the X-Cam- headers added by our
email scanner.

However one of the big disadvantages of public key signatures is that
they are much bigger than MACs: at least twice the size by raw bit
count, and they usually come with verbose framing too. This is,
perhaps, a good reason for putting the signature in the message
header. I hope that with some help from cryptographers we will be able
to find a signature algorithm whose result will fit inside an email
address. This is important to make the system more obviously
competitive with SPF and Sender-ID.

Message Headers:

The current proposal only protects the message envelope against
forgery, but this is the least visible part of the message from the
point of view of the average user. Protecting the header is the
motivation behind the Yahoo! DomainKeys system and the Microsoft
Caller-ID "purported responsible address" algorithm.

The signed sender address idea cannot simply be extended to secure
Sender: and From: addresses in the header by signing them too, because
it would be ugly and much more vulnerable to replay attacks. Instead
the plan is to create additional Verify-Sender: and Verify-From:
headers that parallel Sender: and From: and contain signed versions of
the addresses. These signed addresses can be used for verification in
the usual way, but they contain a marker which prevents them from
being re-used as an envelope sender address.

There will also be an overall Verify-Message: header which contains a
special signed "address" which, instead of being related to any
particular normal address, is simply a container for a signature which
would cover parts of the message header to make it difficult to create
a forgery by picking and choosing parts of other messages.

This kind of message header protection has not been fully thought
through yet, and is a direct competitor to DomainKeys, so it is
likely to change a lot in the course of specification or be dropped
entirely. It also requires co-operation from the writers of MUA
software to be fully useful: there's talk of enhancing user interfaces
to flag forged messages. Therefore it'll be a long time before
this part of the work is complete, if ever.

- ends -