Email Domain Cryptographic Bounce Authentication

University of Cambridge Computing Service

New Museums Site Pembroke Street Cambridge CB2 3QH ENGLAND +44 797 040 1426 dot@dotat.at http://dotat.at/

Applications MAILSIG How to put a secure tag in the domain part of a bounce address to detect collateral spam and how to detect and prevent replay attacks against tagged addresses $Cambridge: hermes/doc/antiforgery/draft-fanf-email-dcba.xml,v 1.11 2005/03/10 15:00:57 fanf2 Exp $

Blah.

attack: A batch of messages sent by a spammer or mass-mailing virus. In this document we are concerned with messages with a forged bounce address.

backscatter: Bounce messages sent to a victim as a result of an attack. This is also known as "blow-back" and "collateral spam".

bounce address: The return path of the message, used as the destination of bounces such as delivery status reports, vacation messages, etc. The bounce address is the argument of the MAIL FROM command and is placed in the Return-Path: header field when the message is finally delivered.

tagged address: The bounce address modified according to .

tagged domain: The domain part of a tagged address.

tagging scheme: The label used to identify the algorithm defined in by which a tagged address is created and verified.

target: When talking about an attack, the target is the recipient of email sent by the attacker.

untagged address: The original bounce address as specified by the message sender.

victim: When talking about an attack, the victim is identified by the forged bounce addresses used by the attacker.

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in .

Syntax specifications use the ABNF language specified in . Terminals not defined in this document, such as ALPHA, DIGIT, SP, CR, LF, CRLF, are as defined in the "core" syntax in section 6 of or in the message format syntax in .

Examples use a common scenario: The domain example.edu accepts incoming email for addresses @example.edu via the host(s) called mx.example.edu. The majority of the email is delivered to a message store called store.example.edu, which users can access via imap.example.edu. A submission service is provided at smtp.example.edu and a general-purpose outgoing email relay at relay.example.edu. A department runs its own email service on dept.example.edu, which receives incoming email via mx.example.edu and sends outgoing email via relay.example.edu.

Something about tagging by MSAs and verification by MTAs and/or the stunt DNS server.

This section describes the syntax of tagged addresses. The syntax is generic in that it is independent of the tagging scheme (), which is identified by a field in the tagged address. All the important information in a tagged address is in the domain part; this allows a special DNS server to be used for verification and tracking of tagged addresses, as described in . The untagged address can be recovered from a tagged address using only knowledge of the generic syntax; this simplifies interoperability with software that misuses the bounce address, as described in .

An untagged address has the following syntax (simplified from ):

   Mailbox    = Local-part "@" Domain
   Local-part = Dot-string / Quoted-string
   Domain     = (sub-domain 1*("." sub-domain)) / address-literal
   sub-domain = Let-dig [Ldh-str]

A tagged address has the following syntax:

   Tagged-address     = Local-part "@" Tagged-domain
   Tagged-domain      = Tag "." Tag-suffix
   Tag-suffix         = Tagging-scheme "." Encoded-local-part "." Tag-marker "." Domain
   Tag-marker         = "a--t"
   Tagging-scheme     = "opaque" / "fixed" / sub-domain
   Tag                = (sub-domain *("." sub-domain))
   Encoded-local-part = sub-domain

The untagged address MUST NOT have an address-literal as its domain part. An MSA that tags addresses can enforce this restriction by rejecting any message whose bounce address violates this requirement.

The untagged address SHOULD NOT be source-routed (as indicated by the optional A-d-l part of the syntax). An MSA that tags addresses can enforce this restriction by stripping off the source route.

The Encoded-local-part MUST fit in a domain label, that is, it MUST be at most 63 characters. This is slightly less than the maximum interoperable size of 64 characters guaranteed by , though if the Local-part uses characters that require encoding its length is restricted further.

The Tagged-domain MUST NOT be more than 255 characters. This means the Domain has a much more restricted length than usual.

[Open issue: work out what the overhead in a Tagged-domain actually is.]

Local parts of email addresses have a much less restricted syntax than domain parts. In order to accommodate this, the Local-part of the untagged address must be encoded so that it can be included in the Tagged-domain.

The Local-part is transformed into the Encoded-local-part one character at a time. The whole Local-part is used: if the Local-part is a Quoted-string then the quotes are included.

If the character is a digit (ASCII 48..57) or a lower-case letter (ASCII 97..122) the encoded form is the same as the original character. If the character is a hyphen (ASCII 45) the encoded form is a double hyphen "--". If the character is an upper-case letter (ASCII 65..90) and local parts are not case sensitive at this domain, the encoded form is the corresponding lower-case character. If local parts are case sensitive at this domain then upper-case letters are encoded as described in the next paragraph.

The encoded form of other characters is a hyphen followed by the hexadecimal value of the character followed by another hyphen. For example, if the input character is "." then its encoded form is "-2e-".
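The encoding rules above can be sketched as follows. This is an illustrative sketch, not a normative implementation: the function name and the case_sensitive parameter are hypothetical, the hex value is written in lower case without padding (the text gives only the "-2e-" example), and non-ASCII input is simply rejected.

```python
import string

# Characters that are copied through unchanged.
_LITERAL = set(string.ascii_lowercase + string.digits)


def encode_local_part(local_part, case_sensitive=False):
    """Encode a Local-part into a single domain label (sketch)."""
    out = []
    for ch in local_part:
        if ord(ch) > 127:
            raise ValueError("non-ASCII character in local part")
        if ch in _LITERAL:
            out.append(ch)
        elif ch == "-":
            out.append("--")
        elif ch in string.ascii_uppercase and not case_sensitive:
            out.append(ch.lower())
        else:
            # Assumption: lower-case hex, no zero padding.
            out.append("-%x-" % ord(ch))
    encoded = "".join(out)
    if len(encoded) > 63:
        raise ValueError("Encoded-local-part does not fit in a domain label")
    return encoded
```

With these assumptions, "john.smith" encodes to "john-2e-smith" and "a-b" encodes to "a--b".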

When recovering the untagged address, the syntax MUST be fully parsed. It is not sufficient to simply spot ".a--t." and strip off everything between the start of the Tagged-domain and the Tag-marker. As well as thoroughly checking the syntax, the Encoded-local-part MUST be valid, for example no non-hex digits in the encoding of "other" characters. If the whole of the tagged address is available (not just the tagged domain) the decoded Encoded-local-part MUST match its Local-part.
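A strict decoder for the Encoded-local-part might look like the sketch below. The function name is hypothetical, and two choices are assumptions rather than requirements of this document: the input is lower-cased first (domain labels compare case-insensitively), and decoded values above 127 are rejected.

```python
import string

_LITERAL = set(string.ascii_lowercase + string.digits)
_HEX = set("0123456789abcdef")


def decode_local_part(encoded):
    """Decode an Encoded-local-part, raising ValueError if it is invalid."""
    s = encoded.lower()  # assumption: labels may arrive in any case
    out = []
    i = 0
    while i < len(s):
        ch = s[i]
        if ch in _LITERAL:
            out.append(ch)
            i += 1
        elif s.startswith("--", i):
            out.append("-")  # a double hyphen stands for one hyphen
            i += 2
        elif ch == "-":
            end = s.find("-", i + 1)
            if end == -1:
                raise ValueError("unterminated hex escape")
            digits = s[i + 1:end]
            if not digits or not set(digits) <= _HEX:
                raise ValueError("non-hex digits in escape")
            value = int(digits, 16)
            if value > 127:
                raise ValueError("escape is outside the ASCII range")
            out.append(chr(value))
            i = end + 1
        else:
            raise ValueError("character not allowed in Encoded-local-part")
    return "".join(out)
```

When the whole tagged address is available, the result would then be compared with the Local-part of that address.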

This section discusses alternative approaches to defining tagging schemes, before defining standard "opaque", "fixed", and "notag" schemes. A tagging scheme defines how to create and verify the "Tag-data" part of the generic syntax specified in . All implementations that create and verify tagged addresses MUST include at least the standard schemes.

There are two general approaches to defining tagging schemes: public and private.

Public schemes expose as much information to the recipient as possible, in order to allow the recipient to co-operate in the process of validating the tagged address. The advantage of this co-operation is that it scales well in the event of an attack: the victim doesn't have to have enough verification capacity to handle the full load of the attack. However there are a few disadvantages. Recipients cannot be expected to implement tagged address verification, especially in the short term. Replay attacks are harder to detect and stop, because there is no overview of tagged address usage. Other mechanisms for foiling replay attacks are needed, such as including a digest of the message data, but these make the scheme more complicated without improving its ability to deal with backscatter.

Private schemes do not make any effort to benefit from possible co-operation between sites that know about the scheme. The main advantage is simplicity: a site can implement the scheme without concern for compatibility with other sites. Because CBA puts the tag in the domain part, a site has a good ability to detect and thwart replay attacks, as discussed in and . However this requires victims to have enough capacity to handle the verification load caused by an attack. defines a standard private tagging scheme.

describes situations in which general-purpose tagging schemes cannot be used because of interoperability problems with the recipient. These problem recipients assume a given sender always uses the same bounce address, so defines a standard fixed tagging scheme.

The schemes below require that email addresses are canonicalized before use, so that a consistent string can be used as input to various cryptographic algorithms. The domain part of the canonicalized address MUST be all lower-case. The local part of the canonicalized address MUST NOT be quoted if it conforms to the Dot-string syntax defined in . If local parts are not case-sensitive then the canonicalized address MUST have an entirely lower-case local part.
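A minimal sketch of this canonicalization is shown below. The function name and the local_case_sensitive parameter are hypothetical, and the Dot-string test is an approximation written from the usual atom character set rather than taken from this document.

```python
import re

# Assumption: the characters permitted in an unquoted atom.
_ATEXT = r"[A-Za-z0-9!#$%&'*+\-/=?^_`{|}~]"
_DOT_STRING = re.compile(r"%s+(?:\.%s+)*\Z" % (_ATEXT, _ATEXT))


def canonicalize(address, local_case_sensitive=False):
    """Return the canonical form of an email address (sketch)."""
    # Split at the last "@" so a quoted local part may contain "@".
    local, at, domain = address.rpartition("@")
    if not at or not local or not domain:
        raise ValueError("not an email address")
    if len(local) >= 2 and local[0] == '"' and local[-1] == '"':
        # Remove the quotes only when the content is a valid Dot-string.
        unquoted = re.sub(r"\\(.)", r"\1", local[1:-1])
        if _DOT_STRING.match(unquoted):
            local = unquoted
    if not local_case_sensitive:
        local = local.lower()
    return local + "@" + domain.lower()
```

A quoted local part that needs its quotes, such as one containing a space, is left quoted.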

Tag-data created according to the "opaque" scheme contain a timestamp and a nonce. The timestamp is used to expire old addresses to protect them against replay attacks. The nonce ensures that each tag is unique. This data is encrypted in such a way that successful decryption also validates the rest of the tagged address.

An opaque tag's plaintext has the following syntax:

[Open issue: define the opaque tag plaintext syntax.]

An encryption key is created based on a site-wide private master key and the untagged address. The HMAC algorithm is used with H (the hash function) being SHA1 , with K being the master key, and text being the untagged address canonicalized according to . The plaintext is then encrypted with AES using the encryption key to produce the ciphertext.

The Tagged-address is then created according to , with Tagging-scheme set to "opaque", and Tag-data set to the ciphertext encoded using base32 with trailing "=" characters omitted.

[Open issue: how long is the resulting opaque tag? We should define the plaintext syntax so that there are no trailing "=" characters to omit.]

A tagged address using the opaque scheme is verified as follows. The Tag-data part MUST be the correct length; if it is not all upper-case it is converted to upper case. The untagged address is obtained according to . The encryption key is created as described above. The tag is decoded from base32 then decrypted. The resulting plaintext MUST conform to the syntax defined above.

If this process is completed successfully, then the address MAY be considered to be valid. If the timestamp is older than one month, or if a replay attack has been detected (see ), the address SHOULD be considered invalid.
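The key derivation and the base32 handling of the opaque scheme can be sketched with standard library calls. The function names are hypothetical. The AES step and the plaintext layout are deliberately left out because this document does not yet define them; in particular it does not say how the 20-byte HMAC-SHA1 output is mapped to an AES key size.

```python
import base64
import hashlib
import hmac


def derive_encryption_key(master_key, canonical_address):
    """HMAC-SHA1 with K = master key and text = canonical untagged address."""
    text = canonical_address.encode("ascii")
    return hmac.new(master_key, text, hashlib.sha1).digest()


def encode_tag_data(ciphertext):
    """Base32-encode the ciphertext and omit trailing "=" characters."""
    return base64.b32encode(ciphertext).decode("ascii").rstrip("=")


def decode_tag_data(tag_data):
    """Upper-case the Tag-data, restore the padding, and decode it."""
    upper = tag_data.upper()
    padded = upper + "=" * (-len(upper) % 8)
    # b32decode raises binascii.Error (a ValueError) on malformed input.
    return base64.b32decode(padded)
```

A verifier would call decode_tag_data on the Tag-data, decrypt the result with the derived key, and then check the plaintext syntax and the timestamp.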

Tag-data created according to the "fixed" scheme consist of two sub-domain parts: the left part is a lookup key derived from the untagged address and the recipient address; and the right part is an arbitrary string.

In general, each problem recipient will have a number of addresses, for example the posting, subscription, and unsubscription addresses of a mailing list, as well as variants that differ only in case. This implies that the recipient address MUST be mapped to a canonical form. This canonical form is not used for sending email to the recipient so does not have to be a valid email address; for example it could be the pattern that matches all the recipient's addresses.

The untagged address canonicalized according to , a "," character, and the canonical form of the recipient address are concatenated to produce an address pair. The address pair is hashed using SHA1 , and the most significant 40 bits of the hash are encoded with base32 to produce the 8 character lookup key.

The arbitrary string is created once for each address pair and stored in a database. It SHOULD consist of at least [open issue: how many characters in a fixed tag?] random characters chosen from the digits (ASCII 48..57) and the upper case letters (ASCII 65..90). The database entry MUST also contain enough additional information to validate the Tag-suffix part of the domain.

When creating a fixed scheme tagged address, the lookup key is derived as described above and used to retrieve the arbitrary string from the database. The Tag-data is the concatenation of the arbitrary string, a "." character, and the lookup key.

To verify the address, the lookup key and arbitrary string are extracted from the address and converted to upper case. The lookup key is used to retrieve a string from the database; the lookup MUST succeed, and MUST result in a string that matches the upper case version of the arbitrary string from the address. The additional information MUST validate the Tag-suffix.
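The lookup key derivation and the database check of the fixed scheme can be sketched as follows. All names are hypothetical, the database is modelled as a plain dictionary, and the length of the arbitrary string is a required argument because this document has not yet settled it. Validation of the Tag-suffix against the additional stored information is not shown.

```python
import base64
import hashlib
import secrets
import string

# Digits and upper-case letters, as allowed for the arbitrary string.
ARBITRARY_ALPHABET = string.digits + string.ascii_uppercase


def fixed_lookup_key(canonical_sender, canonical_recipient):
    """SHA1 of the address pair; top 40 bits as 8 base32 characters."""
    pair = canonical_sender + "," + canonical_recipient
    digest = hashlib.sha1(pair.encode("ascii")).digest()
    return base64.b32encode(digest[:5]).decode("ascii")


def new_arbitrary_string(length):
    """Create the random string stored once per address pair."""
    return "".join(secrets.choice(ARBITRARY_ALPHABET) for _ in range(length))


def verify_fixed_tag(database, lookup_key, arbitrary):
    """Both parts are upper-cased before the database comparison."""
    stored = database.get(lookup_key.upper())
    return stored is not None and stored == arbitrary.upper()
```

Five bytes of hash are exactly 40 bits, so the base32 output is 8 characters with no padding to remove.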

The "notag" scheme, as its name suggests, does not include any Tag-data. It is purely for use as part of the recipient address verification protocol described in , and MUST NOT be used to tag the bounce address of legitimate email.

Blah.

This section describes three ways of configuring the DNS for use with CBA. The first, simplest way provides roughly equivalent protection to . The more complicated arrangements provide better protection against replay attacks and fake tagged addresses.

CBA can be set up using a standard DNS server to provide a basic level of protection against backscatter.

The idea here is that when a recipient receives a message with an untagged bounce address, they should create a notag version and look up its MX in the DNS. NXDOMAIN implies the domain does not use tagged addresses; a successful MX result implies that the domain knows about tagged addresses but either that user doesn't use them, or the site is using a simple DNS setup (); a NODATA result implies that the address is invalid. To see why, consider the effect of installing the TXT record as described in .

[Open issue: is this clever distinction between NXDOMAIN and NODATA too much abuse of the DNS? Should I go back to a design based on SRV records?]
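The three outcomes of the notag MX lookup amount to a small decision table, sketched below. The enum and function names are hypothetical, and the DNS query itself is not shown: the sketch assumes the caller has already classified the response.

```python
from enum import Enum


class LookupOutcome(Enum):
    NXDOMAIN = "nxdomain"  # the notag domain does not exist
    MX_FOUND = "mx"        # the MX lookup succeeded
    NODATA = "nodata"      # the name exists but has no MX record


class Verdict(Enum):
    DOMAIN_DOES_NOT_TAG = "domain does not use tagged addresses"
    UNDETERMINED = "domain tags, but this address cannot be judged"
    ADDRESS_INVALID = "untagged bounce address is invalid"


_TABLE = {
    LookupOutcome.NXDOMAIN: Verdict.DOMAIN_DOES_NOT_TAG,
    LookupOutcome.MX_FOUND: Verdict.UNDETERMINED,
    LookupOutcome.NODATA: Verdict.ADDRESS_INVALID,
}


def interpret_notag_lookup(outcome):
    """Map the result of the notag MX lookup to a verdict."""
    return _TABLE[outcome]
```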

Blah.

To be filled in.

Moser's attack

Something about allocation of tag schemes.

RFC 2104
RFC 2119
RFC 2234
RFC 2476
RFC 2821
RFC 2822
RFC 3174
RFC 3548
"Advanced Encryption Standard (AES)", National Institute of Standards and Technology
RFC 3552
"Bounce Address Tag Validation (BATV)", Taughannock Networks, Brandenburg InternetWorking, Openwave, University of Cambridge

Markus Kuhn (mgk25@cam.ac.uk) suggested the encryption technique described in . Ian Jackson (ijackson@chiark.greenend.org.uk) pointed out that it is possible to rate-limit verification requests. Roger Moser (Roger.Moser@rama.pamho.net) pointed out the attack described in .