Internet-Draft
T. Finch
University of Cambridge
September 22, 2004

Bounce address domain tag verification and tracking

Abstract

How to put a secure tag in the domain part of a bounce address to
detect collateral spam, and how to detect and prevent replay attacks
against tagged addresses.

Document revision

$Cambridge: hermes/doc/antiforgery/draft-fanf-mass-bad.xml,v 1.8 2004/09/20 22:38:10 fanf2 Exp $

Finch [Page 1]
Internet-Draft Bounce address domain tags September 2004

Table of Contents

1. Introduction
2. Terminology
3. Model
4. Generic tagged address syntax
   4.1 Restrictions on addresses
   4.2 Encoding the Local-part
   4.3 Recovering the untagged address
5. Tagging schemes
   5.1 Alternative tagging schemes
   5.2 Address canonicalization
   5.3 The opaque tagging scheme
   5.4 The fixed tagging scheme
   5.5 The notag tagging scheme
6. MTA operation with BADT
7. Configuring the DNS for BADT
   7.1 Simple DNS configuration
   7.2 Intermediate DNS configuration
   7.3 Recommended DNS configuration
8. Verification of addresses by recipients
9. Interoperability Considerations
   9.1 The use of null return paths
   9.2 Using the bounce address to identify the sender
   9.3 Internationalization
10. Security Considerations
   10.1 Deliberate address expiry
11. IANA Considerations
12. References
   12.1 Normative References
   12.2 Informative References
Author's Address
A. Acknowledgments

1. Introduction

Blah.

2. Terminology

Attack: A batch of messages sent by a spammer or mass-mailing virus.
In this document we are concerned with messages with a forged bounce
address.

Backscatter: Bounce messages sent to a victim as a result of an
attack. This is also known as "blow-back" and "collateral spam".

Bounce address: The return path of the message, used as the
destination of bounces such as delivery status reports, vacation
messages, etc. The bounce address is the argument of the [RFC2821]
MAIL FROM command and is placed in the [RFC2822] Return-Path: header
field when the message is finally delivered.

Tagged address: The bounce address modified according to Section 4.

Tagged domain: The domain part of a tagged address.

Tagging scheme: The label used to identify the algorithm defined in
Section 5 by which a tagged address is created and verified.

Target: When talking about an attack, the target is the recipient of
email sent by the attacker.

Untagged address: The original bounce address as specified by the
message sender.

Victim: When talking about an attack, the victim is identified by the
forged bounce addresses used by the attacker.

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in [RFC2119].

Syntax specifications use the ABNF language specified in [RFC2234].
Terminals not defined in this document, such as ALPHA, DIGIT, SP, CR,
LF, CRLF, are as defined in the "core" syntax in section 6 of
[RFC2234] or in the message format syntax in [RFC2822].

Examples use a common scenario: The domain example.edu accepts
incoming email for addresses @example.edu via the host(s) called
mx.example.edu. The majority of the email is delivered to a message
store called store.example.edu, which users can access via
imap.example.edu. An [RFC2476] submission service is provided at
smtp.example.edu and a general-purpose outgoing email relay at
relay.example.edu. A department runs its own email service on
dept.example.edu which receives incoming email via mx.example.edu and
sends outgoing email via relay.example.edu.

3. Model

Something about tagging by MSAs and verification by MTAs and/or the
stunt DNS server.

4. Generic tagged address syntax

This section describes the syntax of tagged addresses. The syntax is
generic in that it is independent of the tagging scheme (Section 5),
which is identified by a field in the tagged address. All the
important information in a tagged address is in the domain part; this
allows a special DNS server to be used for verification and tracking
of tagged addresses, as described in Section 7. The untagged address
can be recovered from a tagged address using only knowledge of the
generic syntax; this simplifies interoperability with software that
misuses the bounce address, as described in Section 9.

An untagged address has the following syntax (simplified from
[RFC2821]):

   Reverse-path = Path
   Path         = "<" [ A-d-l ":" ] Mailbox ">"
   Mailbox      = Local-part "@" Domain
   Local-part   = Dot-string / Quoted-string
   Domain       = (sub-domain 1*("." sub-domain)) / address-literal
   sub-domain   = Let-dig [Ldh-str]

A tagged address has the following syntax:

   Reverse-path      =/ Tagged-path
   Tagged-path        = "<" Tagged-address ">"
   Tagged-address     = Local-part "@" Tagged-domain
   Tagged-domain      = Tag "." Tag-suffix
   Tag-suffix         = Encoded-local-part "." Tag-marker "." Domain
   Tag-marker         = "b--a"
   Tag                = Tag-data "." Tagging-scheme / "notag"
   Tagging-scheme     = "opaque" / "fixed" / sub-domain
   Tag-data           = (sub-domain *("." sub-domain))
   Encoded-local-part = sub-domain

4.1 Restrictions on addresses

The untagged address MUST NOT have an address-literal as its domain
part. An MSA that tags addresses can enforce this restriction by
rejecting any message that has a bounce address that violates this
requirement.

The untagged address SHOULD NOT be source-routed (as indicated by the
optional A-d-l part of the syntax). An MSA that tags addresses can
enforce this restriction by stripping off the source route.

The Encoded-local-part MUST fit in a domain label, that is, it MUST be
63 characters or fewer. This is slightly less than the maximum
interoperable size of 64 characters guaranteed by [RFC2821], though if
the Local-part uses characters that require encoding its length is
restricted further.

The Tagged-domain MUST NOT be more than 255 characters. This means the
Domain has a much more restricted length than usual.

[[NOTE4: Work out what the overhead in a Tagged-domain actually is.]]

4.2 Encoding the Local-part

Local parts of email addresses have a much less restricted syntax than
domain parts. In order to accommodate this, the Local-part of the
untagged address must be encoded so that it can be included in the
Tagged-domain.

The Local-part is transformed into the Encoded-local-part one
character at a time. The whole Local-part is used: if the Local-part
is a Quoted-string then the quotes are included.

If the character is a digit (ASCII 48..57) or a lower-case letter
(ASCII 97..122) the encoded form is the same as the original
character.

If the character is a hyphen (ASCII 45) the encoded form is a double
hyphen "--".

If the character is an upper-case letter (ASCII 65..90) and local
parts are not case sensitive at this domain, the encoded form is the
corresponding lower-case character. If local parts are case sensitive
at this domain then upper-case letters are encoded as described in the
next paragraph.

The encoded form of other characters is a hyphen, followed by the
hexadecimal value of the character's ASCII code, followed by another
hyphen. For example, if the input character is "." (ASCII 46) then its
encoded form is "-2e-".

4.3 Recovering the untagged address

When recovering the untagged address, the syntax MUST be fully parsed.
It is not sufficient to simply spot ".b--a." and strip off everything
between the start of the Tagged-domain and the Tag-marker.

As well as thoroughly checking the syntax, the Encoded-local-part MUST
be valid; for example, the encoding of "other" characters must contain
only hexadecimal digits. If the whole of the tagged address is
available (not just the tagged domain) the decoded Encoded-local-part
MUST match its Local-part.

5. Tagging schemes

This section discusses alternative approaches to defining tagging
schemes, before defining standard "opaque", "fixed", and "notag"
schemes. A tagging scheme defines how to create and verify the
"Tag-data" part of the generic syntax specified in Section 4. All
implementations that create and verify tagged addresses MUST include
at least the standard schemes.

5.1 Alternative tagging schemes

There are two general approaches to defining tagging schemes: public
and private.

Public schemes expose as much information to the recipient as
possible, in order to allow the recipient to co-operate in the process
of validating the tagged address. The advantage of this co-operation
is that it scales well in the event of an attack: the victim doesn't
have to have enough verification capacity to handle the full load of
the attack. However, there are a few disadvantages. Recipients cannot
be expected to implement tagged address verification. Replay attacks
are harder to detect and stop, because there is no overview of tagged
address usage. Other mechanisms for foiling replay attacks are needed,
such as including a digest of the message data, but these make the
scheme more complicated without improving its ability to deal with
backscatter.

Private schemes do not make any effort to benefit from possible
co-operation between sites that know about the scheme. The main
advantage is simplicity: a site can implement the scheme without
concern for compatibility with other sites. Because BADT puts the tag
in the domain part, a site has a good ability to detect and thwart
replay attacks, as discussed in Section 3 and Section 7. However, this
requires victims to have enough capacity to handle the verification
load caused by an attack. Section 5.3 defines a standard private
tagging scheme.

Section 9 describes situations in which general-purpose tagging
schemes cannot be used because of interoperability problems with the
recipient. These problem recipients assume a given sender always uses
the same bounce address, so Section 5.4 defines a standard fixed
tagging scheme.

5.2 Address canonicalization

The schemes below require that email addresses are canonicalized
before use, so that a consistent string can be used as input to
various cryptographic algorithms.

The domain part of the canonicalized address MUST be all lower-case.

The local part of the canonicalized address MUST NOT be quoted if it
conforms to the Dot-string syntax defined in [RFC2821].

If local parts are not case-sensitive then the canonicalized address
MUST have an entirely lower-case local part.
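
The canonicalization rules above, together with the Local-part
encoding of Section 4.2, are both simple string transforms. The
following Python sketch illustrates them under stated assumptions: the
function names and the case_sensitive parameter are hypothetical, the
Dot-string check is a simplified reading of [RFC2821], and the use of
exactly two lower-case hexadecimal digits is inferred from the single
"-2e-" example rather than specified by this document.

```python
import re

# RFC 2821 atom characters; a Dot-string is one or more atoms joined by ".".
_ATEXT = r"[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]"
_DOT_STRING = re.compile(rf"{_ATEXT}+(?:\.{_ATEXT}+)*\Z")


def canonicalize(local_part: str, domain: str, case_sensitive: bool = False) -> str:
    """Canonicalize an address along the lines of Section 5.2 (sketch)."""
    # Drop the quotes only when the quoted content is a plain Dot-string.
    # Quoted strings containing backslash escapes are left untouched.
    if len(local_part) >= 2 and local_part[0] == local_part[-1] == '"':
        inner = local_part[1:-1]
        if _DOT_STRING.match(inner):
            local_part = inner
    if not case_sensitive:
        local_part = local_part.lower()
    # The domain part is always lower-cased.
    return f"{local_part}@{domain.lower()}"


def encode_local_part(local_part: str, case_sensitive: bool = False) -> str:
    """Encode a Local-part as a single domain label per Section 4.2 (sketch)."""
    if not local_part.isascii():
        raise ValueError("Local-part must be ASCII")
    out = []
    for ch in local_part:
        if ch.isdigit() or ch.islower():
            out.append(ch)              # digits and lower-case letters pass through
        elif ch == "-":
            out.append("--")            # a hyphen is doubled
        elif ch.isupper() and not case_sensitive:
            out.append(ch.lower())      # fold case when local parts ignore case
        else:
            # Assumption: two lower-case hex digits, as in the "-2e-" example.
            out.append(f"-{ord(ch):02x}-")
    encoded = "".join(out)
    if len(encoded) > 63:
        raise ValueError("Encoded-local-part does not fit in a domain label")
    return encoded
```

With these definitions, encode_local_part("Tony.Finch") yields
"tony-2e-finch" when local parts are not case sensitive, and the
63-character limit of Section 4.1 is enforced after encoding, since
encoding can lengthen the Local-part.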

5.3 The opaque tagging scheme

Tag-data created according to the "opaque" scheme contain a timestamp
and a nonce. The timestamp is used to expire old addresses to protect
them against replay attacks. The nonce ensures that each tag is
unique. This data is encrypted in such a way that successful
decryption also validates the rest of the tagged address.

An opaque tag's plaintext has the following syntax:

   XXX

[[NOTE6: Define the opaque tag plaintext syntax.]]

An encryption key is created based on a site-wide private master key
and the untagged address. The HMAC [RFC2104] algorithm is used with H
(the hash function) being SHA1 [RFC3174], with K being the master key,
and text being the untagged address canonicalized according to
Section 5.2.

The plaintext is then encrypted with AES [FIPS197] using the
encryption key to produce the ciphertext.

The Tagged-address is then created according to Section 4, with
Tagging-scheme set to "opaque", and Tag-data set to the ciphertext
encoded using base32 [RFC3548] and with trailing "=" characters
omitted.

[[NOTE7: How long is the resulting opaque tag? We should define the
plaintext syntax so that there are no trailing "=" characters to
omit.]]

A tagged address using the opaque scheme is verified as follows. The
Tag-data part MUST be the correct length; if it is not all upper-case
it is converted to upper case. The untagged address is obtained
according to Section 4.3. The encryption key is created as described
above. The tag is decoded from base32 then decrypted. The resulting
plaintext MUST conform to the syntax defined above. If this process is
completed successfully, then the address MAY be considered to be
valid. If the timestamp is older than one month, or if a replay attack
has been detected (see Section 7), the address SHOULD be considered
invalid.
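
The key derivation and tag encoding steps described above can be
sketched in Python as follows. This is only an illustration: the
function names are hypothetical, the AES step is omitted because the
plaintext syntax is not yet defined and AES is not part of the Python
standard library, and how the 20-byte HMAC-SHA1 output is mapped onto
an AES key is not specified by this document, so the sketch returns
the whole digest.

```python
import base64
import hashlib
import hmac


def derive_key_material(master_key: bytes, canonical_address: str) -> bytes:
    """HMAC-SHA1 with K = master key, text = canonicalized untagged address."""
    text = canonical_address.encode("ascii")
    return hmac.new(master_key, text, hashlib.sha1).digest()


def encode_tag_data(ciphertext: bytes) -> str:
    """Base32-encode the ciphertext and omit trailing "=" characters."""
    return base64.b32encode(ciphertext).decode("ascii").rstrip("=")


def decode_tag_data(tag_data: str) -> bytes:
    """Convert Tag-data to upper case, restore padding, and decode base32.

    Raises binascii.Error (a ValueError subclass) if the tag is malformed.
    """
    text = tag_data.upper()
    text += "=" * (-len(text) % 8)   # base32 text is padded to a multiple of 8
    return base64.b32decode(text)
```

If the ciphertext were a single 16-byte AES block, the unpadded base32
form would be 26 characters long, which fits within one domain label.
Because the key depends on the untagged address, a tag copied onto a
different address decrypts under a different key, which is what allows
successful decryption to validate the rest of the tagged address.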

5.4 The fixed tagging scheme

Tag-data created according to the "fixed" scheme consist of two
sub-domain parts: the left part is a lookup key derived from the
untagged address and the recipient address; and the right part is an
arbitrary string.

In general, each problem recipient will have a number of addresses,
for example the posting, subscription, and unsubscription addresses of
a mailing list, as well as variants that differ only in case. This
implies that the recipient address MUST be mapped to a canonical form.
This canonical form is not used for sending email to the recipient so
does not have to be a valid email address; for example it could be the
pattern that matches all the recipient's addresses.

The untagged address canonicalized according to Section 5.2, a ","
character, and the canonical form of the recipient address are
concatenated to produce an address pair. The address pair is hashed
using SHA1 [RFC3174], and the most significant 40 bits of the hash are
encoded with base32 [RFC3548] to produce the 8 character lookup key.

The arbitrary string is created once for each address pair and stored
in a database. It SHOULD consist of at least [[NOTE8: How many
characters in a fixed tag?]] random characters chosen from the digits
(ASCII 48..57) and the upper case letters (ASCII 65..90). The database
entry MUST also contain enough additional information to validate the
Tag-suffix part of the domain.

When creating a fixed scheme tagged address, the lookup key is derived
as described above and used to retrieve the arbitrary string from the
database. The Tag-data is the concatenation of the arbitrary string, a
"." character, and the lookup key.

To verify the address, the lookup key and arbitrary string are
extracted from the address and converted to upper case.
The lookup key is used to retrieve a string from the database; the
lookup MUST succeed, and MUST result in a string that matches the
upper case version of the arbitrary string from the address. The
additional information MUST validate the Tag-suffix.

5.5 The notag tagging scheme

The "notag" scheme, as its name suggests, does not include any
Tag-data. It is purely for use as part of the recipient address
verification protocol described in Section 8, and MUST NOT be used to
tag the bounce address of legitimate email.

6. MTA operation with BADT

Blah.

7. Configuring the DNS for BADT

This section describes three ways of configuring the DNS for use with
BADT. The first, simplest way provides roughly equivalent protection
to [BATV]. The more complicated arrangements provide better protection
against replay attacks and fake tagged addresses.

7.1 Simple DNS configuration

BADT can be set up using a standard DNS server to provide a basic
level of protection against backscatter.

7.2 Intermediate DNS configuration

7.3 Recommended DNS configuration

8. Verification of addresses by recipients

The idea here is that when a recipient receives a message with an
untagged bounce address, they should create a notag version and look
up its MX in the DNS. The three possible outcomes are interpreted as
follows:

NXDOMAIN implies the domain does not use tagged addresses.

A successful MX result implies that the domain knows about tagged
addresses but either that user doesn't use them, or the site is using
a simple DNS setup (Section 7.1).

A NODATA result implies that the address is invalid. To see why,
consider the effect of installing the TXT record as described in
Section 7.2.

[[NOTE9: Is this clever distinction between NXDOMAIN and NODATA too
much abuse of the DNS? Should I go back to a design based on SRV
records?]]

9. Interoperability Considerations

Blah.

9.1 The use of null return paths

9.2 Using the bounce address to identify the sender

9.3 Internationalization

10. Security Considerations

To be filled in.

10.1 Deliberate address expiry

Moser's attack

11. IANA Considerations

Something about allocation of tag schemes.

12. References

12.1 Normative References

[FIPS197] National Institute of Standards and Technology, "Advanced
          Encryption Standard (AES)", FIPS Pub. 197, November 2001.

[RFC2104] Krawczyk, H., Bellare, M. and R. Canetti, "HMAC:
          Keyed-Hashing for Message Authentication", RFC 2104,
          February 1997.

[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
          Requirement Levels", BCP 14, RFC 2119, March 1997.

[RFC2234] Crocker, D. and P. Overell, "Augmented BNF for Syntax
          Specifications: ABNF", RFC 2234, November 1997.

[RFC2476] Gellens, R. and J. Klensin, "Message Submission", RFC 2476,
          December 1998.

[RFC2821] Klensin, J., "Simple Mail Transfer Protocol", RFC 2821,
          April 2001.

[RFC2822] Resnick, P., "Internet Message Format", RFC 2822,
          April 2001.

[RFC3174] Eastlake, D. and P. Jones, "US Secure Hash Algorithm 1
          (SHA1)", RFC 3174, September 2001.

[RFC3548] Josefsson, S., "The Base16, Base32, and Base64 Data
          Encodings", RFC 3548, July 2003.

12.2 Informative References

[BATV]    Levine, J., Crocker, D., Silberman, S. and T. Finch, "Bounce
          Address Tag Validation (BATV)", draft-levine-mass-batv-00
          (work in progress), September 2004.

[RFC3552] Rescorla, E. and B. Korver, "Guidelines for Writing RFC Text
          on Security Considerations", BCP 72, RFC 3552, July 2003.

Author's Address

Tony Finch
University of Cambridge Computing Service
New Museums Site
Pembroke Street
Cambridge CB2 3QH
ENGLAND

Phone: +44 797 040 1426
EMail: dot@dotat.at
URI: http://dotat.at/

Appendix A. Acknowledgments

Markus Kuhn (mgk25@cam.ac.uk) suggested the encryption technique
described in Section 5.3.

Ian Jackson (ijackson@chiark.greenend.org.uk) pointed out that it is
possible to rate-limit verification requests.

Roger Moser (Roger.Moser@rama.pamho.net) pointed out the attack
described in Section 10.1.