<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE rfc SYSTEM "rfc2629.dtd" [
 <!ENTITY rfc2104 PUBLIC "" "http://xml.resource.org/public/rfc/bibxml/reference.RFC.2104.xml">
 <!ENTITY rfc2119 PUBLIC "" "http://xml.resource.org/public/rfc/bibxml/reference.RFC.2119.xml">
 <!ENTITY rfc2234 PUBLIC "" "http://xml.resource.org/public/rfc/bibxml/reference.RFC.2234.xml">
 <!ENTITY rfc2476 PUBLIC "" "http://xml.resource.org/public/rfc/bibxml/reference.RFC.2476.xml">
 <!ENTITY rfc2821 PUBLIC "" "http://xml.resource.org/public/rfc/bibxml/reference.RFC.2821.xml">
 <!ENTITY rfc2822 PUBLIC "" "http://xml.resource.org/public/rfc/bibxml/reference.RFC.2822.xml">
 <!ENTITY rfc3174 PUBLIC "" "http://xml.resource.org/public/rfc/bibxml/reference.RFC.3174.xml">
 <!ENTITY rfc3548 PUBLIC "" "http://xml.resource.org/public/rfc/bibxml/reference.RFC.3548.xml">
 <!ENTITY rfc3552 PUBLIC "" "http://xml.resource.org/public/rfc/bibxml/reference.RFC.3552.xml">
]>

<?xml-stylesheet type='text/xsl' href='rfc2629.xslt' ?>

<!-- autobreaks="yes" -->
<?rfc comments="yes" ?> <?rfc inline="yes" ?>
<?rfc compact="yes" ?> <?rfc subcompact="no" ?>
<!-- editing="no" -->
<!-- iprnotified="no" -->
<!-- linkmailto="yes" -->
<?rfc private="Internet-Draft" ?> <?rfc header="Internet-Draft" ?> <!-- footer="" -->
<!-- slides="no" --> <!-- background="" -->
<?rfc sortrefs="yes"?> <?rfc symrefs="yes" ?>
<?rfc strict="yes" ?>
<?rfc toc="yes" ?> <!-- tocompact="yes" --> <?rfc tocdepth="2" ?> <?rfc tocindent="yes" ?>
<!-- topblock="no" -->

<!--

(defun fanf-xml-insert-xref ()
 (interactive)
 (insert "<xref target=\"\"/>"))

(local-set-key "\M-r" 'fanf-xml-insert-xref)

-->

<rfc ipr="full3667"
     docName="draft-fanf-email-dcba">

<!-- === -->
 <front>

  <title abbrev="Email Domain Crypto Bounce Auth">
   Email Domain Cryptographic Bounce Authentication
  </title>

  <author initials="T." surname="Finch" fullname="Tony Finch">
   <organization abbrev="University of Cambridge">
    University of Cambridge Computing Service
   </organization>
   <address>
    <postal>
     <street>New Museums Site</street>
     <street>Pembroke Street</street>
     <city>Cambridge</city>
     <code>CB2 3QH</code>
     <country>ENGLAND</country>
    </postal>
    <phone>+44 797 040 1426</phone>
    <email>dot@dotat.at</email>
    <uri>http://dotat.at/</uri>
   </address>
  </author>

  <date month="September" year="2004"/>

  <area>Applications</area>
  <workgroup>MAILSIG</workgroup>

  <abstract>
   <t>This document describes how to place a secure tag in the domain part
   of a bounce address in order to detect collateral spam, and how to detect
   and prevent replay attacks against tagged addresses.</t>
  </abstract>

  <note title="Document revision">
   <t>$Cambridge: hermes/doc/antiforgery/draft-fanf-email-dcba.xml,v 1.11 2005/03/10 15:00:57 fanf2 Exp $</t>
  </note>

 </front>

 <middle>

<!-- === -->
  <section title="Introduction">

   <t>Blah.</t>

  </section>

<!-- === -->
  <section title="Terminology">

   <t><list style="hanging">

    <t hangText="Attack:">A batch of messages sent by a spammer or
    mass-mailing virus. In this document we are concerned with
    messages with a forged bounce address.</t>

    <t hangText="Backscatter:">Bounce messages sent to a victim as a
    result of an attack. This is also known as "blow-back" and
    "collateral spam".</t>

    <t hangText="Bounce address:">The return path of the message, used
    as the destination of bounces such as delivery status reports,
    vacation messages, etc. The bounce address is the argument of the
    <xref target="RFC2821"/> MAIL FROM command and is placed in the
    <xref target="RFC2822"/> Return-Path: header field when the
    message is finally delivered.</t>

    <t hangText="Tagged address:">The bounce address modified
    according to <xref target="Syntax"/>.</t>

    <t hangText="Tagged domain:">The domain part of a tagged address.</t>

    <t hangText="Tagging scheme:">The label used to identify the
    algorithm defined in <xref target="Schemes"/> by which a tagged
    address is created and verified.</t>

    <t hangText="Target:">When talking about an attack, the target is
    the recipient of email sent by the attacker.</t>

    <t hangText="Untagged address:">The original bounce address as
    specified by the message sender.</t>

    <t hangText="Victim:">When talking about an attack, the victim is
    identified by the forged bounce addresses used by the
    attacker.</t>

    <t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL",
    "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY",
    and "OPTIONAL" in this document are to be interpreted as
    described in <xref target="RFC2119"/>.</t>

    <t>Syntax specifications use the ABNF language specified in
    <xref target="RFC2234"/>. Terminals not defined in this document, such
    as ALPHA, DIGIT, SP, CR, LF, CRLF, are as defined in the "core" syntax
    in section 6 of <xref target="RFC2234"/> or in the message format
    syntax in <xref target="RFC2822"/>.</t>

    <t>Examples use a common scenario: The domain example.edu accepts
    incoming email for addresses @example.edu via the host(s) called
    mx.example.edu. The majority of the email is delivered to a message
    store called store.example.edu, which users can access via
    imap.example.edu. An <xref target="RFC2476"/> submission service is
    provided at smtp.example.edu and a general-purpose outgoing email relay
    at relay.example.edu. A department runs its own email service on
    dept.example.edu which receives incoming email via mx.example.edu and
    sends outgoing email via relay.example.edu.</t>

   </list></t>

  </section>

<!-- === -->
 <section anchor="Model" title="Model">

  <t>Something about tagging by MSAs and verification by MTAs and/or the
  stunt DNS server.</t>

 </section>

<!-- === -->
  <section anchor="Syntax" title="Generic tagged address syntax">

   <t>This section describes the syntax of tagged addresses. The
   syntax is generic in that it is independent of the tagging scheme
   (<xref target="Schemes"/>), which is identified by a field in the
   tagged address. All the important information in a tagged address
   is in the domain part; this allows a special DNS server to be used
   for verification and tracking of tagged addresses, as described in
   <xref target="DNSconf"/>. The untagged address can be recovered
   from a tagged address using only knowledge of the generic syntax;
   this simplifies interoperability with software that misuses
   the bounce address, as described in <xref target="Interop"/>.</t>

   <t><figure><preamble>An untagged address has the following syntax
   (simplified from <xref target="RFC2821"/>):</preamble>
    <artwork><![CDATA[

     Reverse-path = Path
     Path = "<" [ A-d-l ":" ] Mailbox ">"

     Mailbox = Local-part "@" Domain

     Local-part = Dot-string / Quoted-string

     Domain = (sub-domain 1*("." sub-domain)) / address-literal
     sub-domain = Let-dig [Ldh-str]

    ]]></artwork>
   </figure></t>

   <t><figure><preamble>A tagged address has the following syntax:</preamble>
    <artwork><![CDATA[

     Reverse-path =/ Tagged-path
     Tagged-path = "<" Tagged-address ">"

     Tagged-address = Local-part "@" Tagged-domain

     Tagged-domain = Tag "." Tag-suffix

     Tag-suffix = Tagging-scheme "." Encoded-local-part
                  "." Tag-marker "." Domain

     Tag-marker = "a--t"

     Tagging-scheme = "opaque" / "fixed" / sub-domain

     Tag = (sub-domain *("." sub-domain))

     Encoded-local-part = sub-domain

    ]]></artwork></figure>
   </t>

   <section title="Restrictions on addresses">

    <t>The untagged address MUST NOT have an address-literal as its domain
    part. An MSA that tags addresses can enforce this restriction by
    rejecting any message whose bounce address violates this
    requirement.</t>

    <t>The untagged address SHOULD NOT be source-routed (as indicated by the
    optional A-d-l part of the syntax). An MSA that tags addresses can
    enforce this restriction by stripping off the source route.</t>

    <t>The Encoded-local-part MUST fit in a domain label; that is, it MUST be
    63 characters or fewer. This is slightly less than the maximum
    interoperable size of 64 characters guaranteed by <xref target="RFC2821"/>,
    and if the Local-part uses characters that require encoding, its length
    is restricted further.</t>

    <t>The Tagged-domain MUST NOT be more than 255 characters. This
    means the Domain has a much more restricted length than usual.
    <cref>Work out what the overhead in a Tagged-domain actually is.</cref></t>
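
    <t><figure><preamble>The length limits above can be checked mechanically
    when a tagged domain is assembled. The following Python sketch is
    illustrative only: the function name build_tagged_domain and its
    parameters are hypothetical, and it assumes that the Tag, Tagging-scheme,
    and Encoded-local-part have already been produced as described elsewhere
    in this document.</preamble>
     <artwork><![CDATA[
```python
TAG_MARKER = "a--t"   # Tag-marker from the generic syntax
MAX_LABEL = 63        # longest permitted domain label
MAX_DOMAIN = 255      # longest permitted Tagged-domain


def build_tagged_domain(tag, scheme, encoded_local_part, domain):
    """Join the parts of a Tagged-domain and enforce the length limits.

    All parameter names are illustrative. `tag` may contain several
    dot-separated labels, as the Tag production allows.
    """
    if domain.startswith("["):
        # The untagged address must not use an address-literal.
        raise ValueError("address-literal domains cannot be tagged")
    if len(encoded_local_part) > MAX_LABEL:
        raise ValueError("Encoded-local-part does not fit in one label")
    tagged = ".".join([tag, scheme, encoded_local_part, TAG_MARKER, domain])
    if len(tagged) > MAX_DOMAIN:
        raise ValueError("Tagged-domain exceeds 255 characters")
    return tagged
```
     ]]></artwork>
     <postamble>With the made-up tag "abcd", the call
     build_tagged_domain("abcd", "opaque", "fanf", "example.edu") returns
     "abcd.opaque.fanf.a--t.example.edu".</postamble></figure></t>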

   </section>

   <section title="Encoding the Local-part">

    <t>Local parts of email addresses have a much less restricted syntax than
    domain parts. In order to accommodate this, the Local-part of the
    untagged address must be encoded so that it can be included in the
    Tagged-domain.</t>

    <t>The Local-part is transformed into the Encoded-local-part one
    character at a time. The whole Local-part is used: if the Local-part is a
    Quoted-string then the quotes are included.</t>

    <t>If the character is a digit (ASCII 48..57) or a lower-case letter
    (ASCII 97..122) the encoded form is the same as the original
    character.</t>

    <t>If the character is a hyphen (ASCII 45) the encoded form is a double
    hyphen "--".</t>

    <t>If the character is an upper-case letter (ASCII 65..90) and local
    parts are not case sensitive at this domain, the encoded form is the
    corresponding lower-case character. If local parts are case sensitive at
    this domain then upper-case letters are encoded as described in the next
    paragraph.</t>

    <t>The encoded form of any other character is a hyphen, followed by the
    hexadecimal value of the character's ASCII code, followed by another
    hyphen. For example, if the
    input character is "." then its encoded form is "-2e-".</t>
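
    <t><figure><preamble>The encoding rules above can be sketched in Python
    as follows. This is a minimal illustration, not a normative
    implementation: the function name encode_local_part and the
    case_sensitive flag are hypothetical, the input is assumed to be ASCII,
    and the use of two lower-case hexadecimal digits is inferred from the
    "-2e-" example.</preamble>
     <artwork><![CDATA[
```python
def encode_local_part(local_part, case_sensitive=False):
    """Encode a Local-part one character at a time (illustrative sketch)."""
    out = []
    for ch in local_part:
        if "0" <= ch <= "9" or "a" <= ch <= "z":
            out.append(ch)              # digits and lower case pass through
        elif ch == "-":
            out.append("--")            # a hyphen becomes a double hyphen
        elif "A" <= ch <= "Z" and not case_sensitive:
            out.append(ch.lower())      # fold case when local parts allow it
        else:
            # Everything else: hyphen, hex character code, hyphen.
            out.append("-%02x-" % ord(ch))
    encoded = "".join(out)
    if len(encoded) > 63:
        raise ValueError("Encoded-local-part does not fit in one label")
    return encoded
```
     ]]></artwork>
     <postamble>For example, encode_local_part("John.Smith") gives
     "john-2e-smith", and encode_local_part("a-b") gives
     "a--b".</postamble></figure></t>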

   </section>

   <section anchor="Untagging" title="Recovering the untagged address">

    <t>When recovering the untagged address, the syntax MUST be fully parsed.
    It is not sufficient to simply spot ".a--t." and strip off everything
    between the start of the Tagged-domain and the Tag-marker.</t>

    <t>As well as thoroughly checking the syntax, the Encoded-local-part MUST
    be valid; for example, the encoding of "other" characters must contain
    only hexadecimal digits. If the whole of the tagged address is available (not just the
    tagged domain) the decoded Encoded-local-part MUST match its
    Local-part.</t>
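
    <t><figure><preamble>A strict decoder for the Encoded-local-part might
    look like the following Python sketch. It is illustrative only: the
    names decode_local_part and local_part_matches are hypothetical, and the
    sketch assumes the label has already been converted to lower case and
    that escapes carry lower-case hexadecimal digits. It does not parse the
    rest of the Tagged-domain.</preamble>
     <artwork><![CDATA[
```python
HEX_DIGITS = "0123456789abcdef"


def decode_local_part(encoded):
    """Reverse the Local-part encoding, rejecting malformed input."""
    out = []
    i = 0
    while i < len(encoded):
        ch = encoded[i]
        if ch != "-":
            if not ("0" <= ch <= "9" or "a" <= ch <= "z"):
                raise ValueError("unexpected character %r" % ch)
            out.append(ch)
            i += 1
        elif encoded.startswith("--", i):
            out.append("-")             # double hyphen is a literal hyphen
            i += 2
        else:
            end = encoded.find("-", i + 1)
            if end == -1:
                raise ValueError("unterminated escape")
            digits = encoded[i + 1:end]
            if any(d not in HEX_DIGITS for d in digits):
                raise ValueError("non-hex digit in escape")
            out.append(chr(int(digits, 16)))
            i = end + 1
    return "".join(out)


def local_part_matches(local_part, encoded, case_sensitive=False):
    """Check that the decoded label agrees with the address's Local-part."""
    decoded = decode_local_part(encoded)
    if case_sensitive:
        return decoded == local_part
    return decoded == local_part.lower()
```
     ]]></artwork></figure></t>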

   </section>

  </section>

<!-- === -->
  <section anchor="Schemes" title="Tagging schemes">

   <t>This section discusses alternative approaches to defining tagging
   schemes, before defining standard "opaque", "fixed", and "notag" schemes. A
   tagging scheme defines how to create and verify the "Tag-data" part of the
   generic syntax specified in <xref target="Syntax"/>. All implementations
   that create and verify tagged addresses MUST include at least the
   standard schemes.</t>

   <section anchor="AltSchemes" title="Alternative tagging schemes">

    <t>There are two general approaches to defining tagging schemes:
    public and private.</t>

    <t>Public schemes expose as much information to the recipient as
    possible, in order to allow the recipient to co-operate in the
    process of validating the tagged address. The advantage of this
    co-operation is that it scales well in the event of an attack: the
    victim doesn't have to have enough verification capacity to handle
    the full load of the attack. However there are a few
    disadvantages. Recipients cannot be expected to implement tagged
    address verification, especially in the short term. Replay attacks are harder to detect and
    stop, because there is no overview of tagged address usage. Other
    mechanisms for foiling replay attacks are needed, such as
    including a digest of the message data, but these make the scheme
    more complicated without improving its ability to deal with
    backscatter.</t>

    <t>Private schemes do not make any effort to benefit from possible
    co-operation between sites that know about the scheme. The main
    advantage is simplicity: a site can implement the scheme without
    concern for compatibility with other sites. Because CBA puts the
    tag in the domain part, a site has a good ability to detect and
    thwart replay attacks, as discussed in <xref target="Model"/> and
    <xref target="DNSconf"/>. However this requires victims to have
    enough capacity to handle the verification load caused by an
    attack. <xref target="OpaqueScheme"/> defines a standard private
    tagging scheme.</t>

    <t><xref target="Interop"/> describes situations in which
    general-purpose tagging schemes cannot be used because of
    interoperability problems with the recipient. These problem
    recipients assume a given sender always uses the same bounce
    address, so <xref target="FixedScheme"/> defines a standard fixed
    tagging scheme.</t>

   </section>

   <section anchor="AddrCanon" title="Address canonicalization">

    <t>The schemes below require that email addresses are canonicalized
    before use, so that a consistent string can be used as input to various
    cryptographic algorithms.</t>

    <t>The domain part of the canonicalized address MUST be all
    lower-case.</t>

    <t>The local part of the canonicalized address MUST NOT be quoted if it
    conforms to the Dot-string syntax defined in <xref target="RFC2821"/>.</t>

    <t>If local parts are not case-sensitive then the canonicalized
    address MUST have an entirely lower-case local part.</t>
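
    <t><figure><preamble>The three canonicalization rules can be sketched in
    Python as follows. This is an illustration under stated assumptions: the
    function name canonicalize and the case_sensitive flag are hypothetical,
    and the regular expression is a simplified rendering of the Dot-string
    syntax using the atext character set.</preamble>
     <artwork><![CDATA[
```python
import re

# Simplified atext character class; an approximation for this sketch.
ATEXT = r"[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]"
DOT_STRING = re.compile(r"%s+(?:\.%s+)*\Z" % (ATEXT, ATEXT))


def canonicalize(address, case_sensitive=False):
    """Return the canonical form of an email address (illustrative)."""
    local, _, domain = address.rpartition("@")
    if len(local) >= 2 and local.startswith('"') and local.endswith('"'):
        # Drop the quotes only if the content is a plain Dot-string.
        inner = re.sub(r"\\(.)", r"\1", local[1:-1])
        if DOT_STRING.match(inner):
            local = inner
    if not case_sensitive:
        local = local.lower()
    return local + "@" + domain.lower()
```
     ]]></artwork>
     <postamble>For example, canonicalize('"Tony.Finch"@Example.EDU') gives
     "tony.finch@example.edu", while a local part such as "a b" keeps its
     quotes because it is not a valid Dot-string.</postamble></figure></t>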

   </section>

   <section anchor="OpaqueScheme" title="The opaque tagging scheme">

    <t>Tag-data created according to the "opaque" scheme contain a
    timestamp and a nonce. The timestamp is used to expire old
    addresses to protect them against replay attacks. The nonce
    ensures that each tag is unique. This data is encrypted in such a
    way that successful decryption also validates the rest of the
    tagged address.</t>

    <t><figure><preamble>An opaque tag's plaintext has the following
    syntax:</preamble>
     <artwork><![CDATA[

      XXX

     ]]></artwork></figure>
    <cref>Define the opaque tag plaintext syntax.</cref>
    </t>

    <t>An encryption key is created based on a site-wide private master key
    and the untagged address. The HMAC <xref target="RFC2104"/> algorithm
    is used with H (the hash function) being SHA1 <xref target="RFC3174"/>,
    with K being the master key, and text being the untagged address
    canonicalized according to <xref target="AddrCanon"/>.</t>
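
    <t><figure><preamble>The key derivation step can be sketched with the
    Python standard library as follows. The function name derive_key is
    hypothetical, and the address is assumed to be ASCII and already
    canonicalized. How the 20-byte HMAC-SHA1 output is mapped onto an AES key
    length is not stated above, so the sketch returns the full digest and
    leaves that choice open.</preamble>
     <artwork><![CDATA[
```python
import hashlib
import hmac


def derive_key(master_key, canonical_address):
    """HMAC-SHA1 with K = master key, text = canonical untagged address."""
    text = canonical_address.encode("ascii")
    return hmac.new(master_key, text, hashlib.sha1).digest()
```
     ]]></artwork>
     <postamble>Because the address is part of the HMAC input, each untagged
     address gets its own key, so a tag created for one address does not
     decrypt correctly under the key for another.</postamble></figure></t>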

    <t>The plaintext is then encrypted with AES <xref target="FIPS197"/>
    using the encryption key to produce the ciphertext. The Tagged-address
    is then created according to <xref target="Syntax"/>, with
    Tagging-scheme set to "opaque", and Tag-data set to the ciphertext encoded
    using base32 <xref target="RFC3548"/> and with trailing "=" characters
    omitted. <cref>How long is the resulting opaque tag? We should define
    the plaintext syntax so that there are no trailing "=" characters to
    omit.</cref></t>

    <t>A tagged address using the opaque scheme is verified as follows. The
    Tag-data part MUST be the correct length; if it is not all upper-case it is
    converted to upper case. The untagged address is obtained according to
    <xref target="Untagging"/>. The encryption key is created as described
    above. The tag is decoded from base32 then decrypted. The resulting
    plaintext MUST conform to the syntax defined above. If this process is
    completed successfully, then the address MAY be considered to be valid.
    If the timestamp is older than one month, or if a replay attack has
    been detected (see <xref target="DNSconf"/>), the address SHOULD be
    considered invalid.</t>
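
    <t><figure><preamble>The base32 handling used when creating and verifying
    an opaque tag can be sketched in Python as follows. The function names
    are hypothetical, and the AES step is omitted because it is not in the
    standard library. The 16-byte input used in the example is an assumption
    (a single AES block), since the plaintext syntax is not yet
    defined.</preamble>
     <artwork><![CDATA[
```python
import base64


def encode_tag(ciphertext):
    """Base32-encode the ciphertext and drop trailing '=' padding."""
    return base64.b32encode(ciphertext).decode("ascii").rstrip("=")


def decode_tag(tag):
    """Upper-case the tag, restore the padding, and base32-decode it.

    base64.b32decode raises binascii.Error (a ValueError subclass) on
    malformed input, which a verifier would treat as an invalid address.
    """
    tag = tag.upper()
    padding = "=" * (-len(tag) % 8)
    return base64.b32decode(tag + padding)
```
     ]]></artwork>
     <postamble>Under the single-block assumption, 16 bytes encode to 26
     base32 characters once the six "=" padding characters are removed, which
     fits within a 63-character domain label.</postamble></figure></t>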

   </section>

   <section anchor="FixedScheme" title="The fixed tagging scheme">

    <t>Tag-data created according to the "fixed" scheme consist of two
    sub-domain parts: the left part is a lookup key derived from the
    untagged address and the recipient address; and the right part is an
    arbitrary string.</t>

    <t>In general, each problem recipient will have a number of
    addresses, for example the posting, subscription, and unsubscription
    addresses of a mailing list, and because of case-insensitivity. This
    implies that the recipient address MUST be mapped to a canonical
    form. This canonical form is not used for sending email to the
    recipient so does not have to be a valid email address; for example it
    could be the pattern that matches all the recipient's addresses.</t>

    <t>The untagged address canonicalized according to <xref target="AddrCanon"/>,
    a "," character, and the canonical form of the recipient address are
    concatenated to produce an address pair. The address pair is hashed
    using SHA1 <xref target="RFC3174"/>, and the most significant 40 bits
    of the hash are encoded with base32 <xref target="RFC3548"/> to produce
    the 8 character lookup key.</t>
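
    <t><figure><preamble>The lookup key derivation can be sketched in Python
    as follows. The function name lookup_key is hypothetical. The sketch
    assumes both inputs are ASCII and already in canonical form, and it takes
    the "most significant 40 bits" to be the first five bytes of the SHA1
    digest.</preamble>
     <artwork><![CDATA[
```python
import base64
import hashlib


def lookup_key(canonical_sender, canonical_recipient):
    """Hash the address pair and base32-encode the top 40 bits."""
    pair = canonical_sender + "," + canonical_recipient
    digest = hashlib.sha1(pair.encode("ascii")).digest()
    # Five bytes are exactly 40 bits, which base32 turns into
    # eight characters with no padding.
    return base64.b32encode(digest[:5]).decode("ascii")
```
     ]]></artwork></figure></t>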

    <t>The arbitrary string is created once for each address pair and
    stored in a database. It SHOULD consist of at least <cref>How many
    characters in a fixed tag?</cref> random characters chosen from the
    digits (ASCII 48..57) and the upper case letters (ASCII 65..90). The
    database entry MUST also contain enough additional information to
    validate the Tag-suffix part of the domain.</t>

    <t>When creating a fixed scheme tagged address, the lookup key is derived
    as described above and used to retrieve the arbitrary string from the
    database. The Tag-data is the concatenation of the lookup key, a "."
    character, and the arbitrary string. To verify the address, the lookup key and
    arbitrary string are extracted from the address and converted to upper
    case. The lookup key is used to retrieve a string from the database; the
    lookup MUST succeed, and MUST result in a string that matches the upper
    case version of the arbitrary string from the address. The additional
    information MUST validate the Tag-suffix.</t>
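
    <t><figure><preamble>The creation of the arbitrary string and the
    verification lookup can be sketched in Python as follows. Everything here
    is an assumption made for illustration: the function names, the use of an
    in-memory dictionary as the database, and storing the expected Tag-suffix
    itself as the "additional information". The string length is left as a
    parameter because this document does not yet fix it.</preamble>
     <artwork><![CDATA[
```python
import secrets
import string

# Digits and upper-case letters, as the text requires.
ALPHABET = string.digits + string.ascii_uppercase


def new_arbitrary_string(length):
    """Create the random string stored once per address pair."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def verify_fixed(db, lookup_key, arbitrary, tag_suffix):
    """Check a fixed-scheme tag against the (hypothetical) database.

    `db` maps an upper-case lookup key to a pair of
    (arbitrary string, expected Tag-suffix).
    """
    entry = db.get(lookup_key.upper())
    if entry is None:
        return False                    # the lookup must succeed
    stored_string, stored_suffix = entry
    return (stored_string == arbitrary.upper()
            and stored_suffix == tag_suffix.lower())
```
     ]]></artwork></figure></t>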

   </section>

   <section anchor="NotagScheme" title="The notag tagging scheme">

    <t>The "notag" scheme, as its name suggests, does not include
    any Tag-data. It is purely for use as part of the recipient
    address verification protocol described in <xref target="DNSCBV"/>,
    and MUST NOT be used to tag the bounce address of legitimate
    email.</t>

   </section>

  </section>

<!-- === -->
  <section anchor="MTAconf" title="MTA operation with CBA">

   <t>Blah.</t>

  </section>

<!-- === -->
  <section anchor="DNSconf" title="Configuring the DNS for CBA">

   <t>This section describes three ways of configuring the DNS for use
   with CBA. The first, simplest way provides roughly equivalent
   protection to <xref target="BATV"/>. The more complicated
   arrangements provide better protection against replay attacks and
   fake tagged addresses.</t>

   <section anchor="SimpleDNS" title="Simple DNS configuration">

    <t>CBA can be set up using a standard DNS server to provide a
    basic level of protection against backscatter.</t>

   </section>

   <section anchor="MediumDNS" title="Intermediate DNS configuration">
   </section>

   <section anchor="ProperDNS" title="Recommended DNS configuration">
   </section>

  </section>

<!-- === -->
  <section anchor="DNSCBV" title="Verification of addresses by recipients">

   <t>When a recipient receives a message
   with an untagged bounce address, it should create a notag
   version of the address and look up its MX record in the DNS. An
   NXDOMAIN result implies that the
   domain does not use tagged addresses. A successful MX result
   implies that the domain knows about tagged addresses, but either
   that user does not use them or the site is using a simple DNS
   setup (<xref target="SimpleDNS"/>). A NODATA result implies that
   the address is invalid; to see why, consider the effect of
   installing the TXT record as described in <xref target="MediumDNS"/>.
   <cref>Is this clever distinction between NXDOMAIN and NODATA too
   much abuse of the DNS? Should I go back to a design based on SRV
   records?</cref></t>
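
   <t><figure><preamble>The three-way interpretation of the MX lookup can be
   sketched in Python as follows. The enumeration and function names are
   hypothetical, and the sketch deliberately leaves out the DNS query itself
   and the construction of the notag address. It only shows how a recipient
   might map each outcome to a conclusion.</preamble>
    <artwork><![CDATA[
```python
from enum import Enum


class MxLookup(Enum):
    NXDOMAIN = "nxdomain"   # the name does not exist
    MX_FOUND = "mx"         # the query returned MX records
    NODATA = "nodata"       # the name exists but has no MX records


class Verdict(Enum):
    NOT_TAGGING = "domain does not use tagged addresses"
    UNDECIDED = "domain knows about tagging; address may be genuine"
    INVALID = "untagged bounce address is invalid"


def interpret(result):
    """Map the outcome of the notag MX lookup to a verdict."""
    if result is MxLookup.NXDOMAIN:
        return Verdict.NOT_TAGGING
    if result is MxLookup.MX_FOUND:
        # Either this user does not tag, or the site uses the simple
        # DNS configuration, so nothing can be concluded.
        return Verdict.UNDECIDED
    if result is MxLookup.NODATA:
        return Verdict.INVALID
    raise ValueError("unknown lookup result")
```
    ]]></artwork></figure></t>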

  </section>

<!-- === -->
  <section anchor="Interop" title="Interoperability Considerations">

   <t>Blah.</t>

   <section title="The use of null return paths">
   </section>

   <section title="Using the bounce address to identify the sender">
   </section>

   <section title="Internationalization">
   </section>

  </section>

<!-- === -->
  <section anchor="Security" title="Security Considerations">

   <t>To be filled in.</t>

   <section anchor="MoserAttack" title="Deliberate address expiry">

    <t>Moser's attack</t>

   </section>

  </section>

<!-- === -->
 <section anchor="IANA" title="IANA Considerations">

  <t>Something about allocation of tag schemes.</t>

 </section>

 </middle>

<!-- === -->
 <back>

  <references title="Normative References">
   &rfc2104;
   &rfc2119;
   &rfc2234;
   &rfc2476;
   &rfc2821;
   &rfc2822;
   &rfc3174;
   &rfc3548;
   <reference anchor="FIPS197">
    <front>
     <title>Advanced Encryption Standard (AES)</title>
     <author><organization>National Institute of Standards and Technology</organization></author>
     <date year="2001" month="November" day="26"/>
    </front>
    <seriesInfo name='FIPS Pub.' value='197' />
   </reference>
  </references>


  <references title="Informative References">
   &rfc3552;
   <reference anchor="BATV">
    <front>
     <title>Bounce Address Tag Validation (BATV)</title>
     <author fullname="John Levine" initials="J." surname="Levine">
      <organization>Taughannock Networks</organization>
     </author>
     <author fullname="Dave Crocker" initials="D." surname="Crocker">
      <organization>Brandenburg InternetWorking</organization>
     </author>
     <author fullname="Sam Silberman" initials="S." surname="Silberman">
      <organization>Openwave</organization>
     </author>
     <author fullname="Tony Finch" initials="T." surname="Finch">
      <organization>University of Cambridge</organization>
     </author>
     <date day="7" month="September" year="2004"/>
    </front>
    <seriesInfo name="Internet-Draft" value="draft-levine-mass-batv-00"/>
   </reference>
  </references>

<!-- === -->
  <section title="Acknowledgments">

   <t>Markus Kuhn (mgk25@cam.ac.uk) suggested the encryption technique
   described in <xref target="OpaqueScheme"/>.</t>

   <t>Ian Jackson (ijackson@chiark.greenend.org.uk) pointed out that
   it is possible to rate-limit verification requests.</t>

   <t>Roger Moser (Roger.Moser@rama.pamho.net) pointed out the attack
   described in <xref target="MoserAttack"/>.</t>

  </section>

 </back>

</rfc>
