SMTP extensions T. Finch Internet-Draft University of Cambridge Obsoletes: 1845 (if approved) April 17, 2007 Intended status: Standards Track Expires: October 19, 2007 SMTP service extensions for transaction checkpointing draft-fanf-smtp-rfc1845bis-01 Status of this Memo By submitting this Internet-Draft, each author represents that any applicable patent or other IPR claims of which he or she is aware have been or will be disclosed, and any of which he or she becomes aware will be disclosed, in accordance with Section 6 of BCP 79. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet- Drafts. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt. The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html. This Internet-Draft will expire on October 19, 2007. Copyright Notice Copyright (C) The IETF Trust (2007). Abstract This memo describes the SMTP service extension for checkpoint/resume, which allows a client to recover from a lost connection to the server without having to repeat all of the commands and message content sent prior to the interruption, and with less risk of duplicated messages. It also includes an updated specification of the predecessor SMTP service extension for checkpoint/restart. Document revision $Cambridge: hermes/doc/qsmtp/draft-fanf-smtp-rfc1845bis.xml,v 1.75 2007/02/06 00:56:44 fanf2 Exp $ Table of Contents 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4 1.1. Overview . . . . . . . . . . . . . . . . . . . . . . . . . 4 1.2. Terminology . . . . . . . . . . . . . . . . . . . . . . . 5 1.3. IANA Considerations . . . . . . . . . . . . . . . . . . . 5 2. SMTP service extension for checkpoint/resume . . . . . . . . . 6 2.1. Framework . . . . . . . . . . . . . . . . . . . . . . . . 6 2.2. Overview . . . . . . . . . . . . . . . . . . . . . . . . . 7 2.3. Transaction IDs . . . . . . . . . . . . . . . . . . . . . 7 2.4. Octet offsets . . . . . . . . . . . . . . . . . . . . . . 8 2.5. Server-side resume state . . . . . . . . . . . . . . . . . 9 2.6. Retry versus resume . . . . . . . . . . . . . . . . . . . 10 2.7. Re-establishing a connection . . . . . . . . . . . . . . . 10 2.8. The RESUME command . . . . . . . . . . . . . . . . . . . . 11 2.9. The MAIL command TRANSID and TRANSOFF parameters . . . . . 12 2.10. The QUIT command . . . . . . . . . . . . . . . . . . . . . 13 3. SMTP service extension for checkpoint/restart . . . . . . . . 14 3.1. Framework . . . . . . . . . . . . . . . . . . . . . . . . 14 3.2. Description . . . . . . . . . . . . . . . . . . . . . . . 14 3.3. Changes from RFC 1845 . . . . . . . . . . . . . . . . . . 16 3.4. Prefer checkpoint/resume to checkpoint/restart . . . . . . 17 4. Security considerations . . . . . . . . . . . . . . . . . . . 18 4.1. Server storage . . . . . . . . . . . . . . . . . . . . . . 18 4.2. Transaction IDs . . . . . . . . . . . . . . . . . . . . . 19 4.3. Unnecessary bounces . . . . . . . . . . . . . . . . . . . 19 5. References . . . . . . . . . . . . . . . . . . . . . . . . . . 21 5.1. Normative references . . . . . . . . . . . . . . . . . . . 21 5.2. Informative references . . . . . . . . . . . . . . . . . . 21 Appendix A. Acknowledgments . . . . . . . . . . . . . . . . . . . 23 Appendix B. Changes since version -00 . . . . . . . . . . . . . . 24 Appendix C. Draft discussion venue . . . . . . . . . . . . . . . 25 Author's Address . . . . . . . . . . . . . . . . . . . . . . . . . 26 Intellectual Property and Copyright Statements . . . . . . . . . . 27 1. Introduction Although SMTP is widely and robustly deployed, it does not handle the loss of its underlying connection particularly gracefully. There are two problems, and they are becoming more of a concern because of the way the Internet is changing. Firstly, if the connection is lost part-way through the transmission of a message, the client must retry the transmission starting from the beginning. When dealing with very large messages over less reliable connections it is possible for substantial resources to be consumed by repeated unsuccessful attempts to transmit the message in its entirety. Messages are getting larger on average. Wireless connections, which are much more likely to be lost in normal use, are becoming more common. Secondly, a connection that is lost after the client has transmitted the message but before it receives the server's final reply can result in a duplicated message. This problem is described in [RFC1047], which recommends that servers should minimize the time between receiving the last of the message data and sending their final reply. However it is often preferable to analyse the message data for spam and viruses at this point so that the server can avoid taking responsibility for an undesirable message, and this can be time consuming. Furthermore, the problem is worse for clients that try to make the most of high-latency connections. The minimum time for an SMTP transaction is normally one round trip, since the client has to wait for all outstanding server replies after sending the DATA command [RFC2920]. If the client's messages are smaller than the amount of data it can transmit in a round trip, i.e. the bandwidth*delay product, then the connection is sitting idle for some of the time. A client can work around this problem by using multiple concurrent SMTP connections, or by using the SMTP service extension for large messages [RFC3030]. The latter eliminates pipeline stalls so the client can stream multiple messages to the server while waiting for replies. However, with both work-arounds, one loss of network connectivity means multiple messages will need retransmission and may be duplicated. 1.1. Overview This memo provides a facility by which a client can uniquely identify a particular SMTP transaction. The server stores this identifying information along with all the information it receives and sends as the transaction proceeds. If the connection is lost during the transaction the SMTP client may establish a new connection and ask the server to resume the transaction. The server tells the client how much message data it saved from the interrupted transaction. The client can then perform an abbreviated transaction, repeating only those commands to which it did not receive replies, and transmitting only message data that the server does not yet have. These problems were originally tackled by the SMTP service extension for checkpoint/restart [RFC1845]. However it has a number of limitations (Section 3.3) so this memo describes the similar but improved SMTP service extension for checkpoint/resume in Section 2. We address backwards compatibility in Section 3 by re-defining checkpoint/restart as a modification of checkpoint/resume. Finally, Section 4 gives proper consideration to the security implications of these extensions. 1.2. Terminology The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119. The "message envelope" comprises the SMTP MAIL and RCPT commands. The "message data" is transmitted using the SMTP DATA command, or alternatives such as the BDAT command [RFC3030]. An SMTP "transaction" is the sequence of commands necessary to transmit a message, i.e. a message envelope followed by message data. It starts with a MAIL command and ends with the server's "final reply" to the last of the message data, or with a reset. The server's final reply in an LMTP (local mail transport protocol) transaction can comprise more than one per-recipient sub-reply [RFC2033]. A "reset" is caused by the RSET, HELO, or EHLO commands. The metalinguistic notation used in this memo corresponds to the "Augmented Backus-Naur Form" defined in [RFC4234]. Rules not defined here are either defined in the ABNF core rules or in [RFC2821] or [RFC2822]. Metalanguage terms used in running text are surrounded by pointed brackets (e.g., ). 1.3. IANA Considerations This memo defines the SMTP service extension for checkpoint/resume, with the EHLO keyword value "RESUME", in Section 2. This memo replaces RFC 1845 as the definition of the SMTP service extension for checkpoint/restart, with the EHLO keyword value "CHECKPOINT", in Section 3. 2. SMTP service extension for checkpoint/resume This section uses the SMTP extension model specified in [RFC2821]. 2.1. Framework The SMTP service extension for checkpoint/resume is defined as follows: o The name of the service extension is "checkpoint/resume". o The EHLO keyword associated with the extension is "RESUME". o The RESUME EHLO keyword has no parameters. o This extension defines one additional command, RESUME, which MAY appear anywhere in a pipelined group [RFC2920]. Syntax: resume-command = "RESUME" SP transid-value CRLF o This extension defines the 355 reply to the RESUME command. Syntax: resume-reply = "355" SP octet-offset SP text CRLF o This extension defines two additional parameters to the MAIL command. The TRANSID parameter has the following syntax: esmtp-param =/ transid-param transid-param = "TRANSID=" transid-value transid-value = "<" transid-spec ">" transid-spec = dot-string "@" domain The TRANSOFF parameter has the following syntax: esmtp-param =/ transoff-param transoff-param = "TRANSOFF=" octet-offset octet-offset = 1*20DIGIT o The maximum length of the MAIL command is increased by 297 characters. The maximum length of a is 256 characters. o There are no additional parameters to the RCPT command defined by this extension and its maximum length is not increased. o This extension is suitable for use with message submission [RFC4409] and LMTP [RFC2033]. 2.2. Overview A server that supports the SMTP service extension for checkpoint/ resume SHALL include the RESUME keyword in its reply to the client's EHLO command. A client that wishes to use this extension MUST first check that this EHLO keyword is present. The client then proceeds as usual, except that it issues MAIL commands with TRANSID and TRANSOFF parameters (Section 2.9). The TRANSID parameter's value is described in Section 2.3 and the TRANSOFF parameter's value is "0" (zero). The server periodically checkpoints the transaction, retaining state so that it can later be resumed (Section 2.5). If all goes well, the client will close the connection normally and the server can discard the resume state (Section 2.10). If the connection is lost (Section 2.6) then the client can reconnect (Section 2.7) and issue a RESUME command (Section 2.8) for each transaction in the lost connection. It thereby discovers the at which it must resume transmitting each transaction's data (Section 2.4). The client then issues MAIL commands with TRANSID and non-zero TRANSOFF parameters to resume any transactions that are missing data on the server or for which the client lost any of the server's replies. 2.3. Transaction IDs A serves to uniquely identify a particular SMTP transaction started by a particular client. The and client identity together form the "transaction ID". The is structured to ensure global uniqueness without any additional registry. Its domain part SHOULD be a valid domain name that uniquely identifies the SMTP client. This is usually the same as the domain name given in the EHLO command, but not always. The EHLO domain name identifies the specific host the SMTP connection originated from, whereas the domain can refer to a group of hosts that collectively host a multi-homed SMTP client, or it can refer to a mobile client's home domain name, etc. The local part MUST be an identifier that distinguishes this SMTP transaction from any other originating from this SMTP client. Care must be used in constructing a to simultaneously ensure both uniqueness, unguessability, and the ability to reidentify interrupted transactions. It MUST be unguessable for security reasons; see Section 4. Despite the structured nature of a the server MUST treat the value as an opaque, case-sensitive string. Note that the contents of the [RFC2822] Message-ID: header field, or the MAIL FROM: ENVID parameter [RFC3461], typically are NOT appropriate for use as a , since such identifiers may be associated with multiple copies of the same message - e.g., as it is split during transmission to different recipients - and hence with multiple distinct SMTP transactions. For security reasons, servers MUST treat the same from different clients as different transaction IDs. Servers MUST use a secure identifier to distinguish clients, such as credentials from the SMTP AUTH command [RFC2554] or from TLS negotiation [RFC3207]. Note that these identifiers are independent of the IP address a client connects from: servers MUST allow authenticated mobile clients to reconnect and resume transactions from different IP addresses. If a client is not authenticated the server SHOULD use its IP address to identify it. 2.4. Octet offsets The represents an offset, counting from zero, to the particular octet in the actual message data the server expects to see next. (This is also a count of how many octets the server has received and stored successfully.) This offset SHALL NOT account for the message envelope. A value of 0 would indicate that the client needs to start sending the message from the beginning, a value of 1 would indicate that the client should skip one octet, and so on, up to a maximum equal to the message size. Additional requirements apply depending on the command(s) used by the client to transmit the message data: DATA [RFC2821] section 4.5.2: The SHALL NOT count any octets added by the dot-stuffing algorithm. The offset MUST also correspond to the start of a line, i.e. equal to zero or a point immediately after a . BDAT [RFC3030]: The SHALL count octets in the same way as the parameter to the BDAT command. It MAY point anywhere within a chunk. It SHALL NOT count any octets for the BDAT command itself. BURL [RFC4468]: The SHALL count the size of the content retrieved from the URL given in the BURL command. The offset MUST NOT correspond to a point within the BURL data; that is, the server stores the whole URL content or none of it. The SHALL NOT count any octets for the BURL command itself. These requirements are consistent with the semantics of the SIZE parameter to the MAIL command [RFC1870]. 2.5. Server-side resume state The server's "resume state" is the additional information it retains in order to implement checkpoint/resume. The server SHOULD keep track of the transaction IDs associated with each connection, that is the s used by the client in RESUME commands or TRANSID parameters. This so that it can prevent multiple connections from trying to modify the same transaction, as described in Section 2.7, and so that the server can discard the resume state of the connection's transactions when the connection is closed cleanly, as described in Section 2.10. What is included in a transaction's resume state varies depending on whether or not the server has received any message data, and whether or not it has committed the transaction. The server commits the transaction when it decides what its final reply to the last of the message data is. Servers retain no resume state before they receive any message data. This implies that clients must resume the transaction from the beginning if the connection is lost while the client is transmitting the envelope. Before the server commits a transaction, its resume state comprises the message envelope (including all client commands and server replies) and any message data that the server has received so far. It is OPTIONAL for the server to retain this state. That is, a server MAY be configured to require some or all clients to resume interrupted transactions from the beginning. When the server has received the last of the message data it SHOULD proceed to commit the transaction regardless of whether the client connection is lost while it does so. Since committing can be a multi-stage process (especially with LMTP [RFC2033]) this ensures that the client sees a consistent result even after a connection loss and resume. If the server commits to a 2yz final reply then it can continue with onward delivery of the message independent of client activity. After the transaction is committed, its resume state comprises the message envelope (as before), the size of the message data, and the server's final reply. If the client loses a connection and takes a while to reconnect and resume the transactions, it can be necessary for the server to retain this resume state for longer than it takes to transmit the message on to its next destination. If the client terminates a transaction with a reset, the server SHALL discard any resume state for the transaction ID and remove the transaction ID from the list of those associated with the current connection. If the client issues a reset between transactions, the server SHALL retain any resume state for the preceding transaction(s). 2.6. Retry versus resume A client SHOULD NOT attempt to resume a transaction unless the SMTP connection was lost after the start of the transaction. A connection is lost if it is terminated without the client sending a QUIT command to the server, or receiving a 421 reply to any command. This includes SMTP-level timeouts as well as loss or premature closure of the underlying connection. If the connection is lost before the first MAIL command, or message transmission fails for another reason, the client MUST use the retry algorithm specified by [RFC2821]. These non-resume situations include, but are not limited to, the following: o failure to establish a connection; o failure to establish a security layer; o failure to authenticate; o a temporary failure indicated by a 4yz reply from the server; o a connection forcibly closed by the server with a 421 reply. 2.7. Re-establishing a connection When a connection has been lost as described in the previous section, the client MAY reconnect immediately. It SHALL first re-check the DNS if necessary, as determined by the DNS records' time-to-live. If the IP address (target of A or AAAA record) used to connect to the original server is still on this list it SHOULD be tried first, since this server is most likely to be capable of resuming the transaction. If that IP address is no longer on the list or if the connection fails, then the client SHOULD next try any other IP addresses associated with the host name (target of MX record) used to connect to the original server. If these also fail then the client SHOULD fall back to the ranking algorithm described in [RFC2821] section 5. A client SHOULD NOT wait for a server with an interrupted transaction's resume state to come back online. Multi-homed and clustered SMTP servers do exist, which means that it is entirely possible for a transaction to be resumed on a different server host. Note that connection loss can appear different from the client and the server ends, so a client might detect the loss and reconnect before the server detects the loss. This can lead to legitimate circumstances where there are multiple connections to the server that appear to be trying to use the same transaction ID. Therefore, if a client issues a command containing a that the client has used in another connection that is currently active from the server's point of view, then the server SHOULD prevent the older connection from performing further actions that affect the same transaction. For example, the server MAY drop all but the connection with the most recent activity. Even if the server is still committing a transaction when the client resumes it, the server MUST issue the final reply that it commits to. 2.8. The RESUME command A client uses the RESUME command to discover how much message data the server has stored for a given , which is specific to that client. If the server has no resume state associated with the transaction ID, it SHALL reply with a 355 code and an of "0" (zero). If the server has resume state associated with the transaction ID the server SHALL reply with a 355 code and an equal to the amount of message data received by the server so far. The field in a 355 reply is REQUIRED to be the first thing on the first line of the reply. It MUST be separated from any following by linear whitespace. The can vary and MUST be ignored by the client. For example, 355 64735 is the transaction offset The client MUST NOT issue a RESUME command during a MAIL transaction. The server SHOULD reject a RESUME command issued during a transaction with a "503 Bad sequence of commands" reply. After reconnecting, clients SHOULD issue RESUME commands for all the transactions in a lost connection, not just for the transactions that were interrupted. This is so that when the client QUITs the server can discard all their resume state. See Section 2.10. The possible failure replies to the RESUME command are 500, 501, and 421, as described in [RFC2821] section 4.3.2. 2.9. The MAIL command TRANSID and TRANSOFF parameters A client SHALL start or resume a resumable transaction by issuing a MAIL command with one TRANSID parameter and one TRANSOFF parameter. It MUST NOT include more than one of either parameter, or omit either parameter. (However see Section 3 for the meaning of a TRANSID parameter without a TRANSOFF parameter.) If this is a new transaction, the TRANSOFF MUST be "0" (zero). The server SHALL discard any resume state that it may have retained for the given , which is specific to that client. The transaction then proceeds as normal, with the server retaining additional resume state as described in Section 2.5. If the client is resuming a transaction, then it MUST have previously issued a RESUME command in this connection with the same as in the TRANSID parameter. The value of the TRANSOFF parameter MUST be the same as the given in the server's reply to the RESUME command. Apart from the non-zero the MAIL command MUST be the same as the one issued by the client to start the transaction originally. If the client issues a MAIL command with a non-zero TRANSOFF that does not meet these requirements then the server SHALL reply with "503 Bad sequence of commands". Otherwise the server SHALL give the same reply to the MAIL command that it retained in the transaction's resume state. After the client has issued a MAIL command with a non-zero TRANSOFF, it MAY re-issue the transaction's RCPT commands. The client MAY omit some or all of the RCPT commands but it MUST NOT re-order them. The server SHALL give the same reply to each RCPT command that it retained in the transaction's resume state. If the client issues a RCPT command that was not retained in the resume state then the server SHALL reject it with a "553 Requested action not taken: mailbox name not allowed" reply. After the client has issued the envelope commands, it SHALL issue another data transfer command (e.g. DATA or BDAT) and send the remaining message data starting from the . The requirement in [RFC3030] against mixing DATA and BDAT in the same transaction still applies if the transaction is interrupted and later resumed. If the is equal to the message size, then the client SHALL issue a data transfer command with no data (e.g. DATA. or BDAT 0 LAST) and the server SHALL give the final reply it retained in the transaction's resume state. 2.10. The QUIT command The client MUST issue a QUIT command before closing the connection. It MUST NOT pipeline the QUIT command; that is, it SHALL wait to receive the replies to all outstanding commands before issuing the QUIT command. Note that this is stricter than [RFC2920] which allows the QUIT command to appear as the last command in a pipelined group. This requirement means that when the server receives a QUIT command, it can be sure that the client has received all the replies to all the transactions in the connection, so the server MAY then discard any resume state associated with these transactions. The server MUST NOT rely on this for resource reclamation and MUST still time out old resume state. This protects against malicious clients and some legitimate failure modes. For example, if the connection is lost after the client sends QUIT but before the server receives it, the client will not reconnect and the server will not immediately free its resume state. 3. SMTP service extension for checkpoint/restart This section re-defines checkpoint/restart in terms of checkpoint/ resume, and lists the differences between this specification and [RFC1845]. We also explain why checkpoint/resume is better than checkpoint/restart. 3.1. Framework The SMTP service extension for checkpoint/restart is defined as follows: o The name of the service extension is "checkpoint/restart". o The EHLO keyword associated with the extension is "CHECKPOINT". o The CHECKPOINT EHLO keyword has no parameters. o This extension defines one additional parameter to the MAIL command. The TRANSID parameter has the syntax defined in Section 2. o The maximum length of the MAIL command is increased by 88 characters. The maximum length of a is 77 characters. If the server also supports the checkpoint/resume extension, then its larger limits apply instead of (not in addition to) these limits. o This extension defines the 355 reply to the MAIL command which has the syntax defined in Section 2. o There are no additional parameters to the RCPT command defined by this extension and its maximum length is not increased. o There are no additional commands defined by this extension. o This extension is suitable for use with message submission [RFC4409]. 3.2. Description A server that supports the SMTP service extension for checkpoint/ restart SHALL include the CHECKPOINT keyword in its reply to the client's EHLO command. A client that wishes to use this extension MUST first check that this EHLO keyword is present. It SHALL then issue a MAIL command with one TRANSID parameter and without a TRANSOFF parameter. Such a MAIL command MUST appear as the last command in a pipelined group; note that this is stricter than the usual pipelining requirements for the MAIL command specified in [RFC2920]. Issuing a MAIL FROM:<...> TRANSID= command to a server that supports checkpoint/restart is equivalent to issuing the following hypothetical pair of commands to a server that supports checkpoint/resume: RESUME MAIL FROM:<...> TRANSID= TRANSOFF= The value of the hypothetical TRANSOFF parameter is the same as the given in the server's hypothetical 355 reply to the RESUME command. One of the two hypothetical replies to this pair of checkpoint/resume commands is given in response to the equivalent real checkpoint/ restart MAIL...TRANSID command. The reply is chosen according to the following rules: o If the hypothetical reply to the RESUME command would have indicated failure (i.e. not 355) then that is used as the real reply; o Otherwise, if the hypothetical reply to the MAIL FROM command would have indicated failure (i.e. not 250) then that is used as the real reply; o Otherwise, if the hypothetical given by the server's 355 reply and used in the MAIL FROM command would have been zero, then the hypothetical 250 reply to the MAIL FROM command is used as the real reply; (This is the case for new transactions.) o Otherwise, the hypothetical 355 reply to the RESUME command is used as the real reply. (This is the case for resumed transactions.) When a client wishes to start a new resumable transaction, it SHALL issue a MAIL...TRANSID command. If it does not get the expected 250 reply (which would indicate an accidental or malicious transaction ID collision) it SHALL issue a RSET command to reset the transaction, and re-issue the MAIL...TRANSID command. When a client wishes to resume a transaction after a lost connection, it SHALL issue the same MAIL...TRANSID command again. If it receives a 250 reply, it MUST repeat the transaction in full. If it receives a 355 reply, it MAY re-issue the rest of the transaction's envelope commands, then it SHALL issue a data transfer command and resume transmission of the message data at the exact indicated in the 355 reply. In all other respects this extension works in the same way as checkpoint/resume (Section 2). 3.3. Changes from RFC 1845 This section lists differences between checkpoint/restart as specified in this memo and as specified in [RFC1845]. We also give the reasons for each change. o Interworking with pipelining [RFC2920] is improved in a number of ways: * We explicitly state that a checkpoint/restart MAIL command has stricter pipelining requirements than specified in [RFC2920]. * A client that pipelines its envelope commands and message data (using BDAT [RFC3030] or BURL [RFC4468]) can lose the server's envelope replies when a connection is lost. Clients may now re-issue envelope commands when resuming a transaction. * A client can pipeline the end of one transaction with the start of the next, so a new transaction does not indicate that the previous transaction is complete from the client's point of view. Servers are no longer permitted to discard a transaction's resume state at this point. o We now specify the semantics of the when the message data is transmitted using the BDAT [RFC3030] or BURL [RFC4468] commands. o We now specify how to use this extension with LMTP [RFC2033]. o The server's data retention requirements have been loosened. This allows a server to advertise support for checkpointing to less trustworthy clients, with less security exposure. See Section 4. o In [RFC1845] the is coupled to the client's EHLO host name. This memo specifies the use of higher-level authenticated client identities where possible, so that mobile clients can re-connect from a different IP address (which implies a different EHLO host name) and resume interrupted transactions. o This specification allows a client to re-connect to the server sooner when resuming a transaction than is usual for normal SMTP retries, as specified in [RFC2821] section 4.5.4. This is friendlier to message submission clients that involve a human in the retry strategy. o We have identified some security concerns that are discussed in Section 4. o A couple of trivial matters: * [RFC2119] keywords for normative requirements. * Corrected octet counts in the MAIL command length limit increase. 3.4. Prefer checkpoint/resume to checkpoint/restart Even after updating and clarifying [RFC1845], there remains a significant problem with checkpoint/restart, which is why this memo defines and prefers the not-quite-backwards-compatible variant checkpoint/resume. The problem is that checkpoint/restart adds a pipeline stall to each transaction. This increases the time taken to send a message, which is particularly undesirable for message submission clients on high- latency connections. It also prevents clients from using the BDAT command [RFC3030] to stream messages to the server without pipeline stalls. This is particularly unfortunate because losing a connection during pipelined streaming can affect multiple messages, so it is desirable to have some way of recovering the state of transactions after a lost connection. The pipeline stall in checkpoint/restart has two functions, which checkpoint/resume handles in ways that avoid the stall. o In checkpoint/restart, the server tells the client whether the transaction is new or is being resumed. In checkpoint/resume, this information goes from the client to the server, which removes the need for an extra round trip in normal transactions. o When recovering from a lost connection, the client must find out the at which it will resume transmitting message data. This implies a client-server round trip before resuming the transaction. In checkpoint/resume this round trip is detached from the transaction using the RESUME command. The RESUME command can be pipelined, so if multiple transactions are to be resumed only one extra round trip per connection is needed, not one per message. 4. Security considerations The difficulties fall into three areas: additional storage requirements on the server; vulnerabilities associated with spoofed transaction IDs; and undesirable bounce messages. All are significantly mitigated if checkpoint/resume is only permitted for trustworthy clients (which generally implies secure authentication). This section also applies to [RFC1845], which does not discuss security. 4.1. Server storage A significant difference between partial transactions and complete transactions is that the server can recover the storage used by complete transactions by delivering their messages. Thus if a server gives partial transactions a long lifetime, it can be easier for an attacker to exhaust the server's disk space than with un-extended SMTP. The attacker does not have to outrun the server's ability to process messages, nor search for a destination address to which the server cannot deliver immediately (such as an over-quota user). The extensions in this memo permit a server to choose whether or not to store partial messages according to operational needs. For example, partial transaction storage might be permitted for authorized message submission clients, but MX clients might be required to recover from failed transactions by retrying the transaction from the start. When partial messages are stored, their lifetime should be scaled according to the typical time it takes for a client to recover from a lost connection, which will depend on the operational environment but will often be on the order of several minutes. Once a transaction is complete, the server still keeps some resume state so that clients can recover from connection loss during the [RFC1047] synchronization gap. Abusive clients can waste this space by failing to close connections cleanly with the QUIT command. The server has to restrict the lifetime of this resume state in a similar way to partial transactions. This resume state is relatively small, and is important for correctness (avoidance of duplicate messages) so its lifetime can be longer than that of partial messages. This memo does not provide a mechanism for clients to discover the lifetime of partial transactions and resume state on the server. 4.2. Transaction IDs If an attacker can guess a client's transaction IDs, it can perform a variety of attacks based on confusing a client about the state of a transaction or inserting unwanted data into a message. These attacks are made much easier by the extensions defined in this memo than in un-extended SMTP. We reduce the opportunity for attack by making transaction IDs unguessable and by tying them to the client's authenticated identity. Resumable transactions can be used to turn a truncation attack into a message modification attack. If an attacker can cause the client's connection to break in the middle of a message, the attacker can resume the transaction and append any data to the end of the message. TCP truncation attacks are much easier than TCP modification attacks. Un-guessable transaction IDs prevent attackers that cannot sniff the client-server connection (e.g. because they are off the client-server path or because the connection has a security layer) from doing this. Unguessability can be achieved by including enough random data. [RFC4086] provides some guidelines on how to do this securely. An attacker that guesses a transaction ID before the client uses it could potentially append data to the start of the message. In checkpoint/resume this is prevented by the client asserting that a transaction is new. In checkpoint/restart this is prevented by requiring clients to detect this attack and reset the transaction. As well as these client-side protections, the server has to tie transaction IDs to the client's authenticated identity. Even if an attacker can guess a transaction ID, it also has to break the client's authentication credentials in order to succeed. If it doesn't manage to do both then the server will consider the client's and attacker's transactions to have different IDs. These extensions can be used with unauthenticated SMTP, in which case the server uses the client's IP address to identify it. However this is usually not secure, especially with wireless or dial-up connections where it is relatively easy for an attacker to steal the client's IP address. 4.3. Unnecessary bounces It is generally preferable for SMTP servers to reject messages during the SMTP transaction instead of accepting them and later generating a bounce message. This allows message submission clients to present the error to the user immediately in a friendly manner. It also reduces the problem of backscatter from spam with bogus return paths, assuming that spam sending software has a partial SMTP implementation that doesn't emit bounces when its spam is rejected. The RESUME specification requires a server to honor its original replies to the RCPT commands. If the status of a recipient address changes from good to bad between the start of a transaction and its completion, it would seem that the server is forced to accept the message then generate a bounce. This should be a rare situation if we assume clients will prefer to resume transactions promptly. However it is possible for servers to reduce the problem by giving a negative reply to the end of the message data: in general a 450 reply if not all the recipients' statuses have changed, so that the client will retry the delivery later. 5. References 5.1. Normative references [RFC2033] Myers, J., "Local Mail Transfer Protocol", RFC 2033, October 1996. [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997. [RFC2821] Klensin, J., "Simple Mail Transfer Protocol", RFC 2821, April 2001. [RFC2822] Resnick, P., "Internet Message Format", RFC 2822, April 2001. [RFC2920] Freed, N., "SMTP Service Extension for Command Pipelining", STD 60, RFC 2920, September 2000. [RFC3030] Vaudreuil, G., "SMTP Service Extensions for Transmission of Large and Binary MIME Messages", RFC 3030, December 2000. [RFC4234] Crocker, D., Ed. and P. Overell, "Augmented BNF for Syntax Specifications: ABNF", RFC 4234, October 2005. [RFC4468] Newman, C., "Message Submission BURL Extension", RFC 4468, May 2006. 5.2. Informative references [RFC1047] Partridge, C., "Duplicate messages and SMTP", RFC 1047, February 1988. [RFC1845] Crocker, D. and N. Freed, "SMTP Service Extension for Checkpoint/Restart", RFC 1845, September 1995. [RFC1870] Klensin, J., Freed, N., and K. Moore, "SMTP Service Extension for Message Size Declaration", STD 10, RFC 1870, November 1995. [RFC2554] Myers, J., "SMTP Service Extension for Authentication", RFC 2554, March 1999. [RFC3207] Hoffman, P., "SMTP Service Extension for Secure SMTP over Transport Layer Security", RFC 3207, February 2002. [RFC3461] Moore, K., "Simple Mail Transfer Protocol (SMTP) Service Extension for Delivery Status Notifications (DSNs)", RFC 3461, January 2003. [RFC4086] Eastlake, D., Schiller, J., and S. Crocker, "Randomness Requirements for Security", BCP 106, RFC 4086, June 2005. [RFC4409] Gellens, R. and J. Klensin, "Message Submission for Mail", RFC 4409, April 2006. Appendix A. Acknowledgments Thanks to Dave Crocker and Ned Freed for [RFC1845], from which this document borrows freely. Thanks are also due to Dave Cridland, Randall Gellens, and Alexey Melnikov for encouragement, review, and comments. Appendix B. Changes since version -00 o LMTP support o Clarified maximum value of o doesn't count BURL commands o Do not completely forbid clients from attempting to resume when there hasn't been a prior connection loss o Clarify normal connection closure with QUIT from the client's point of view o Resuming MAIL commands must be substantially the same as the transaction's original MAIL command o Clarify ordering of RCPT commands in a resumed transaction o Forbid mixing of DATA and BDAT o Clarify failure replies to RESUME. Appendix C. Draft discussion venue Feedback about this draft should be emailed to the author, or to the IETF SMTP discussion mailing list, , or to the Lemonade working group mailing list, . Author's Address Tony Finch University of Cambridge Computing Service New Museums Site Pembroke Street Cambridge CB2 3QH ENGLAND Phone: +44 797 040 1426 Email: dot@dotat.at URI: http://dotat.at/ Full Copyright Statement Copyright (C) The IETF Trust (2007). This document is subject to the rights, licenses and restrictions contained in BCP 78, and except as set forth therein, the authors retain all their rights. This document and the information contained herein are provided on an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Intellectual Property The IETF takes no position regarding the validity or scope of any Intellectual Property Rights or other rights that might be claimed to pertain to the implementation or use of the technology described in this document or the extent to which any license under such rights might or might not be available; nor does it represent that it has made any independent effort to identify any such rights. Information on the procedures with respect to rights in RFC documents can be found in BCP 78 and BCP 79. Copies of IPR disclosures made to the IETF Secretariat and any assurances of licenses to be made available, or the result of an attempt made to obtain a general license or permission for the use of such proprietary rights by implementers or users of this specification can be obtained from the IETF on-line IPR repository at http://www.ietf.org/ipr. The IETF invites any interested party to bring to its attention any copyrights, patents or patent applications, or other proprietary rights that may cover technology that may be required to implement this standard. Please address the information to the IETF at ietf-ipr@ietf.org. Acknowledgment Funding for the RFC Editor function is provided by the IETF Administrative Support Activity (IASA).