An overview of Cambridge's email hub "ppsw" (as of mid-2004)
============================================================

$Cambridge: hermes/doc/misc/ppsw.txt,v 1.9 2004/07/16 14:43:38 fanf2 Exp $

The Computing Service's central email hub provides a number of
services:

* Receiving email from users (inside and sometimes outside the
  University) and servers (friendly ones inside the University and
  possibly hostile ones outside the University) and delivering it to
  its destination.

* Routing email to email servers within the University, such as
  cus.cam.ac.uk, newn.cam.ac.uk, zoo.cam.ac.uk, flymine.org.

* Handling Managed Mail Domains, such as ucs.cam.ac.uk,
  quns.cam.ac.uk, hist.cam.ac.uk.

* Redirecting @cam email to the right place.

* Distributing messages to @lists addresses.

* Delivering @hermes email to the appropriate message store machine.

* Directing Hermes POP and IMAP connections to the appropriate
  message store machine.

* Filtering viruses out of email and identifying infected hosts.

* Scanning email for spam features.

As well as the various email addresses outlined above, the hub has a
number of host names and IP addresses which are advertised for use
when configuring software, so that setups are isolated from any
changes to the set of machines that provide the service. These names
and addresses are:

mx.cam.ac.uk:
    Used when configuring the DNS for email domains that are handled
    by the hub. Email arriving at this address must pass quite strict
    anti-spam and anti-virus checks.

ppsw.cam.ac.uk:
    Used as a relay for outgoing email from servers within the
    University. Its checks are more lenient than those of
    mx.cam.ac.uk. It may not be used at all from outside the
    University. This is the historic name for the email hub, after
    the software it used to run.

131.111.8.129:
    The IP address to use instead of ppsw.cam.ac.uk when configuring
    email routing in software that can't cope with host names. This
    kind of setup is strongly discouraged.

131.111.8.128/27:
    The address range occupied by the hub, which may be used in
    strict access-control configurations. Such setups ensure that
    email servers in the University receive email only from the hub.
    Computer Officers who have configured their servers this strictly
    should notify the Computing Service that they have done so, in
    case this address range changes. A less strict configuration
    which includes some other Computing Service machines is
    131.111.8.0/24.

smtp.hermes.cam.ac.uk:
    The outgoing email relay for Hermes users, for use both within
    the University and elsewhere. In the latter case encrypted and
    authenticated connections are required. Its checking is quite lax
    because it is usually talking to MUAs, which have less reliable
    error handling behaviour than MTAs.

imap.hermes.cam.ac.uk:
    The Hermes IMAP server redirecting proxy.

pop.hermes.cam.ac.uk:
    The Hermes POP server redirecting proxy.

The underlying hardware for the hub consists of several computers
numbered between zero and nine. Each computer has three IP addresses,
one for each of its personalities. (One of the machines has a fourth
IP address for handling email sent via 131.111.8.129.) The
personalities are ppsw, mx, and Hermes.

    ; the fixed ppsw IP address
    ppsw-v.csi.cam.ac.uk.                   A 131.111.8.129

    ; the ppsw addresses
    ppsw-0.csi.cam.ac.uk.                   A 131.111.8.130
    $GENERATE 1-9 ppsw-$.csi.cam.ac.uk.     A 131.111.8.13$
    $GENERATE 0-9 ppsw.cam.ac.uk.           A 131.111.8.13$

    ; the mx addresses
    ppsw-0m.csi.cam.ac.uk.                  A 131.111.8.140
    $GENERATE 1-9 ppsw-$m.csi.cam.ac.uk.    A 131.111.8.14$
    $GENERATE 0-9 mx.cam.ac.uk.             A 131.111.8.14$

    ; the Hermes addresses
    ppsw-0h.csi.cam.ac.uk.                  A 131.111.8.150
    $GENERATE 1-9 ppsw-$h.csi.cam.ac.uk.    A 131.111.8.15$
    $GENERATE 0-9 imap.hermes.cam.ac.uk.    A 131.111.8.15$
    $GENERATE 0-9 pop.hermes.cam.ac.uk.     A 131.111.8.15$
    $GENERATE 0-9 smtp.hermes.cam.ac.uk.    A 131.111.8.15$

The machines' primary names and addresses are the same as their ppsw
names and addresses.
This means that email delivered from these machines appears to come
from one of the ppsw.cam.ac.uk addresses, which is the historical
behaviour.


Using a load balancing router to improve ppsw
---------------------------------------------

We currently depend on DNS round-robin to balance the load across the
ppsw machines. Although this is by and large good enough, there are
some particular problems that cause repeated difficulty:

* Taking machines out of service is difficult to do smoothly because
  many clients cache the IP address indefinitely, regardless of the
  DNS TTL.

* Even without the caching problem, the usual DNS update frequency is
  once a day, which is too slow for quick reshuffles of machines.

* If a machine breaks, it requires an emergency DNS update to take it
  out of service, and service is degraded because of the caching
  behaviour mentioned above.

* The .129 service IP address for ppsw is a hack to reduce the
  problems caused for certain important clients by lack of stable IP
  addresses.

As well as solving the above problems, a load balancer would also
help us improve aspects of the service which are less troublesome but
still worth addressing:

* Automatic monitoring of service availability and function, and
  automatic removal of broken machines from service.

* More equal balancing of load. The lowest-numbered machine tends to
  get slightly more load than the others, though with sufficient
  capacity this isn't a problem in practice.

* More efficient use of IP addresses.

The addressing configuration for this setup would be somewhat simpler
from the public point of view than the current arrangements. Each
service name (ppsw.cam.ac.uk, mx.cam.ac.uk, *.hermes.cam.ac.uk) would
have a single IP address. The .129 service IP address for ppsw would
not have to be a special case. If the email system were to move to a
two-site configuration we could have an IP address per service per
site.

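To see what the single-address scheme would replace, the current
round-robin address sets can be enumerated by expanding the $GENERATE
lines from the zone fragment earlier in this document. The following
is a minimal sketch only: the function names are hypothetical, and the
expansion handles just the plain "start-stop" range and bare "$"
placeholder used in that fragment, not the full BIND syntax.

```python
import ipaddress

# (range, owner-name template, address template), copied from the
# service-name $GENERATE lines in the zone fragment.
GENERATE_LINES = [
    ("0-9", "ppsw.cam.ac.uk.", "131.111.8.13$"),
    ("0-9", "mx.cam.ac.uk.", "131.111.8.14$"),
    ("0-9", "imap.hermes.cam.ac.uk.", "131.111.8.15$"),
    ("0-9", "pop.hermes.cam.ac.uk.", "131.111.8.15$"),
    ("0-9", "smtp.hermes.cam.ac.uk.", "131.111.8.15$"),
]


def expand_generate(range_spec, lhs, rhs):
    """Expand one simple $GENERATE line into (name, address) pairs.

    Each "$" in the templates is replaced by the iterator value.
    """
    start, stop = (int(n) for n in range_spec.split("-"))
    return [
        (lhs.replace("$", str(i)), rhs.replace("$", str(i)))
        for i in range(start, stop + 1)
    ]


def round_robin_sets():
    """Map each service name to the list of A records it resolves to."""
    sets = {}
    for range_spec, lhs, rhs in GENERATE_LINES:
        for name, addr in expand_generate(range_spec, lhs, rhs):
            sets.setdefault(name, []).append(addr)
    return sets


if __name__ == "__main__":
    hub = ipaddress.ip_network("131.111.8.128/27")
    for name, addrs in round_robin_sets().items():
        inside = all(ipaddress.ip_address(a) in hub for a in addrs)
        print(name, len(addrs), addrs[0], "to", addrs[-1], inside)
```

Each service name currently resolves to ten addresses, one per
machine; behind a load balancer each name would need only one.
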
Each machine would have three IP addresses on a private network
behind the load balancer, one for each service. This is a private
version of the public numbering we currently have. The load balancer
distributes connections to the public service addresses across the
private addresses hidden behind it. As before, each machine would
have a public IP address within 131.111.8.128/27 for outgoing email,
DNS, NTP, ssh, etc. traffic.

Note that for 10 machines we only need 13 public IP addresses in this
scheme, rather than 31 when using DNS round robin. The extra space
could be used for providing new variants of the service, e.g.
replacements for ppsw.cam.ac.uk with cleaner semantics; a version
which supports client TLS authentication; a secondary MX to detect
spammers trying to weasel past our defences; etc.

We still need to investigate which load-balancing technology to use:

* A software solution on a Linux box would be inexpensive but
  possibly less reliable from a hardware point of view -- though this
  can be compensated for with sufficiently advanced software.

* Hardware solutions are now quite mature and flexible. They may also
  work as a replacement for our existing switches which would reduce
  the cost.
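
The public address counts quoted above (13 rather than 31 for 10
machines) can be checked with a short calculation. This is a sketch
under stated assumptions: the function names are hypothetical, the
round-robin count is taken to be one address per machine per service
plus the fixed .129 address, and the load-balanced count is taken to
be one address per machine plus one per service per site.

```python
SERVICES = ("ppsw", "mx", "hermes")


def round_robin_addresses(machines, services=len(SERVICES), fixed=1):
    """Current scheme: every machine has one public address per
    service, plus the fixed .129 service address."""
    return machines * services + fixed


def load_balanced_addresses(machines, services=len(SERVICES), sites=1):
    """Proposed scheme: one public address per machine (outgoing
    email, DNS, NTP, ssh) plus one per service per site."""
    return machines + services * sites


if __name__ == "__main__":
    print(round_robin_addresses(10))             # 31
    print(load_balanced_addresses(10))           # 13
    print(load_balanced_addresses(10, sites=2))  # 16
```

The last line applies the same assumption to the two-site
configuration mentioned earlier, with the ten machines split between
the sites.
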