Present: | P. Hazel, D.P. Carter, R.J. Dowling, F.A.N. Finch, C.J. Jardine, K.M. Jeary, B.K. Omotani, R.J. Smith, C.E. Thompson, J.M. Wilkins |
Apologies: | P. Stewart, R.A.W. Mee |
Date of next meeting: 1st February 2005 at 11:15 in C304
There have been hardware problems on two separate Cyrus mailstore systems since the last MDCM. On 16th November Cyrus-22 hung twice (once at 3am, once at 12:35pm), affecting around 2,000 out of 37,000 live Hermes accounts. Following the second incident the accounts were moved to the replica system Cyrus-21. The motherboard on Cyrus-22 has been replaced with a different model by the supplier. On 28th November three separate disks in a single RAID set failed on Cyrus-15 at 19:08, affecting 4,200 users. The likely cause was noise on the SCSI bus. Service was restored by 20:18.
Hermes-1 (the Hermes Webmail and SSH service) was running short of memory towards the end of term. Webmail timeouts were reduced to 10 minutes on most screens and two hours on the compose screen. Experiments with timeouts on idle HTTP connections allowed us to reduce the number of active HTTP connections. While the system should now run comfortably next term, additional memory has been ordered so that we can increase the timeouts to something more sensible.
There have been no further system lockups on canvas or the PPSW systems since FANF disabled marks in the syslog daemon (which were causing a race condition). There have, however, been a number of kernel problems on different Cyrus systems, where a deadlock within the filesystem code caused all further access to a given file on the system to hang. On each occasion the machine in question had to be rebooted in order to clear the deadlock. A temporary workaround is to carefully reconstruct the relevant parts of the Cyrus database so that the problem file is avoided. Apparently this is a long-known problem which affects all of the Linux journalling filesystems. We have a trivial patch which should fix the problem for the Cyrus systems, although it is not appropriate for general purpose Linux workstations and servers.
There are now 85 lists on the Mailman @lists systems. The code has been upgraded to a local branch of the Mailman 2.1.6 code which includes a Web interface for external list members who do not have Raven accounts. The code seems to be stable and we should be ready for a live service as soon as documentation is ready. DPC and SP to liaise about documentation early in the new year.
The documentation that FANF wrote to describe the use of the two SMTP smarthosts smtp.hermes.cam.ac.uk and ppsw.cam.ac.uk has now been released onto http://www.cam.ac.uk/cs/email.
We received a DPA request for information about logs on Hermes during the Michaelmas term. This is only the second DPA request that we have received that required searching of logs on Hermes since the DPA came into force.
There are currently approximately 9,500 people making insecure connections to Hermes, and 1,800 using ~/mail and variants. FANF has generated some breakdowns of these lists by primary affiliation and proposes to make them available via a Web site, with summaries going to the Techlinks once a week. Work to reduce the two lists will start early in the new year. We propose to start with the institution with the largest number of insecure users (Engineering). CERT would like to be informed when the ratchet screw is turned and we will also need to liaise with the Help Desk.
Cover for the single member of operations staff who was fielding queries to postmaster@cam.ac.uk has been arranged (at the second attempt).
Milestones 3 and 4 have been released. 450 people have updated their own directory entries using the Web interface since the start of the project. A debate has taken place about primary affiliation for graduate students and whether it should be the college or department in question.
FANF has received quite a lot of feedback from Techlinks, but nothing so far from the Development Office, who are the people most likely to be affected by restrictions on bulk email. He is still gathering information about likely problem cases: the rate limiting is not yet enforced.
The SRCF were running a Web application generating mail messages, which turned out to be an insecure proxy. AOL blocked all mail from PPSW on 16th November after this interface was exploited by spammers. The incident demonstrated an unfortunate weakness in the current rate limiting proposal: the spammer generated a small number of messages, each copied to a large number of recipients. This would not have been picked up by the current rate limiting code, which limits based on the number of messages rather than the number of recipients.
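The difference between counting messages and counting recipients can be sketched as follows. This is an illustration only, not the PPSW rate limiting code: the class name, the sliding-window approach and the limits used are all assumptions made for the example.

```python
from collections import defaultdict, deque


class RecipientRateLimiter:
    """Hypothetical sliding-window limiter that counts recipients, not messages."""

    def __init__(self, max_recipients, window_seconds):
        self.max_recipients = max_recipients
        self.window_seconds = window_seconds
        # sender -> deque of (timestamp, recipient_count), oldest first
        self._events = defaultdict(deque)
        # sender -> total recipients currently inside the window
        self._totals = defaultdict(int)

    def allow(self, sender, recipient_count, now):
        """Return True if a message to recipient_count addresses is within the limit."""
        events = self._events[sender]
        # Forget messages that have fallen out of the window.
        while events and now - events[0][0] >= self.window_seconds:
            _, expired = events.popleft()
            self._totals[sender] -= expired
        # One message to many recipients costs as much as many single messages.
        if self._totals[sender] + recipient_count > self.max_recipients:
            return False
        events.append((now, recipient_count))
        self._totals[sender] += recipient_count
        return True


# Assumed limit for illustration: 100 recipients per sender per hour.
limiter = RecipientRateLimiter(max_recipients=100, window_seconds=3600)
first = limiter.allow("webapp", 60, now=0)     # accepted: 60 recipients used
second = limiter.allow("webapp", 60, now=10)   # refused: would reach 120
```

A limiter keyed on message count alone would have accepted both messages above, which is the gap that the incident exposed.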
The MUA settings page on http://www.cam.ac.uk/cs/email has been updated to try to further discourage the use of POP. We receive about one request a day to recover email, almost all of which are from people using POP clients.
Earth Sciences run a pair of mail systems at different sites behind the single mail domain esc.cam.ac.uk. One of these systems accepts messages from the public internet but later generates bounces when a message (typically spam) is addressed to an invalid address on the other system, which causes collateral backscatter. Janet CERT have complained about this configuration.
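For illustration, the usual way to avoid backscatter is to refuse an unknown recipient during the SMTP conversation instead of accepting the message and bouncing it later. The sketch below assumes the accepting system can see a list of valid addresses for both sites; the function name and the example addresses are hypothetical and do not describe the Earth Sciences configuration.

```python
def rcpt_response(address, valid_recipients):
    """Return a hypothetical (code, text) SMTP reply to a RCPT TO command.

    valid_recipients is assumed to be a set of lower-case addresses
    covering every system behind the mail domain.
    """
    if address.lower() in valid_recipients:
        return (250, "OK")
    # Refusing here leaves the sending host to deal with the failure,
    # so no bounce is generated towards a possibly forged sender.
    return (550, "No such user")


# Made-up addresses for the example.
known = {"alice@example.org", "bob@example.org"}
accepted = rcpt_response("Alice@example.org", known)
refused = rcpt_response("nobody@example.org", known)
```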
DPC 2005-12-14