FOR DISCUSSION PURPOSES ONLY

University of Cambridge Computing Service
Mail Planning Document No. 100

Thoughts on a future MTA

Author: Philip Hazel
13 March 1995

For some time I have been thinking about the features I would like to see in an MTA. Coincidentally, there has recently been some discussion on the Smail developer's list on the same topic. I have therefore decided to get some of my thoughts written down as a contribution to that discussion. This is not intended to be a complete list of desirables for an MTA. It is simply more input to the discussion.

My experience of mail is entirely within the Internet. Therefore this document is inevitably written from an Internet mail perspective.

1. Overall approach

The author of Smail has said publicly that it was conceived in an earlier age and is now beginning to show it. While I agree with this, I think that many of the concepts are still valid and should be re-used in any new MTA.

I like Smail's general approach to mail delivery. The directors, routers, and transports concepts cleanly separate the various processing elements involved. I would favour keeping something along the same lines, including the style of the configuration files, which is very easy to understand. The addition of regular expressions to some configuration parameters could be of great benefit.

I would press for more generality wherever possible. For example, Nigel Metheringham added the ability to specify a transport for the smartuser director -- a new MTA should allow *any* director to have a transport specified, unless there is some really obvious reason why it doesn't ever make sense. Another example: I could use a transport specification on the aliasfile director to process all messages to a certain set of users (users whose accounts have been cancelled, in my case) by a single script.

The code of Smail has been much hacked about as various people have added function to it.
That it has been able to withstand such activities is evidence of a good original design, but as is inevitable with changing programs, it has become untidy and unnecessarily duplicated in a number of places, and would benefit from a re-write for that reason alone.

It would be helpful to those working on a new MTA if a "developers' manual" could be written, documenting internal interfaces, macros, etc. This would take time to write and to keep up-to-date, but I think it would be worth it. There should also be an agreed coding style.

The code of Smail shows its age in another way; it is not ANSI C. There are few systems still around that do not now have an ANSI C compiler. Any new MTA should be written initially in ANSI C, and then, if it proves necessary in order to port it to certain systems, macros or pre-processing of some sort should be used for what will surely be a minority of systems.

I am aware that some people think a new MTA should be written in something like Perl rather than C. I am worried that this would not be efficient enough in time (on busy systems) and possibly space (on small systems). Maybe there is a middle way in which some parts of the MTA are in C and others in something else.

Splitting the MTA up into several parts is something that has been discussed before. A monolithic single binary like Smail has the advantage of being easy to install and update -- no problems about keeping things in step (well, apart from the configuration files). On the other hand, it is big, and any new MTA is likely to be bigger. Handling a multi-part MTA as a single entity for installation etc. can easily be done by operating on a directory, and of course there should be no problem about using common functions in more than one binary. I'm thinking of things like address parsing routines.

2. Mail regimes

Internet mail and SMTP are now much more dominant than when Smail was written; Smail has adapted to them, but still shows evidence of its past.
Many systems (such as the one I use) support only Internet mail, and would prefer to use an MTA which restricts itself to the relevant RFCs and does not, for example, get involved with bang paths in any way. Of course, there are still non-Internet systems out there -- I am not for one moment advocating the writing of a mailer that does not support them. I am arguing for suitable configuration facilities (and probably separation of code such as address parsing functions).

3. User facilities

Once a user has sent a message, there is no way that the state of the message can be interrogated. I think it would be useful if the user could find out which messages were still on the local queue, and why they are delayed.

Equally, it would be useful for a user to be able to query the mail history (for a short time, anyway) to find out when a particular message was delivered, and to what machine. There are those who argue this isn't much use when mail gateways are involved, but very many messages do not go through gateways, and for them such information is useful.

Users sometimes ask for messages that are stuck on the queue to be cancelled. It would save the postmaster work if they could do this themselves.

Providing the above facilities gets more difficult when sets of machines with a central mail handler are involved. The authentication issues will need looking into carefully.

4. Administration facilities

I would like a new MTA to have a wider range of facilities to enable its administrators to manage it more effectively. The following sections describe some of the ideas that have come to mind.

5. Retry strategies

Smail has only one retry strategy, though one can set the parameters for individual addresses. I would like to be able to choose different strategies for different addresses or sets of addresses, and also to specify changes of strategy according to different circumstances. The kinds of strategy I envisage are:

. Smail's current "retry after no less than n time units have passed since the previous failure" strategy.
. A strategy in which the retry time for a given address increases after every failure. There is scope for different rules for computing the increase.

One might want, for example, to change from the first of these to the second after a message has been on the queue for a certain length of time. The change, of course, is to the strategy for the address, not the message, so a new message coming in to the same address would immediately get a long delay on failure.

It would be nice if the MTA kept some kind of list of messages queued for particular addresses. Then once one message had eventually been delivered, it could immediately attempt to deliver any others queued for the same address (possibly even in the same SMTP connection).

Specifying different actions depending on what delivery failure occurred would also be useful. For example, we are seeing an increasing number of messages mis-directed to machines that are never going to listen for mail (MS-DOS systems and the like). They give "connection refused" errors on all attempts. An administrator might like to configure the MTA to give up after a shorter time on the queue if all delivery attempts consistently give "connection refused" (for example, 6 hours instead of 6 days). If the MTA could remember which machines gave this error, it could be configured to give up after a single attempt on any subsequent messages. This could be conditional on the non-existence of an MX record, if so desired.

Actually, a general (configurable) rule of the form "if a message has sat on the queue so long that it has been returned to sender, then bounce any subsequent messages to the same address pretty quickly until something succeeds" would also be of benefit, I think. I frequently see a message sit on the queue for 4 days and as soon as it has been returned to the sender, he or she immediately sends another one.
(Users do usually give up after this has happened twice, thank goodness.) An immediate bounce in these circumstances could also be made conditional upon the non-existence of an MX record.

It is not always recognized that local deliveries can also be deferred for legitimate reasons. If delivery is taking place into users' home directories, and these are NFS-mounted, a given directory may not always be available. If quota control is being applied to the mailbox partition, delivery will be deferred when a user is over quota. Configuration options are needed to control what happens in these circumstances. One might, for example, want to defer mail to an over-quota mailbox for a while if the mailbox has been recently read, but give an immediate bounce if it hasn't been read for 6 months.

6. Monitoring and controlling the MTA

As some readers will know, I've been developing a monitor for Smail that displays information about what it is doing in an X window. A future MTA should have facilities for this kind of thing built in -- my program's way of getting information is to scan the logs and the queue itself, which is wasteful.

There are many kinds of information that an administrator might want to keep an eye on. For example, my Smail monitor currently provides:

. A stripchart of the total number of messages in the queue.
. Stripcharts of arrivals and deliveries, according to user-specified criteria (e.g. local deliveries, deliveries to a certain machine, receipts from a certain machine).
. An abbreviated tail of the log, displaying one line per receipt and delivery.
. A list of all messages in the queue, with their senders and undelivered addresses.

Other informational facilities that spring to mind are:

. Stripcharts of counts of queued messages according to user-specified criteria (e.g. messages queued for a particular address).
. Inspection of individual messages -- reading their headers and the history of their delivery attempts.
. Information about what the MTA is doing with individual messages. At the moment I see a whole slew of Smail processes on this system and have no clue as to which is doing what. It would be nice to see that process nnn is working on message xxxxxx and currently trying to deliver it to machine yyyyy, and has had a TCP/IP connection open for z minutes, for example, and similarly for reception processes.

Then there are actions that it would be nice for the administrator to be able to perform from the X window, such as:

. Delete one or more messages.
. Abandon the current (presumably stuck) delivery attempt for a given message, either with a soft (defer) or hard error.
. Hold one or more messages [for a given time] or hold messages for a certain address.
. Retry delivery of one or more messages, or all messages for a given address.
. Re-route messages addressed to a given address [for a given time] -- in effect, override what the routers decide.
. Temporarily prevent the MTA from attempting any deliveries, locally or remotely.
. Temporarily prevent the MTA from accepting any remote incoming messages, without having to kill the daemon and restart. (Internally, of course, it could be implemented by killing the daemon and restarting...)
. Start a queue run, optionally omitting certain messages. The option could be by message id, or messages older or younger than a given time. We had an incident recently when the /var partition filled up, causing a lot of messages to end up in the /error directory. Afterwards, moving them to the /input directory and starting a queue run appeared to achieve nothing -- because there was a message previously on the queue that took tens of minutes to time out when delivery was attempted. Only after that did Smail get onto the messages of interest.
. Control the maximum number of simultaneous TCP/IP reception connections, either as a fixed number, or as some value dependent on the machine's load.
. Control the maximum number of simultaneous TCP/IP delivery connections, likewise.
. Control the maximum number of simultaneous local delivery processes, likewise.

For systems where the use of an X window is not possible, as many of these facilities as possible should also be provided via an alternative interface.

7. Address rewriting

There are those who feel that rewriting any header is a mortal sin; there are others who feel that the facility is the only way they can run their mail systems sensibly. The inclusion of the ability to rewrite does not force any administrator into using it.

The facilities for rewriting should be as general and flexible as possible, so as to make it possible to tailor the MTA closely to the requirements of its administrator. The current "qualify" facilities of Smail can be subsumed into a more general rewriting facility.

It should be possible to specify rewriting that applies only to one particular address source, or to any arbitrary combination of them. By "address source" I mean the following:

. The sender's address in the envelope.
. Recipient addresses in the envelope.
. Each individual header line that contains addresses.

Thus, one might request rewriting of the from: but not the sender: header, for example, and in principle it should be possible to specify *different* rewriting of the sender's address in the envelope to that of the from: or sender: headers.

It must be possible to specify rewriting of either the domain part or the user name part of the address, or both, and regular expression syntax is needed for generality. The two most commonly expressed requirements seem to be:

. Rewrite user@*.bar.com as user@bar.com
. Rewrite abc123@bar.com as A.N.User@bar.com

The rewriting configuration can be held in a single configuration file, provided that there is a means of consulting other files if necessary. For example, to rewrite abc123 as A.N.User a dbm or NIS file would probably be consulted.

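
The two rewriting requirements above, restricted to chosen address sources, can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration: the function rewrite_address, the source labels such as "from" and "envelope-sender", and the in-memory table USER_NAMES, which merely stands in for the dbm or NIS file. It is not how Smail or any existing MTA does it.

```python
import re

# Hypothetical lookup table standing in for a dbm or NIS file.
USER_NAMES = {"abc123": "A.N.User"}

# Hypothetical rule: match user@<one or more labels>.bar.com
SUBDOMAIN_RULE = re.compile(
    r"^(?P<local>[^@]+)@(?:[^@.]+\.)+bar\.com$", re.IGNORECASE
)


def rewrite_address(address, source, sources_enabled):
    """Rewrite one address, but only if its source is enabled.

    'source' says where the address came from, for example
    "envelope-sender", "envelope-recipient", "from" or "sender".
    These labels are illustrative, not taken from any real MTA.
    """
    if source not in sources_enabled:
        return address

    # Requirement 1: user@*.bar.com becomes user@bar.com
    match = SUBDOMAIN_RULE.match(address)
    if match:
        address = match.group("local") + "@bar.com"

    # Requirement 2: abc123@bar.com becomes A.N.User@bar.com
    local, _, domain = address.rpartition("@")
    if domain.lower() == "bar.com" and local in USER_NAMES:
        address = USER_NAMES[local] + "@" + domain

    return address
```

With rewriting enabled only for the from: header, rewrite_address("abc123@mail.cs.bar.com", "from", {"from"}) passes through both rules, while the same address arriving as a sender: header is returned unchanged.
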
Optional logging of rewrites would be helpful in assuring administrators that they have got their configuration correct, and in chasing problems. Normal logging of the "orig-to" and "parent" type does this for envelope destination addresses but not for header fields.

8. MX handling

Smail currently permits the setting of "mx-only" for given domains. There needs to be a mechanism for listing exceptions within those domains, though I do sincerely hope that in time "mx-only" becomes the only state on the whole Internet. It would eventually save a lot of hassle, though getting to it would cause pain.

9. ESMTP

The MTA could remember which systems failed when it tried "EHLO", and not waste time trying them again (subject to a timeout).

10. Logs

The level of detail of Smail's logs is needed, but care should be taken to minimize the size. One of our systems is currently producing 9MB of Smail log a day, using the Smail 3.1.28 format, which does contain quite a lot of white space. I prefer a format that doesn't generate wide lines, however, as that is easier for humans to deal with.

11. Checking the format of incoming mail

This is another area of heated debate. One side argues for accepting anything that is plausible; the other side argues that refusing malformed messages saves postmaster work and leads to a better world in the long run. Once again, providing the ability to do such checking does not force any administrator to use it.

The following checks can, in principle, be made (what I mean by "valid" is expanded on in the next section):

. Check that the sender's address in the envelope is valid.
. Check that the addresses in the from:, sender:, and reply-to: fields are valid.
. Check that the headers conform to RFC 822, in particular, that a minimum set of headers exists.
. Check that the RFC 1413 identity from certain machines is the expected one.

If a check fails, it should be configurable as to whether the message is refused or just warned.
Some people might refuse if any check fails; others might be prepared to accept messages as long as one of the sender addresses is valid. In all cases, the header can be logged.

An invalid sender address in the envelope means that an error message cannot be returned if the message proves to be undeliverable. Accepting such messages may increase the load of the local postmaster; refusing them catches "joke" mail from god@heaven and the like quickly.

An invalid address in the from: or reply-to: fields means that a user reply will fail. Ultimately this will probably also make work for the local postmaster.

I have been running our systems configured to refuse mail with bad envelope senders for some time. I am somewhat surprised at the volume of bad messages that get refused. They fall into several categories:

. Messages where the envelope sender and the from: field are both rubbish such as

    user@local
    user@unqualified
    user@
    user@total-gobbledegook.containing.bad.chars,etc.

  Occasionally such a message contains a sensible address in the reply-to: field.
. Messages where the envelope sender's domain looks plausible, but is not registered in the DNS. Often, the name seems to be a valid domain with a workstation name (or other subdomain) tacked on the front. Occasionally the problem lies in the DNS, when there's been a lame delegation so that one official nameserver doesn't contain the data that it should. Sometimes the from:, sender:, or reply-to: fields contain a slightly different, valid address.
. Syntactically correct, but semantically nonsensical domains. These include the typical "joke" forgeries, but also things like somewhere.ac.uk.ac.uk that can result from misconfigured mailers.

The question is: what kinds of action are administrators likely to want their MTA to take on encountering malformed messages of this kind? For example:

. Warn on any malformation.
. Refuse on any malformation.
. Refuse only if envelope sender is bad (don't check the others).
. Accept if one of envelope sender, from:, sender:, reply-to: is good, and use that address if an automatic delivery error message is required. There may be pressure to rewrite the bad ones. I suspect this is not a good idea, but should some kind of additional header be generated if the only good sender address is the envelope sender? Otherwise the recipient typically won't see it.
. etc.

Refusal of messages has to happen after the data has been accepted, so that the headers can be checked (if required) and logged. Sadly, a number of MTAs treat an SMTP 5xx error given at that point as a temporary failure, and continue to try to send the message for days (this disobeys RFC 821). A really clever MTA might be able to notice when the same message comes from the same machine again, and try bouncing it earlier in the protocol, such as immediately after getting the bad envelope sender.

12. Invalid addresses

The previous section discussed possibilities for actions when invalid sender addresses are received. One cannot, of course, check an address completely without actually attempting to send mail to it.

For remote addresses, the best that can be done is to check that the domain makes sense. The user name cannot be checked. You just have to take a chance on that. (It is depressing how many mail sites bounce mail to postmaster or mailer-daemon, incidentally.) There have been suggestions that the SMTP VRFY command be used, but that is frequently not implemented, and in any case it doesn't help with gateways.

The way to check the domain is of course to pass it through the routers and see if any of them can handle it. However, you don't in fact want to use all the routers for this -- a smarthost router would defeat the object. A specific set of routers for checking incoming addresses is needed. (I do this on the current Smail by adding flags to the routers.)

An attempt to check a remote address in the DNS may time out.
In this case, outright rejection of the message is obviously not the right thing to do, and a soft error should be used (if configured for rejection). Alternatively, there could be a configuration option to take such addresses on trust.

There are two cases when mail arriving from outside contains a local domain in the sender's address:

. A local user has sent mail to a mailing list or to someone who has forwarded their mail, causing it to come back to the sending system, or is using a source-routed loop to test things.
. A local user is using an MUA such as PC-Pine or Eudora on a workstation, and using SMTP to send outgoing mail to the IMAP or POP host.

It's a pity one can't easily distinguish these two. The latter case is really a sort of "client-server" operation, for which ideally the message should be treated as "locally originated". Possibly one could decide this by noticing the lack of Received: lines.

In both cases the user name can be checked for validity. Checking in the password file is not good enough, because mail gets sent out with alias names such as postmaster. The way to check the name is to run it through a selected set of directors. Again timeouts are possible (thanks, NIS) and should be catered for.

13. Checking recipients at reception time

Smail always accepts an incoming message and worries about whether it can deliver it afterwards. Some mailers check local recipients immediately they see them when receiving via SMTP. An invalid local recipient causes an error response to the rcpt to: command. This has the benefit of saving the bandwidth of transferring the message (if all recipients are bad). On the other hand, it cuts out the possibility of processing messages to unknown users and returning to the sender some suggestions as to what might be wrong with the user's name. Implementing this has proved useful in our environment in cutting down the postmaster load.

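
The trade-off described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: LOCAL_USERS is a made-up set standing in for whatever the directors would accept, and the function names rcpt_reply and bounce_suggestions and the check_early flag are hypothetical. The reply codes 250 and 550 are the standard SMTP success and "mailbox unavailable" codes.

```python
import difflib

# Hypothetical set of valid local names; a real MTA would ask its directors.
LOCAL_USERS = {"postmaster", "jsmith", "mbrown", "akhan"}


def rcpt_reply(local_part, check_early):
    """Return an (SMTP code, text) reply to an RCPT command for a local name.

    With check_early set, an unknown user is refused at once, which saves
    transferring the message. Without it, the recipient is accepted and any
    problem is dealt with after reception.
    """
    if check_early and local_part.lower() not in LOCAL_USERS:
        return 550, "unknown local user"
    return 250, "OK"


def bounce_suggestions(local_part, limit=3):
    """Suggest similar valid names for an error message sent after acceptance."""
    return difflib.get_close_matches(
        local_part.lower(), sorted(LOCAL_USERS), n=limit, cutoff=0.6
    )
```

For a mistyped name such as "jsmth", the early check answers 550 during the SMTP dialogue, whereas the late path accepts the recipient and can then include bounce_suggestions("jsmth") in the message returned to the sender.
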
A simple configuration switch could be provided so that the MTA manager could choose to check local recipients early or not. Indeed, there's no logical reason why remote recipients should not be checked as well. Two switches.

14. An MTA database

The only data that Smail retains about individual messages while they are on its queue is the message log file, which lists successes, failures, and deferrals. The only other memory in Smail is in the retry directories (a facility added quite late in the life of Smail3) where failures to deliver to certain machines are remembered so that new deliveries are not attempted too soon.

A more sophisticated MTA will need some kind of unified long-term data store, in effect a database of some sort. I can see two possible general approaches to implementing this:

(1) Keep it in a file, and use locking to control simultaneous access by different MTA processes.

(2) Have a continuously-running process to manage the database, and require all accesses to be via this process.

The second approach seems to me to be more complicated, especially in the way authentication is handled, and requires special handling to ensure that the managing process is always running. As I have no experience of either of these ways of working, I have no feel for the efficiency issues. Obviously a process could keep a cache in main memory, which could be a gain.

I am a bit worried that winding into action a large database-based system for each message delivery could use a lot of unnecessary resources for many messages. For example, a large number of messages on our systems are between users of the same system, and get delivered straight away without any problems. Setting up and taking down a "message record" (or whatever it gets called) seems unnecessary in these cases.

One way of approaching this is to distinguish between the first delivery attempt and subsequent attempts.
Creating per-message data in the database can be deferred until the first attempt has failed to deliver to all recipients. Of course, the database has to be consulted to determine, for example, that a remote address should not be tried just yet, but it need not be updated (unless it is also being used to keep a historical record for easy interrogation).

Whichever method of implementing the database is chosen, some means of cleaning it up is required, as it is sure to accumulate ancient data in the way that Smail's retry directories currently do. A process that continuously crawls around looking for things to throw away could be used for this.

The database must be capable of handling arbitrarily long records for each key. (N)dbm files are limited in this respect, so some kind of continuation scheme is required if dbm files are to be used.

15. Queue management

At least one other MTA uses a "queue manager" to control deliveries. This is a continuously running process that can decide when to attempt deliveries of messages in a more flexible way than just "run the queue every 30 minutes". However, the danger is that when it goes wrong, mail delivery ceases.

The real usefulness of such a manager is for messages that fail to get delivered on the first attempt, so I would propose that there always be a first attempt at immediate delivery (subject to retry data, of course), and only if that fails should the message come under the control of a queue manager. That approach should use fewer resources for messages that go through first time.

16. Security

The authors of any new MTA should probably review all the recent mail-related security incidents and plan their implementation strategy accordingly. It is particularly important to be paranoid about any imported text strings.

17. Miscellaneous optimizations

This section contains a list of optimizations that Smail doesn't do.

. Send more than one message in a single TCP/IP connection.
. Start delivery processes after each message is received in a TCP/IP connection, instead of waiting till they have all arrived.
. Don't go through all the routing logic again for another recipient at the same address, e.g. for abc@foo.com and xyx@foo.com just do the routing once.
. Make use of persistent routing information across different messages, as has been suggested before. This is a generalization of the previous item.

It would be nice to know if these things really were optimizations; suitable instrumentation of the MTA for recording the resources used should be considered.

18. Miscellaneous things

. For systems that deliver mail into mailboxes in a directory with the sticky bit set, and do not have quotas enabled on the mailboxes, some MTA support for limiting the size of a mailbox, or warning the users and/or the administration about mailboxes over a certain size might be helpful.
. Inexperienced users are somewhat intimidated by the error messages that MTAs generate when they cannot deliver messages. Any new MTA should make some effort to generate messages that are as user-friendly as possible. (One of our users, on sending his very first message (incorrectly addressed), thought that incoming mail from something labelled "daemon" was obviously someone playing a trick on him!)
. A standard way of passing messages for non-existent users and other classes of user (e.g. cancelled users defined in an alias file or identified in some other way by a directory) to a script should be provided. This would allow the administrator to add local content to the error message.
. It appears that the only really safe way to do local deliveries is to do each one in a subprocess running as the end user. I think we just have to bite this bullet.
. Smail's strategy of copying the sendmail interface causes less hassle than trying to do things differently. With the MTA split up into different modules, you probably have to go for a very small /usr/etc/sendmail that simply fires up the appropriate binary.

* * *