Building a high capacity e-mail system
Sample topology
One approach to multiplexing incoming mail is to have two layers of servers: a front line of relays that accept connections from foreign mail servers, and a back line of mail servers that house users' mail. The front-line servers, or relay hosts, should be set up so that mail.bigisp.com has an A record for each of their IP addresses. In addition, an A record should assign a unique host name to each relay host so that specific servers can be accessed for administrative purposes. The back-end servers that hold users' mail should be accessed only by the relay hosts and can be given any host name, as long as it is unique. In this example, barney and alf are relay servers, while ingrid and fritz are the back-end servers where mail is stored.
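The naming scheme above can be modelled in a few lines of Python. This is only a sketch of the idea, not a DNS configuration: the addresses are hypothetical values from the 192.0.2.0/24 documentation range, the barney.bigisp.com and alf.bigisp.com names are assumed, and the rotation of answers imitates the round-robin behaviour that many DNS servers apply when a name has several A records.

```python
from collections import deque

# Hypothetical addresses; only the host names barney and alf and the
# shared name mail.bigisp.com come from the example in the text.
ZONE = {
    "mail.bigisp.com": deque(["192.0.2.10", "192.0.2.11"]),  # shared name
    "barney.bigisp.com": deque(["192.0.2.10"]),  # unique administrative name
    "alf.bigisp.com": deque(["192.0.2.11"]),  # unique administrative name
}


def resolve(name):
    """Return the A records for name, rotating their order after each query."""
    records = ZONE[name]
    answer = list(records)
    records.rotate(-1)  # the next query sees a different first address
    return answer
```

Successive lookups of the shared name return the relay addresses in a different order, so sending hosts that try the first address are spread across the relays, while each unique name always resolves to one specific relay.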
Every message is handled twice: it must first be accepted from the sending host by one of the relay hosts and then relayed to one of the back-end servers. This effectively doubles network traffic. The extra load can, however, be offset by putting an additional NIC into each relay host to handle traffic to the back-end servers.
It is quite possible to place the back-end servers on an internal network. The back-end servers could then be placed behind packet filtering protection and even placed on private address space networks, as defined in RFC 1918. Only the relay hosts need to be exposed to traffic from foreign hosts, so the servers that are most vulnerable to attack contain no user content, adding extra protection for end users.
Since the relay hosts do not hold any user data, if one fails or is taken down for maintenance, its load can be switched to a backup server or one of the other relay hosts, using a technique such as IP address takeover.
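The bookkeeping behind such a takeover can be sketched as follows. This models only the decision of which surviving relay claims a failed relay's address; the function name, the addresses, and the least-loaded selection rule are assumptions made for illustration, and a real takeover also involves bringing the address up on the new host and announcing it on the network.

```python
def take_over(assignments, failed_host, survivors):
    """Return a new address-to-host mapping with failed_host's addresses
    reassigned to whichever survivor currently holds the fewest addresses."""
    result = dict(assignments)
    for address, holder in assignments.items():
        if holder == failed_host:
            target = min(
                survivors,
                key=lambda host: sum(1 for h in result.values() if h == host),
            )
            result[address] = target
    return result
```

For example, if barney holds 192.0.2.10 and alf holds 192.0.2.11 (hypothetical addresses), calling `take_over` with barney as the failed host and alf as the only survivor assigns both addresses to alf.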
Multiplexing POP3 and IMAP4
Multiplexing of mail retrieval, accessed using either Post Office Protocol 3 (POP3) or Internet Message Access Protocol 4 (IMAP4), is done using Perdition, a mail retrieval proxy written with this purpose in mind. Perdition allows users to connect to a content-free POP3 or IMAP4 server that will proxy a connection to their real POP3 or IMAP4 server respectively. This enables mail retrieval for a domain to be split across multiple real servers on a per-user basis. Perdition is freely available from http://vergenet.net/linux/perdition/ and is distributed under the GNU General Public License.
Perdition should be run on each server that users access to read their e-mail via POP3 or IMAP4. Typically, these would be the same servers that foreign hosts connect to when sending mail via SMTP. When a connection is made to Perdition in POP3 mode, it reads the USER and PASS commands and then refers to its popmap to find where the user's connection should be forwarded. A connection is then made to the foreign POP server, and Perdition issues the USER and PASS commands to that server using the username and password read from the user. If authentication is successful, Perdition pipes data between the client and the foreign server. If authentication fails, the foreign server connection is closed and the client connection is reset to the state it was in on initial connection; that is, new USER and PASS commands are expected. Similarly, in IMAP4 mode, Perdition accepts the LOGIN command and passes the username and password on to the back-end IMAP4 server specified in the popmap for authentication.
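The POP3 login sequence described above can be sketched as a small state machine. This is an illustration of the described behaviour under simplifying assumptions, not Perdition's implementation: the function name is hypothetical, the popmap is modelled as a plain dictionary, and the back-end login is passed in as a callable so that no network access is needed.

```python
def proxy_pop3_login(client_lines, popmap, backend_login):
    """Simulate the login phase of a POP3 proxy.

    client_lines  -- iterable of command lines sent by the client
    popmap        -- dict mapping a username to a back-end server name
    backend_login -- callable(server, user, password) returning True or False

    Returns (server, user) once a back-end accepts the credentials,
    or None if the client quits or stops sending commands first.
    """
    user = None
    for line in client_lines:
        verb, _, arg = line.strip().partition(" ")
        verb = verb.upper()
        if verb == "USER":
            user = arg
        elif verb == "PASS" and user is not None:
            server = popmap.get(user)
            if server is not None and backend_login(server, user, arg):
                # From here on, a real proxy pipes data both ways.
                return server, user
            # Failed login: back to the initial state, expect a new USER.
            user = None
        elif verb == "QUIT":
            return None
    return None
```

With a popmap such as `{"alice": "ingrid"}` (alice is a hypothetical user), a failed PASS leaves the client free to send USER and PASS again, and a successful one yields the back-end server to which the session is piped.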
Pop map
The pop map is analogous to the aliases file and user_map used to multiplex incoming mail on a per-user basis. The pop map determines the server to which each user will be directed once they have connected to Perdition. The format is:
The program makegdbm, which is provided as part of Perdition, can be used to create a binary version of the pop map. To rebuild the pop map, run:
makegdbm popmap.db < popmap
Support for regular expression and MySQL-based maps is available, and access to PostgreSQL is currently in testing. Details of how to administer these maps are included in the documentation for Perdition.
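The idea behind a regular expression map can be sketched as an ordered list of patterns, where the first pattern that matches the username selects the back-end server. The rules and the `lookup` function below are hypothetical and do not reproduce Perdition's own map syntax, which is described in its documentation.

```python
import re

# Hypothetical rules: usernames starting with a-m go to ingrid,
# usernames starting with n-z go to fritz. The first match wins.
REGEX_MAP = [
    (re.compile(r"^[a-m]", re.IGNORECASE), "ingrid"),
    (re.compile(r"^[n-z]", re.IGNORECASE), "fritz"),
]


def lookup(username, rules=REGEX_MAP, default=None):
    """Return the server for the first rule matching username, else default."""
    for pattern, server in rules:
        if pattern.search(username):
            return server
    return default
```

Compared with a per-user table, a rule list of this kind stays small as the number of users grows, at the cost of less control over where an individual user's mailbox lives.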
Multiplexing other protocols
Multiplexing of incoming mail is covered by multiplexing SMTP, as this is the only protocol commonly used to distribute e-mail on the Internet. If other protocols were to be used for mail delivery, these could be passed through an SMTP gateway in any case.
Multiplexing of mail retrieval is more complex. In my experience, POP3 and IMAP4 are overwhelmingly the most popular methods of mail retrieval. Multiplexing these protocols is handled by Perdition. Users who wish to access their mail using shell access can be assigned to a single server and given shell access to the spool directory, possibly via a network file system. Another approach to offering shell access is to develop a method of transparently transferring mail for shell users into a local mailbox.
Conclusion
The methods of distributing mail between multiple servers presented here represent a scalable solution for applications where hosting mail for a domain on a single server becomes impractical. The cost of this approach is that mail delivery and retrieval must typically travel through one extra hop, increasing latency and effectively doubling network utilization for mail. The former is a problem inherent in any multistep mechanism, and the latter can be alleviated by running multiple physical interfaces on a machine and spreading traffic between the interfaces.
Some work remains to be done to this system of distributing e-mail. In particular, pulling together the pop map, aliases, user_map, and general rules into a single resource would simplify management greatly. It may also be necessary to migrate users between back-end servers from time to time.
The architecture developed relies on each user's mail being located on only a single server. This alleviates contention between servers for locks and avoids having to query each server for knowledge of a mailbox, since all servers have access to mailbox location information. Pop maps, aliases, or user_map entries could be constructed intelligently so that each user's mailbox is placed on one of the servers closest to that user. When used in conjunction with routing techniques to force traffic for a particular port onto a specific host, the user can be forced to access one of the closest servers. Hence, a mail system distributed among separate physical locations can easily be created, enhancing the quality of service to end users through faster, more reliable access.