memcached and MIMEDefang – a cool combination

I like milter-ahead a lot. But it is not the best fit for our particular deployment, because it assumes that all the information needed to decide whether to accept or reject email resides not on the server that it runs on, but on the servers that it queries. This is not milter-ahead’s fault. Milters have no way of expanding aliases while checking the recipient address, so the programmer has to resort to tricks like parsing the output of sendmail -bv user@address, which means running a second sendmail process for the same delivery. The alternative would be to hack milter-ahead to check the alias database for the existence of recipient addresses, but doing so the way sendmail reads the alias database is overly complex. One could also write an external daemon to monitor the alias database and inject entries into the (Berkeley DB) database maintained by milter-ahead, but that database is locked exclusively. And yes, exceptions could be entered in the access database, but that would mean maintaining two files for a single (and not so frequent) change in the alias files.

As I’ve blogged before, one of the reasons that I like MIMEDefang is that it gives the Postmaster a full programming language to filter stuff. By simply using md_check_against_smtp_server() a poor man’s non-caching version of milter-ahead is possible. Adding support to read the alias database (be it the text file or the hash table) is also trivial.

But what about busy mail systems? You do not want to hammer your mail servers all the time with queries whose answers stay constant for long periods. You need a caching mechanism. At first I thought of implementing one the way milter-ahead does: by using a Berkeley DB database and some expiration mechanism, either from within MIMEDefang (retrieve the key and, if it should have expired by now, delete it; otherwise proceed as expected) or through an external “garbage collecting” daemon. But a store with a clean interface for entering keys and values, built-in expiration and good performance already exists: memcached. So by using Cache::Memcached within the mimedefang-filter, I got basic milter-ahead behavior (with caching).
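
The lookup-then-cache flow described above can be sketched as follows. This is a minimal Python illustration, not the actual Perl filter: TTLCache is an in-memory stand-in for memcached, and recipient_ok and query_backend are hypothetical names (query_backend plays the role of the SMTP call-ahead). Lowercasing the whole address is also an assumption made to keep cache keys uniform.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TTLCache:
    """In-memory stand-in for memcached: entries expire after `ttl` seconds."""

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl
        self.clock = clock
        self._store: Dict[str, Tuple[float, bool]] = {}

    def get(self, key: str) -> Optional[bool]:
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self.clock() >= expires_at:
            # Expired: drop the entry and report a miss.
            del self._store[key]
            return None
        return value

    def set(self, key: str, value: bool) -> None:
        self._store[key] = (self.clock() + self.ttl, value)


def recipient_ok(address: str, cache: TTLCache,
                 query_backend: Callable[[str], bool]) -> bool:
    """Return the cached verdict if present, otherwise ask the backend once."""
    key = address.lower()
    cached = cache.get(key)
    if cached is not None:
        return cached
    verdict = query_backend(key)
    cache.set(key, verdict)  # both accepts and rejects are cached
    return verdict
```

Caching rejections as well as acceptances keeps repeated probes for nonexistent users away from the back-end servers, at the cost of a delay (at most the TTL) before a newly created mailbox is accepted.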

But what about the local aliases in the mail server? After all this was all the fuss that prompted the switch anyway. I wrote a Perl script that opened the alias database using the BerkeleyDB package. Two details need caution here:

  • The first one is ignoring the invalid @:@ entry in the alias database. You do not see it in the alias text file, but you will see it when you run praliases. Sendmail uses this entry in order to know whether the database is up-to-date or not. See the bat book for a longer discussion of this.
  • The second detail is that since the alias database is written by a C program, all strings are NUL-terminated. Perl strings used as keys and values with the BerkeleyDB package carry no such terminator. However, the Perl BerkeleyDB package provides filters to deal with this case. You need something like:
    $db->filter_fetch_key( sub { s/\0$// } );
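
The two details above can also be shown outside Perl. The following is a minimal Python sketch that assumes the raw key/value pairs have already been read from the alias database into a mapping of bytes; clean_alias_entries is a hypothetical helper name. Unlike the Perl filter, it strips every trailing NUL byte rather than just one.

```python
from typing import Dict, Mapping


def clean_alias_entries(raw: Mapping[bytes, bytes]) -> Dict[str, str]:
    """Strip C-style trailing NULs and drop sendmail's '@' marker entry."""
    aliases: Dict[str, str] = {}
    for raw_key, raw_value in raw.items():
        key = raw_key.rstrip(b"\0").decode("ascii", "replace")
        value = raw_value.rstrip(b"\0").decode("ascii", "replace")
        if key == "@":
            # The @:@ entry only marks the database as complete; it is not an alias.
            continue
        aliases[key] = value
    return aliases
```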
    

And then there’s the issue of making such a script a daemon. One can go the traditional way, use a daemonizer on steroids or simply use Proc::Daemon::Init and be done with it.

memcached comes in handy for storing key-value pairs in many system administration tasks, and I think I’m going to use it a lot more in mail filtering.

lost input channel to mta after rcpt

A couple of days ago the enet.gr domain went missing. I observed this because of a call I got from our press office where a user complained that sending mail to journalists was not possible: “I can email all the world, except journalists”. The mail logs showed that:

Feb 17 13:08:26 ns sm-mta[1215]: q1HB5o4Y001215: 
lost input channel from host.name [x.x.x.x] to mta after rcpt

So what was wrong? Because of delays in DNS server responses regarding enet.gr, Thunderbird timed out and dropped the connection (the problem appeared to be Thunderbird specific). Since I was on the road with limited connectivity, my quick hack of the moment was to point enet.gr to 127.0.0.1 in the SMTP server’s /etc/hosts. A far better solution is to increase the value of Thunderbird’s mailnews.tcptimeout preference.

Benford’s Law and email subjects

The first book I ever bought from ISACA’s bookstore was Nigrini’s book on Benford’s Law. Briefly stated, the law says that in a series of numbers that occur while observing a phenomenon, numbers whose leading digit is 1 are more likely to occur than those starting with 2, which in turn are more likely to appear than those starting with 3, and so on up to numbers starting with 9.

P(n) = \log_{10} (1 + \frac{1}{n}), n = 1, ..., 9

The law holds for other bases too.
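
As a small worked example of the formula, the following Python sketch computes the expected probability of each leading digit. The function name benford and its base parameter are illustrative; the base-b form assumes the usual generalization, with the logarithm taken in base b and digits running from 1 to b - 1.

```python
import math


def benford(digit: int, base: int = 10) -> float:
    """Expected probability that `digit` is the leading digit in the given base."""
    if not 1 <= digit < base:
        raise ValueError("digit must be between 1 and base - 1")
    return math.log(1 + 1 / digit, base)


# Expected leading-digit probabilities in base 10, rounded to three decimals.
expected = {d: round(benford(d), 3) for d in range(1, 10)}
```

In base 10 this gives about 0.301 for a leading 1 and about 0.046 for a leading 9, and the nine probabilities sum to 1.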

I’ve had discussions about the applicability of Benford’s Law to email data over at twitter with Martijn Grooten, but never ran any tests. A few hours back I had an interesting discussion with Theodore which reminded me of the law, and so I decided to see whether it holds for a number series related to email. The easiest test I could run was on the length of the Subject: lines. Below is a graph of Benford’s distribution and actual data from 376916 mails that passed through a certain mail server during the last week:

Benford's Law vs. length of Subject: lines

It seems that the length of subject lines follows the pattern. For the sake of speed I have omitted non-latin subject lines from the computation, which means that I have to recompute whenever I find a timeslot longer than 15 minutes. But then again, if I am to find such a slot, I think I will try to see whether the message body size also follows a Benfordian distribution. That may be more difficult to verify, though, because different mail servers impose different limits on the size of the messages they send and receive. Oh wait, Sotiris just did that! The rest of the tests mentioned in Nigrini’s book are also worth a try.
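
For anyone who wants to repeat the test on their own data, here is a minimal Python sketch of the comparison. It is not the script used for the graph above; the function names are hypothetical, and it assumes the Subject: lines have already been extracted and decoded into a list of strings.

```python
import math
from collections import Counter
from typing import Dict, Iterable


def leading_digit_frequencies(lengths: Iterable[int]) -> Dict[int, float]:
    """Relative frequency of each leading digit (1-9); zero lengths are skipped."""
    counts = Counter(int(str(n)[0]) for n in lengths if n > 0)
    total = sum(counts.values())
    if total == 0:
        return {d: 0.0 for d in range(1, 10)}
    return {d: counts.get(d, 0) / total for d in range(1, 10)}


def max_deviation(observed: Dict[int, float]) -> float:
    """Largest absolute gap between observed frequencies and Benford's Law."""
    return max(abs(observed[d] - math.log10(1 + 1 / d)) for d in range(1, 10))


# Made-up subjects, only to show the call sequence.
subjects = ["Meeting at noon", "Re: invoice", "Hi", ""]
observed = leading_digit_frequencies(len(s) for s in subjects)
```

A handful of subjects says nothing, of course; the comparison only becomes meaningful with a sample of the size used above.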

So what do your logs say about subject lines’ length and Benford’s Law? Do they follow the pattern? I’d be glad to see your answer in the comments section.

PS: I see that there is now a second edition of Nigrini’s book about to be published!

check_compat vs MIMEDefang

We have a user who wishes to have messages sent from sender@host-xyzw.etp.eu.example.com discarded by our mailservers. The natural choice for such blocks seems to be FEATURE(compat_check). In fact we had a number of other users with similar requests that were serviced this way. The problem in this case was that the xyzw part of host-xyzw.etp.eu.example.com was neither constant nor predictable, nor drawn from a finite set. Naturally I thought that a local version of the check_compat ruleset would suffice, since $*.etp.eu.example.com matches all possible such hostnames. But it seems that according to the bat book this cannot be done while also using FEATURE(compat_check):

Note that although with V8.12 and later you can still write your own check_compat rule set, doing so has been made unnecessary by the FEATURE(compat_check) (§7.5.7 on page 288). But also note that, as of V8.12, you cannot both declare the FEATURE(compat_check) and use this check_compat rule set.

Since I did not wish to tamper with our sendmail.mc this time, MIMEDefang came to the rescue: filter_recipient is called with both the recipient and the sender among its arguments, and that took care of it. But again, had I chosen to write this in sendmail’s language, it would have been a one-liner (ugly, but elegant in its own way).
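
The matching logic itself is simple. The sketch below shows it in Python for illustration only; it is not the MIMEDefang Perl code that went into the filter. The function name, the protected recipient user@example.com and the default parameter values are all hypothetical.

```python
def should_discard(sender: str, recipient: str,
                   protected: str = "user@example.com",
                   blocked_local: str = "sender",
                   blocked_suffix: str = ".etp.eu.example.com") -> bool:
    """True when mail for the protected user comes from the blocked sender
    at any host under the blocked domain suffix."""
    if recipient.strip("<>").lower() != protected:
        return False
    local, _, host = sender.strip("<>").lower().rpartition("@")
    return local == blocked_local and host.endswith(blocked_suffix)
```

Matching on a hostname suffix is what makes the unpredictable xyzw part irrelevant.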

arfparse – a simple tool to extract ARF information

arfparse is a utility used to parse mailbox archives and extract ARF information, as described in RFC 5965, “An Extensible Format for Email Feedback Reports”.

It is meant to work as a preliminary processor, therefore output of the program is kept as simple as possible. Example usage:

$ arfparse -m ~/mail/aol.net

This will extract ARF information sent from scomp@aol.net, assuming the FBL reports are archived in ~/mail/aol.net.
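
To give an idea of what “ARF information” means, here is a short Python sketch that pulls the fields of the message/feedback-report part out of a single report using the standard email package. It is an independent illustration and says nothing about how arfparse is implemented or what its output looks like; the function name and the sample report are made up.

```python
import email
from typing import Dict


def extract_feedback_fields(raw: str) -> Dict[str, str]:
    """Return the fields of the first message/feedback-report part, if any."""
    msg = email.message_from_string(raw)
    for part in msg.walk():
        if part.get_content_type() != "message/feedback-report":
            continue
        payload = part.get_payload()
        # The parser may expose the report as a nested message or as plain text.
        if isinstance(payload, list):
            report = payload[0]
        else:
            report = email.message_from_string(payload)
        return {name.lower(): value for name, value in report.items()}
    return {}


# A made-up, minimal feedback report for demonstration.
SAMPLE = "\n".join([
    "From: scomp@aol.net",
    "To: postmaster@example.com",
    "Subject: FW: complaint",
    "MIME-Version: 1.0",
    'Content-Type: multipart/report; report-type=feedback-report; boundary="b1"',
    "",
    "--b1",
    "Content-Type: text/plain",
    "",
    "This is an email abuse report.",
    "",
    "--b1",
    "Content-Type: message/feedback-report",
    "",
    "Feedback-Type: abuse",
    "User-Agent: ExampleFBL/1.0",
    "Version: 1",
    "",
    "--b1--",
    "",
])

fields = extract_feedback_fields(SAMPLE)
```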

arfparse is developed on OpenBSD with Panda-IMAP and should work with UW-IMAP too. It is the product of structured procrastination.

You can grab arfparse from GitHub.

Feel free to send me flames, suggestions and improvements.

PS: Yes, I would post about arfparse in the comments section here, but comments seem to be locked for now.

mail hosted at Google, web server elsewhere

This post aims to cover two sets of questions that frequently appear on Serverfault:

“I have the email of my organization hosted at Google and the web server at a hosting provider. When the web server sends email (when a form is completed for example), email is received by everyone except when the recipient is in our domain. Then sendmail tries to deliver locally and not over at Google”. Or, “certain recipients, including Google, reject email from the web server (or servers within our LAN) as spam”.

There are answers at Serverfault recommending the use of ssmtp in order to relay all outgoing email via Google, but this requires SMTP authentication and a password saved in a file.

For the purposes of this post the domain example.com will be used.

Configure SPF for example.com

SPF is a framework that allows domain name owners to tell the world which servers they consider authorized to send mail on behalf of their domain. Google’s support pages note that the SPF record should at least be in the form of v=spf1 include:_spf.google.com ~all. However, server.example.org must also be able to send email on behalf of example.com. So the appropriate record becomes:

v=spf1 a:server.example.org include:_spf.google.com ~all

Note: example.org is not the same domain as example.com
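
Before publishing the record, a quick textual sanity check can confirm that both senders are listed. The Python sketch below only inspects the text of the record; it does not evaluate SPF the way a receiving server would (no DNS lookups, no include expansion), and the function names are hypothetical.

```python
from typing import Iterable, List


def spf_terms(record: str) -> List[str]:
    """Split an SPF record into its terms, insisting on the v=spf1 prefix."""
    terms = record.split()
    if not terms or terms[0].lower() != "v=spf1":
        raise ValueError("not an SPF version 1 record")
    return terms[1:]


def lists_all(record: str, required: Iterable[str]) -> bool:
    """True when every required term appears literally in the record."""
    present = {term.lower() for term in spf_terms(record)}
    return all(term.lower() in present for term in required)


record = "v=spf1 a:server.example.org include:_spf.google.com ~all"
ok = lists_all(record, ["a:server.example.org", "include:_spf.google.com"])
```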

Configure sendmail for server.example.org

example.com is included in /etc/mail/local-host-names, which means that server.example.org treats it as a local domain and will try to deliver locally instead of handing the message to Google. The following additions to the sendmail configuration file (sendmail.mc) take care of this:

LOCAL_CONFIG
Kbestmx bestmx -T.TMP

LOCAL_RULE_0
R $* < @ example.com. > $*
    $#esmtp $@ [$(bestmx example.com. $)] $: $1 < @ example.com. > $2

The line is broken in two for readability. As always remember that the LHS and the RHS of the rule are separated with tabs and not spaces. So do not copy-paste. Build and install sendmail.cf, restart sendmail and check.

I would welcome additions on how the same can be achieved with postfix or exim.

How to install Zimbra with Operating System in less than an hour ..

After reading my “Installing Exchange 2010 SP1 on a Windows 2008 R2 – A typical installation” post, @nzaharioudakis responded:

Got lost already. #Zimbra would end up in 40-50 min including a fresh OS install. Thnx 4 noticing

I asked Nikos whether he could write up a similar Zimbra guide. And so he did!

How to install Zimbra with Operating System in less than an hour ..

Thank you Nikos.

Installing Exchange 2010 SP1 on a Windows 2008 R2 – A typical installation

The complexity of Exchange makes even the typical setup a long (and laborious if done for the first time) task. But with a little bit of help from “Exchange 2010 – A practical approach” (thank you XLA for this), a bit of guesswork and the installer of Exchange itself these are the steps that worked for me:

  • Install Windows 2008 R2 64bit on the machine. Remember, Exchange 2010 does not run on 32bit.
  • Install all operating system updates.
  • Via the Features wizard add .NET 3.5.
  • Download and install the ASP.NET Ajax extensions.
  • Download and install the Office 2010 Filter Pack.
  • Check Windows Update (again).
  • Although it is recommended that the Exchange Server is not installed on the Domain Controller, this is not a luxury I have in the current setup. Run dcpromo then.
  • Install the following hotfixes from Microsoft:
    • 979099
    • 982867 (Download the Windows 7 64bit version)
    • 979744
    • 983440
    • 977020
  • Check Windows Update (again).
  • The ISO image for the Exchange 2010 SP1 is bigger than the typical DVD disk. WinCDEmu to the rescue. Mount the image and copy its contents to a USB stick. Use this to install Exchange.
  • Prepare the server for a typical setup. Change to the SCRIPTS directory and via the command line issue ServerManagerCmd.exe -InputPath Exchange-Typical.XML.
  • Run SETUP.EXE to start the Exchange installer. If there are any prerequisites missing the installer will inform you about them. You can stop the process, install the missing components and then run SETUP.EXE again. It will give you the option to continue from where you stopped the previous time.
  • I chose to perform a typical install, allowed the installer to automatically install any needed server roles and features and chose not to split the administration groups for Exchange and Active Directory since this was a relatively small installation.
  • Check Windows Update (again).

Thanks to Catastrophic Failure: a set of notes that I kept from a course she gave on the subject reduced my installation time.

If you have any questions or suggestions that will help refine this document, please post them in the comments.

An alternate take:

Due to an 8007EE2 Windows Update error, I performed the steps in the following order instead:

  • Install Operating System
  • Install ASP.NET Ajax extensions
  • Install the Office 2010 Filter Pack
  • Install .NET 3.5 (via the Features wizard)
  • Install the hotfixes
  • Run DCPROMO
  • Run ServerManagerCmd -InputPath Exchange-Typical.XML
  • Run the Exchange installer
  • Perform updates afterwards. Be careful to include non-operating system updates too.