Using Logstash to log SMTP/email bounces like a boss

I’ve recently worked on a customized emailing suite for a client that involves bulk email (shutter) and thought I’d do a write up on a few things that I thought were slick.

Originally we decided to use AWS SES but were quickly kicked off of the service because my client doesn’t have clean enough email lists (a story for another day on that).

The requirement from the email suite was pretty much the same as what you’d expect from SendGrid except the solution was a bit more catered toward my client. Dare I say I came up with an AWESOME way of dealing with creating templates? Trade secret.

Anyways – when the dust settled after the initial trials of the system and we were without a way to deliver bulk emails and track the SMTP/email bounces. After scouring for pre-canned solutions there wasn’t a whole lot to pick from. There were some convoluted modifications for postfix and other knick knacks that didn’t lend well to tracking the bounces effectively (or implementable in a sane amount of time).

 

Getting at the bounces…

At this point in the game, knowing what bounced can come from only one place: the maillog from postfix. Postfix is kind enough to record the Delivery Status Notification (‘DSN’) in an easily parsable way.

 

Pairing a bounce with database entries…

The application architecture called for very atomic analytics. So every email that’s sent is recorded in a database with a generated ID. This ID links click events and open reports on a per-email basis (run of the mill email tracking).  To make the log entries sent from the application distinguishable, I changed the message-id header to the generated SHA1 ID – this lets me pick out which message ID’s are from the application, and which ones are from other sources in the log.

There’s one big problem though:

Postfix uses it’s own queue ID’s to track emails – this is that first 10 hex digits of the log entry. So we have to perform a lookup as such:

  1. Deterimine the queue ID from the application generated message-id log entry
  2. Find the queue ID log entry that contains the DSN

 

(Those of you know where this is going – skip this next paragraph).

This is a problem because we can’t do two at the same time. The time when a DSN comes in is variable – this would lead to a LOT of grepping and file processing – one to get the queue ID. Another to find the DSN – if it’s been recorded yet.

 

The plan…

The plan is as follows with the above in mind:

  • Continuously trail the log file
  • Any entries matching message-id=<hash regex> – RECORD: messageId, queueId to a database table(‘holding tank’).
  • Any entries with a DSN – RECORD: queueId, dsn to holding tank table
  • Periodically run a script that will:
    • Process all rows that have a DSN and messageId that is NOT NULL. (This will give us all messages that were sent from our application that have DSN’s recording)
    • Delete all rows that do NOT match that criteria AND are older than 30 days old (they’re a lost cause and hogging up space in the table).

Our database table looks like this:

Making the plan stupid simple, enter: Logstash

We use Logstash where I work for log delivery from our web applications. In my experience with Logstash I learned that it is a tool with so much potential for what I was looking for. Progressive tailing of logs and so many built in inputs, outputs and filters for this kind of work it was a no brainer.

So I set up Logstash and three stupid-simple scripts to implement the plan.

Hopefully this is self explanatory to what the scripts take for input – and where they’re putting that input (see holding tank table above)

Setting up logstash, logical flow:

Logstash has a handy notion of ‘tags’ – where you can have the system’s elements (input, output, filter) enact on data fragments tagged when they match a criteria.

So I created the following configuration logic:

Input:
/path/to/maillog

Tags:

  • bounce – matches log entries containing a status of ‘bounced’ – (not interested in success or defer)
  • send – matches log enries matching the message-id regex for the application generated ID format

Output:

  • EXEC: LogDSN.php – for ‘bounce’ tags only
  • EXEC: LogOutbound.php – for ‘send’ tags only 

 

Full config file (it’s up to you to look at the logstash docs to determine what directives like grep, kv and grok are doing)

 

Then the third component – the result recorder runs on a cron schedule and simply queries records where the message ID and DSN are not null. The mere presence of the message ID indicates the email was sent from the application. the DSN means there’s something to enact on by the result recorder script.

 

Simple overview

* The way I implemented this would change depending on the scale – but for this particular environment, executing scripts instead of putting things into an AMQP channel or the likes was acceptable. 

Leave a Reply

Your email address will not be published. Required fields are marked *