Postfix monitoring with Cacti, my way – part 2

In this second post, we’ll see how to collect event statistics from Postfix (deliveries, rejections, and the like) and publish them via SNMP to our monitoring tools, specifically but not only Cacti.

There’s no magic here: Postfix logs details about its activity to syslog, and that is where we are going to look for our events. The first, straightforward solution would be to schedule a task which scans the Postfix log files (on CentOS servers, and specifically CentOS 6 here, the log usually goes to /var/log/maillog and is rotated weekly by default). But I would like a more efficient, possibly event-driven solution, one which does not scan the same text log files over and over.

On CentOS 6 our syslog daemon is rsyslog, which is able to pass specific log lines to an external helper program. Moreover, this helper program can be spawned just once by rsyslogd at startup and then fed log lines in a loop as they become available. This is exactly what we’re looking for. So the picture will be:

  • Postfix logs something to syslog;
  • The log line is passed to our helper;
  • The helper scans it and decides whether we’re interested; if yes, it increments a counter somewhere (see the sketch right after this list);
  • On the snmpd side, an extension script much like the one discussed in Part 1 will read the counters and publish them in the MIB.
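
Before looking at the real script, here is a minimal sketch of what such an omprog helper looks like from rsyslog’s point of view: a program that is started once and then reads the selected log lines from standard input in a loop. This is only an illustration, not the actual postfixRSyslogCollect presented below:

#!/usr/bin/perl
# Minimal omprog-style helper: rsyslogd spawns it once and writes the
# selected log lines to its STDIN, one per line.
use strict;
use warnings;

while ( my $line = <STDIN> ) {
    chomp $line;
    # classify $line and increment the matching counter here
}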

Where do we place the counters? There are dozens of possible solutions. I already have a running MySQL instance on my servers, so I’m going to use a MySQL table. You could just as easily use SQLite or another DB, or something different like memcached, shared memory, or regular files (but think about locking), and so on…

Moving on. The most important part of this post is the script that classifies the Postfix log lines. Because there are so many different kinds of messages we can potentially be interested in, writing the rules to recognize them can be quite a tedious task. Fortunately for me, I found a script by Glen Pitt-Pladdy which accomplishes exactly this task, for his log file analysis software named Universal Log Analyzer. You can read about it here. Thank you Glen. So I adapted his script. For the curious, on Glen’s page you can download a tar.bz2 archive containing a file called postfix.pm: that’s what I adapted for my purpose. I intentionally kept my changes as few as possible, to make it easier to stay up to date with the upstream tool; for this reason I left around some variables which are not actually used in my script.
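
Just to give the flavour of the classification rules (the real ones, coming from Glen’s postfix.pm, cover far more cases), here are a couple of simplified examples; increment_counter() is a hypothetical helper that bumps the given key in the database, as sketched after the CREATE TABLE below:

# Simplified examples of classification rules; the keys follow the same
# naming scheme you will find in the pfxlogsums table.
sub classify {
    my ($line) = @_;
    my @keys;
    if ( $line =~ /postfix\/smtpd\[\d+\]: connect from/ ) {
        push @keys, 'postfix:smtpd:connect';
    }
    elsif ( $line =~ /postfix\/smtpd\[\d+\]: NOQUEUE: reject:/ ) {
        push @keys, 'postfix:smtpd:NOQUEUE', 'postfix:smtpd:NOQUEUE:reject';
    }
    elsif ( $line =~ /postfix\/smtp\[\d+\]: .*status=deferred/ ) {
        push @keys, 'postfix:smtp:deferred';
    }
    increment_counter($_) for @keys;    # hypothetical DB helper
}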

Here it is: postfixRSyslogCollect. Remove the final “.txt” after downloading. Place it in /usr/local/sbin. Update $dbuser, $dbpass, $database and the other variables as needed for your environment.

Create the MySQL table like this (I named it pfxlogsums; you can change it):

CREATE TABLE `pfxlogsums` (
  `tipo` varchar(255) NOT NULL,
  `num` int(11) NOT NULL DEFAULT '0',
  PRIMARY KEY (`tipo`)
);
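
Just to make the mechanism explicit, the following snippet shows how a row of this table can be created or incremented atomically with a single statement. It is only a sketch of the idea (the connection parameters mirror the defaults used later in the snmpd extension script; the actual statement inside postfixRSyslogCollect may differ):

#!/usr/bin/perl
# Sketch: atomically create-or-increment one counter row via DBI.
use strict;
use warnings;
use DBI;

my $dbh = DBI->connect('DBI:mysql:database=postfix;host=localhost;port=3306',
                       'postfix', 'password', { RaiseError => 1 });

my $key = 'postfix:smtpd:connect';    # example counter name
$dbh->do(
    'INSERT INTO pfxlogsums (tipo, num) VALUES (?, 1)
     ON DUPLICATE KEY UPDATE num = num + 1',
    undef, $key,
);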

Maybe it’s time to run some tests. First, let’s try to process some log lines without actually writing anything to the DB:

< /var/log/maillog postfixRSyslogCollect -vr

should produce something like:

Incrementing postfix:smtp:connect by 1
Incrementing postfix:smtpd:connect:ipv4 by 1
Incrementing postfix:smtpd:connect by 1
Incrementing postfix:smtpd:warning by 1
Incrementing postfix:smtp:deferred by 1
Incrementing postfix:smtp:deferred:dnserror by 1
Incrementing postfix:smtp:connect by 1
Incrementing postfix:smtpd:warning by 1
Incrementing postfix:smtpd:NOQUEUE:reject:Recipient by 1
Incrementing postfix:smtpd:NOQUEUE by 1
Incrementing postfix:smtpd:NOQUEUE:reject:Recipient:other by 1
Incrementing postfix:smtpd:NOQUEUE:reject by 1
Incrementing postfix:smtpd:lostconnection by 1
Incrementing postfix:smtpd:warning by 1
Incrementing postfix:smtp:deferred by 1
Incrementing postfix:smtp:deferred:dnserror by 1
Incrementing postfix:smtp:connect by 1
Incrementing postfix:smtp:deferred by 1
Incrementing postfix:smtp:deferred:dnserror by 1

Now, let’s try to write to the DB:

< /var/log/maillog postfixRSyslogCollect

No errors should be returned, and the pfxlogsums table should now be populated with some rows. You can keep them, as they are the statistics for the events Postfix has already logged, or you can empty the table.
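
A quick SELECT tipo, num FROM pfxlogsums from the mysql client is enough to see the counter names and their current values.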

Now let’s hook the script into rsyslogd: create the file /etc/rsyslog.d/pfxlogsumm.conf with the following content (omprog is the rsyslog output module that passes messages to an external program; the last line sends it every log line whose program name starts with "postfix"):

$ModLoad omprog                      # Enables filter programs to further analyze logs
$ActionOMProgBinary                  /usr/local/sbin/postfixRSyslogCollect
:programname, startswith, "postfix"  :omprog:;RSYSLOG_TraditionalFileFormat

Restart rsyslogd, and you should see the counters in pfxlogsums grow as mail activity comes in.

The next part should be trivial if you read Postfix monitoring with Cacti, my way – part 1. Place the following in /usr/local/sbin/snmpPostfixRSyslogQuery:

#!/usr/bin/perl
use strict;
use DBI;
use SNMP::Extension::PassPersist;
use Getopt::Std;

# MySQL connection parameters and table name: adjust for your environment
my $hostname = 'localhost';
my $dbuser = 'postfix';
my $dbpass = 'password';
my $database = 'postfix';
my $port = '3306';
my $table = 'pfxlogsums';

# Base OID under which the counters are published (must match the
# pass_persist line in snmpd.conf)
my $baseoid = '.1.3.6.1.4.1.2021.70';

my $dsn = "DBI:mysql:database=$database;host=$hostname;port=$port";
my $dbh = DBI->connect($dsn, $dbuser, $dbpass);
my $sth = $dbh->prepare("SELECT tipo, num FROM $table");

my $extsnmp;
my %opts;
getopts('d', \%opts);

if( $opts{d} ) {
    $extsnmp = undef;
    update_tree();
} else {
    $extsnmp = SNMP::Extension::PassPersist->new(
        backend_collect => \&update_tree,
        idle_count      => 60,      # no more than 60 idle cycles
        refresh         => 50,      # refresh every 50 sec
    );
    $extsnmp->run;
}

# Add one OID entry: print it in debug mode, otherwise hand it to the
# pass_persist backend ($dove = OID, $tipo = type, $cosa = value)
sub addOidEntry($$$) {
    my ($dove, $tipo, $cosa) = @_;
    if( $opts{d} ) {
        print "$dove = $tipo: $cosa\n";
    } else {
        $extsnmp->add_oid_entry($dove, $tipo, $cosa);
    }
}

# Read the counters from the table and publish them as an indexed tree:
# .1.<idx> = index, .2.<idx> = counter name, .3.<idx> = value, .0 = row count
sub update_tree {
    $sth->execute() or do {
        warn "Cannot execute the query";
        return;
    };

    my $hr = $sth->fetchall_hashref('tipo');
    #use Data::Dumper; print STDERR Dumper($hr); exit;

    my $idx = 0;

    foreach my $k ( sort keys %$hr ) {
        my $v = defined $hr->{$k} ? $hr->{$k}->{'num'} : 0;
        $idx++;
        #print "$k $idx $v", "\n";
        addOidEntry("${baseoid}.1.${idx}", 'integer', $idx);
        addOidEntry("${baseoid}.2.${idx}", 'string', $k);
        addOidEntry("${baseoid}.3.${idx}", 'counter', $v);
    }

    addOidEntry("${baseoid}.0", 'integer', $idx);
}
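
Before wiring it into snmpd, you can test the script by hand with the -d switch: instead of speaking the pass_persist protocol it simply prints the OID entries it would publish, which is handy for checking the database credentials.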

Then in /etc/snmp/snmpd.conf:

pass_persist .1.3.6.1.4.1.2021.70 /usr/bin/perl /usr/local/sbin/snmpPostfixRSyslogQuery

Reload snmpd, and try it out:

# snmpwalk -v1 -cpublic -On localhost .1.3.6.1.4.1.2021.70
.1.3.6.1.4.1.2021.70.0 = INTEGER: 53
.1.3.6.1.4.1.2021.70.1.1 = INTEGER: 1
.1.3.6.1.4.1.2021.70.1.2 = INTEGER: 2
.1.3.6.1.4.1.2021.70.1.3 = INTEGER: 3
.1.3.6.1.4.1.2021.70.1.4 = INTEGER: 4
...
.1.3.6.1.4.1.2021.70.2.1 = STRING: "postfix:lmtp:bounced"
.1.3.6.1.4.1.2021.70.2.2 = STRING: "postfix:lmtp:deferred"
.1.3.6.1.4.1.2021.70.2.3 = STRING: "postfix:lmtp:other"
.1.3.6.1.4.1.2021.70.2.4 = STRING: "postfix:lmtp:sent"
...
.1.3.6.1.4.1.2021.70.3.1 = Counter32: 19
.1.3.6.1.4.1.2021.70.3.2 = Counter32: 17213
.1.3.6.1.4.1.2021.70.3.3 = Counter32: 13
.1.3.6.1.4.1.2021.70.3.4 = Counter32: 5570497
...

And we’re done for this time as well. The next part should be graphing the data in Cacti. Unfortunately the summer holidays are running out, so there may be some time to wait… Anyhow, the task should be simple enough: make use of Data Queries, since all our data is indexed. There are plenty of examples around. See you next time.
