Monday, 6 August 2007

Solaris 10 + Sun Cluster 3.2 + ZFS + QMail (HA - Failover)

Trying to improve our mail system, I decided to move from our previous architecture to cutting-edge technology: ZFS + Sun Cluster.

Our previous mail server ran Debian Linux with Qmail-LDAP, OpenLDAP, SpamAssassin filters, antivirus and so on, all on internal storage. Performance was quite good, but OpenLDAP and the standard Linux file systems were missing some features we wanted.

Drawing on my experience with Solaris (right now I feel more comfortable in Solaris than in Linux), I proposed a significant change:

Two nodes running Solaris 10 with Sun Cluster 3.2 (quorum device on external storage; maybe some day I'll try the quorum server), a ZFS pool for the mail spool (with compression enabled, as the traffic is not too heavy), SunOne Directory Server 5 (I'd prefer 6, but I'm not the LDAP specialist, so...) and QMail with LDAP enabled (with the standard options: SpamAssassin, DKIM, qmail-scanner-st with several antivirus engines, RBL, etc.). Courier IMAP is also part of the architecture.

Everything runs as a failover (HA) service, because ZFS does not support scalable services yet.

Why QMail?
I love this MTA (as long as it's patched with LDAP support). It's a little tricky to meet all its requirements on Solaris 10 (I compiled it with Sun Studio, but gcc is also fine), but in my opinion it's the best when it comes to performance.

Solaris 10 (11/06): it offers many good things (SMF, ZFS, DTrace and so on...). I'm a Solaris instructor so...

Sun Cluster 3.2: Support for ZFS... that's enough for me. We have external storage, so there's no need to use the quorum server feature.
I developed a specific QMail data service agent that uses daemontools, but I guess the Generic Data Service agent would be enough.
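
My own agent isn't shown here, so this is only a rough sketch of what a probe script for the Generic Data Service could look like when qmail runs under daemontools. It assumes the services live in /service/qmail-send and /service/qmail-smtpd (the conventional layout, not necessarily ours) and relies on the GDS convention that a probe exits 0 when healthy and 100 on complete failure. It uses $(...), so on Solaris 10 run it with ksh or /usr/xpg4/bin/sh rather than the old /bin/sh.

```shell
#!/bin/sh
# Sketch of a probe for daemontools-supervised qmail; not the real agent.
# The service directories below are assumed (conventional qmail layout).
SERVICES="/service/qmail-send /service/qmail-smtpd"

# Succeed if an svstat line reports "up" for at least $2 seconds (default 2),
# so a service that keeps crashing and restarting is not counted as healthy.
svstat_line_ok() {
    line=$1
    min=${2:-2}
    case "$line" in
        *": up (pid "*") "*" seconds"*)
            secs=$(echo "$line" | sed 's/.*) \([0-9][0-9]*\) seconds.*/\1/')
            [ "$secs" -ge "$min" ]
            ;;
        *)
            return 1
            ;;
    esac
}

# Return 0 if every service is healthy, 100 (complete failure) otherwise.
probe() {
    for dir in $SERVICES; do
        svstat_line_ok "$(svstat "$dir" 2>/dev/null)" || return 100
    done
    return 0
}

# Used as a probe script, the last line would be:  probe; exit $?
```

Checking the uptime as well as the "up" state is a design choice: supervise restarts a dead service immediately, so "up" alone can hide a crash loop.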

ZFS: With an external storage array (an old one, but good enough) I built a RAID-Z pool with a dataset (compression enabled) for the mail spool. The compression algorithm is fast, but the compression ratio is not as good as gzip or bzip. Our spool was 20GB on the old system and is 17GB after the upgrade (about 15% smaller). Yes, in a production environment ;). I wrote a script to manage snapshots (I don't think they're really useful in this case, but they don't hurt). Our mail traffic is not really heavy, so compression and snapshots are not a performance problem.
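
To make the ZFS part more concrete, here is a minimal sketch of the setup and of a snapshot rotation. It is not my actual script: the pool name mailpool, the dataset mailpool/spool, the disk names and the "keep the newest 7" policy are all made up for illustration. By default it only prints the commands; set RUN=1 on a real host to execute them. On Solaris 10 use nawk instead of awk, and ksh instead of the old /bin/sh.

```shell
#!/bin/sh
# Sketch only: pool, dataset and disk names are hypothetical.
POOL=mailpool
DATASET=$POOL/spool

# Print each command by default; set RUN=1 to really execute it.
run() {
    if [ "${RUN:-0}" = "1" ]; then "$@"; else echo "$*"; fi
}

# One-time setup: RAID-Z pool plus a compressed dataset for the spool.
setup_plan() {
    run zpool create "$POOL" raidz c2t0d0 c2t1d0 c2t2d0
    run zfs create "$DATASET"
    run zfs set compression=on "$DATASET"
    run zfs get compressratio "$DATASET"
}

# Read snapshot names on stdin and print all but the newest $1 of them.
# Names sort chronologically because the suffix is a YYYYMMDD-HHMM timestamp.
snapshots_to_prune() {
    sort | awk -v keep="$1" '{ name[NR] = $0 }
        END { for (i = 1; i <= NR - keep; i++) print name[i] }'
}

# Take a new snapshot, then destroy all but the newest 7 automatic ones.
rotate_plan() {
    run zfs snapshot "$DATASET@auto-$(date +%Y%m%d-%H%M)"
    zfs list -H -t snapshot -o name 2>/dev/null | grep "^$DATASET@auto-" |
        snapshots_to_prune 7 |
        while read snap; do run zfs destroy "$snap"; done
}

setup_plan
```

The pruning logic is kept in its own function so it can be checked with a plain list of names before it is ever pointed at real snapshots.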

Sun Directory Server 5: The multimaster feature makes this product a better fit for this cluster environment than OpenLDAP. I'm not an LDAP specialist, but I love version 6... it works quite well with WebConsole (I know... it eats RAM as if it were free...).

The "bad" part of all this is the large number of packages you need to install before QMail works, but after that you have a fast system delivering mail, with the flexibility of an MTA that rocks. The integration with SpamAssassin and the very good qmail-scanner-st tool is awesome.

Here is the result (sorry... still using the old CLI of Sun Cluster):

root@tupolev:/> scstat -g

-- Resource Groups and Resources --

            Group Name     Resources
            ----------     ---------
 Resources: haqmail-rg     haqmail-ds qmail qmail-ds


-- Resource Groups --

            Group Name     Node Name     State       Suspended
            ----------     ---------     -----       ---------
     Group: haqmail-rg     tupolev       Online      No
     Group: haqmail-rg     metcha        Offline     No


-- Resources --

            Resource Name  Node Name     State       Status Message
            -------------  ---------     -----       --------------
  Resource: haqmail-ds     tupolev       Online      Online
  Resource: haqmail-ds     metcha        Offline     Offline

  Resource: qmail          tupolev       Online      Online - LogicalHostname online.
  Resource: qmail          metcha        Offline     Offline

  Resource: qmail-ds       tupolev       Online      Online - Service is online.
  Resource: qmail-ds       metcha        Offline     Offline
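
For anyone who prefers the new object-oriented CLI of Sun Cluster 3.2, a group like this could be built and checked roughly as follows. The group and resource names are the ones in the output above, but everything else is an assumption: that haqmail-ds is a SUNW.HAStoragePlus resource, that the pool is called mailpool and that the logical hostname is mailhost. The qmail-ds data service resource is left out because it depends on the agent. The script only prints the commands unless RUN=1 is set.

```shell
#!/bin/sh
# Sketch only. Resource names come from the scstat output; the resource
# type of haqmail-ds, the pool "mailpool" and the host "mailhost" are assumed.

# Print each command by default; set RUN=1 on a cluster node to execute it.
run() {
    if [ "${RUN:-0}" = "1" ]; then "$@"; else echo "$*"; fi
}

cluster_plan() {
    run clresourcetype register SUNW.HAStoragePlus
    run clresourcegroup create haqmail-rg
    # Storage resource: imports the ZFS pool on whichever node hosts the group.
    run clresource create -g haqmail-rg -t SUNW.HAStoragePlus -p Zpools=mailpool haqmail-ds
    # Logical hostname that moves with the group between nodes.
    run clreslogicalhostname create -g haqmail-rg -h mailhost qmail
    # Bring the group under management and online, then check its state.
    run clresourcegroup online -M haqmail-rg
    run clresourcegroup status haqmail-rg
    run clresource status -g haqmail-rg
}

cluster_plan
```

The last two commands are the closest new-style counterparts of `scstat -g`.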


When it's clean enough, I'll post the Data Service Agent for QMail.