(Slightly OT) Analysing Web Logs

March 31, 2011 under The 100% Uptime Challenge

A side effect of running multiple Web servers, whether behind a load balancer or using the multiple A records technique I’ve been discussing, is the problem of analysing your log files. I really like awstats, a free log file analyser. It parses your Web server logs to extrapolate some useful visitor statistics and displays the info in a pretty report format.

Generating individual reports for each Web server wouldn’t be very helpful, as each would only represent a portion of your overall traffic. I have therefore picked one server on which to generate my reports, and instructed the logrotate service to automatically copy archived web server logs over to it:

# Edit the logrotate config file for Apache:
[cwik@nl ~]$ sudo vi /etc/logrotate.d/httpd

# I set mine to rotate daily (add 'daily'), and added the scp command to the postrotate
# section so the archived file gets copied over to the IE server:
/var/log/httpd/*log {
        /sbin/service httpd reload > /dev/null 2>/dev/null || true
        scp -Bq /var/log/httpd/access_log.1 ie:/var/log/httpd/nl/

I did this on the us.cwik.ch and nl.cwik.ch servers. On the ie.cwik.ch server, I created some new directories where the logs will be copied to, and to make things simple later on, I also set up logrotate to copy archived logs to the /var/log/httpd/ie/ directory:

[cwik@ie ~]$ sudo mkdir /var/log/httpd/ie /var/log/httpd/nl /var/log/httpd/us
[cwik@ie ~]$ sudo vi /etc/logrotate.d/httpd
/var/log/httpd/*log {
        /sbin/service httpd reload > /dev/null 2>/dev/null || true
        cp /var/log/httpd/access_log.1 /var/log/httpd/ie/

The next step is to set up awstats on the Irish server (which I have nominated as the stats server):

[cwik@ie ~]$ sudo yum -y install awstats
[cwik@ie ~]$ cd /etc/awstats
[cwik@ie awstats]$ sudo cp awstats.model.conf awstats.ie.cwik.ch.conf
[cwik@ie awstats]$ sudo vi awstats.ie.cwik.ch.conf

For the most part the sample config file has sensible defaults. There are lots of things you might want to customise, but the most important for this setup are the LogFile and SiteDomain directives:

LogFile="/usr/share/awstats/tools/logresolvemerge.pl /var/log/httpd/*/access_log.1 |"

The LogFile directive uses the logresolvemerge.pl script, included with awstats, to merge the 3 access logs and sort them by time. Note the pipe after the path to the logs, this instructs awstats that the LogFile is actually a command from which it should read stdout. I’ve also enabled DNS lookups (DNSLookup=1) and turned on the year view option (AllowFullYearView=3).

The last step is to set up some access controls so only authenticated users can access the stats reports (unless you don’t mind having the info publicly visible):

[cwik@ie ~]$ sudo vi /etc/httpd/conf.d/awstats.conf
# I removed the default access control and put in password based authentication:
<Directory "/usr/share/awstats/wwwroot">
    Options None
    AllowOverride None
    AuthType Basic
    AuthName awstats
    AuthUserFile /var/www/.htpasswd
    Require valid-user

# Create a htpasswd file, and reload the apache config:
[cwik@ie ~]$ htpasswd -c /var/www/.htpasswd my_username
(enter a password, twice)
[cwik@ie ~]$ sudo /sbin/service httpd reload

And that’s all there is to it! My combined stats are now automatically updated every day and are accessible (with authentication) at http://ie.cwik.ch/awstats/awstats.pl?config=ie.cwik.ch

Multi-Master Replication Verification And Conflict Resolution With Maatkit

March 29, 2011 under The 100% Uptime Challenge

I mentioned briefly in Setting Up Multi-Master Circular Replication with MySQL how a particular situation could potentially arise in which data becomes inconsistent across the MySQL masters, but replication does not fail. The two scenarios where this could happen are near-simultaneous updates to the same record on different servers, or the same record being updated on two servers while replication is down (eg. due to a link failure, where we get a “split brain” scenario)

One way to avoid this kind of problem is to design your application such that records are never updated. Instead, create a brand new record for every update. With the help of a timestamp column, you can then always select the newest version of the record by date. Removing the Update_priv from your MySQL user’s access privileges will ensure your app never UPDATE’s a record, even unintentionally.

In many cases the DBA will not have any control over the queries sent to the database server. This is my case with the www.cwik.ch blog, which runs on WordPress and does occasionally use UPDATE queries. While data inconsistency should be extremely rare under normal operation, it would definitely be a good idea to be able to verify data integrity after an incident of replication failure (for example because the link between two of the servers went down temporarily)

Thankfully this problem has already been tackled by folks more knowledgeable on the topic than myself and a great toolset released under the name maatkit. On CentOS 5 you can install maatkit from yum, simply ‘yum install maatkit’. I’ve created a simple demonstration to illustrate how it works:

# Create a table to store checksums in:
mysql ie> CREATE DATABASE maatkit;
mysql ie> USE maatkit
mysql ie> CREATE TABLE checksum (
    db         char(64)     NOT NULL,
    tbl        char(64)     NOT NULL,
    chunk      int          NOT NULL,
    boundaries char(100)    NOT NULL,
    this_crc   char(40)     NOT NULL,
    this_cnt   int          NOT NULL,
    master_crc char(40)         NULL,
    master_cnt int              NULL,
    ts         timestamp    NOT NULL,
    PRIMARY KEY (db, tbl, chunk)

# Create a test table and populate it with some test values:
mysql ie> USE cwik
mysql ie> CREATE TABLE test (
    val VARCHAR(255) NOT NULL
mysql ie> INSERT INTO test (val) VALUES ('I am Ireland');
mysql nl> INSERT INTO test (val) VALUES ('I am the Netherlands');
mysql us> INSERT INTO test (val) VALUES ('I am the USA');
mysql ie> SELECT * FROM test;
| id | val                  |
|  1 | I am Ireland         |
|  2 | I am the Netherlands |
|  3 | I am the USA         |

# Stop the slaves temporarily. For this test just IE and NL.
mysql ie> SLAVE STOP;
mysql nl> SLAVE STOP;

# Update the same record on both servers
mysql nl> UPDATE test SET val='Ich bin Niederlande' WHERE id=2;
mysql ie> UPDATE test SET val='Ik ben Nederland' WHERE id=2;

# Re-start the slaves
mysql nl> SLAVE START;
mysql ie> SLAVE START;

# Check the test table contents
mysql ie> SELECT * FROM test;
| id | val                 |
|  1 | I am Ireland        |
|  2 | Ich bin Niederlande |
|  3 | I am the USA        |

mysql nl> SELECT * FROM test;
| id | val              |
|  1 | I am Ireland     |
|  2 | Ik ben Nederland |
|  3 | I am the USA     |

Note that IE and NL have replicated each other’s updates, the result is the value of record #2 is switched! Now I can use the maatkit tools to flag this inconsistency, and then fix it:

# Use mk-table-checksum to checksum all the tables in the cwik database.
# You'll see a list of tables and checksums
[cwik@ie ~]$ mk-table-checksum -u root -pmy_supersecret_pw --algorithm BIT_XOR \
  --databases cwik --emptyrepltbl --replicate maatkit.checksum localhost

# Use mk-table-sync with --print to see the conflicts that have arisen
[cwik@ie ~]$ mk-table-sync --print --synctomaster --replicate maatkit.checksum \
REPLACE INTO `cwik`.`test`(`id`, `val`) VALUES (2, 'Ich bin Niederlande');
# Cool, note that cwik.test.id=2 is out of sync

# Update nl.cwik.ch with the corrected record:
[cwik@ie ~]$ mk-table-sync --execute --synctomaster --replicate maatkit.checksum \

Re-checking the test table on both IE and NL shows we now have consistent data!!

mysql ie> SELECT * FROM test;
| id | val                 |
|  1 | I am Ireland        |
|  2 | Ich bin Niederlande |
|  3 | I am the USA        |
mysql nl> SELECT * FROM test;
| id | val                 |
|  1 | I am Ireland        |
|  2 | Ich bin Niederlande |
|  3 | I am the USA        |

I’ve just touched the surface of a very complex topic here. More reading on maatkit and all it can do is definitely warranted.

One more thing: running such a complex database system is far from risk free, so to further mitigate the risk of data loss or corruption, I set up regular MySQL backups on all 3 of my servers. It’s pretty simple to do with mysqldump and cron:

mysql ie> GRANT SELECT, LOCK TABLES ON cwik.* TO 'backup'@'localhost' \
  IDENTIFIED BY 'my_supersecret_pw';

[cwik@ie ~]$ sudo mkdir /backups
[cwik@ie ~]$ chown cwik:cwik /backups
[cwik@ie ~]$ crontab -e
0 3 * * * mysqldump -u backup -pmy_supersecret_pw --single-transaction \
  cwik | gzip > /backups/`date +\%d`.sql.gz

I installed this cron job on all 3 servers, creating a daily snapshot of my ‘cwik’ database which I can use to restore a consistent copy should I mess anything up terribly. Combined with the binary log (which is enabled already as we’re using replication) I feel confident that I could never lose any important data. Recovering from an inconsistent setup could well be complex, but with the proper backups and logs, possible.

comments: 1 » tags: ,

BIND DNS Server And Zone File Configuration

March 26, 2011 under The 100% Uptime Challenge

BIND is the standard for DNS services on Unix operating systems. It is somewhat complex and includes many useful features, but for this configuration I’m going to keep it really simple.

I already installed BIND using yum earlier on. On CentOS it’s installed from the ‘bind’ package, but the service is (somewhat illogically) called ‘named’. If you’re also going to use the server for local DNS resolution (by putting in my /etc/resolv.conf) you’ll need a copy of the root server cache file, so BIND can look up external names. You’ll also need this if any records in your DNS zone refer to an external DNS name, for example a CNAME record pointing to another domain. Otherwise BIND will not be able to recurse and the lookup will fail.

So to get started:

# enable the service
chkconfig named on

# save the DNS root server info to /etc/db.cache:
wget -O /etc/db.cache http://www.internic.net/zones/named.cache

# create a config file for named:
# see my config file for an example
# at a minimum, edit the zone entry to match your domain
vi /etc/named.conf

# create the DNS zone file
# see my zone file for an example
# edit to match your domain and server IPs
vi /var/named/data/cwik.ch.hosts

# start the server
service named start

My sample config files are a very simple, minimal configuration. You may want to elaborate further. For further reading, check out the BIND documentation and/or the excellent DNS and BIND from O’Reilly.

In my setup, I have configured the 3 DNS servers in master/master/master configuration, because I wanted full control of the zone file on each server to be able to test different configurations. I also want to experiment later on with putting some intelligent health checks into the DNS system, whereby each server will check the availability of the other servers, and only return the IPs of those servers it can reach. The hope is that using this technique I can prevent the IP of an unreachable server being returned to the client, thus further increasing uptime.

Back to the BIND configuration, a little explanation on my config files:

DNS recursion: I’ve only listed the loopback interface under allow-recursion{}┬áin the options{} block of named.conf, so only my local system can issue recursive queries. You may want to add the IPs of each of your servers so they can use each other as backup resolvers. In my case I have used the DNS resolvers provided by my upstream providers as secondaries in my /etc/resolv.conf file.

Zone configuration: I have instructed each server to be a master for my zone file. I will therefore have to propagate updates to my zone file to all my DNS servers manually. Another (more usual) way to configure BIND is to nominate one server as your primary and instruct the other servers to slave your zone. In this configuration, zone updates are propagated automatically, but you lose the ability to modify the zone file on each resolver independently, which is why I have used 3 masters.

Zone file: My zone file is for the most part pretty standard: a default TTL, an SOA record, NS and A records for each of my nameservers and an MX record. I’ve also set up an SPF record to help ensure my mail gets through the more aggressive spam filters.

The interesting part is the configuration for www.cwik.ch, this blog. I’ve created 3 A records at the root of the domain, ie. http://cwik.ch/ – one for each of my servers. I’ve then created www.cwik.ch as a CNAME (Canonical NAME – ie. an alias) of cwik.ch, so any DNS query for www.cwik.ch will return the 3 IPs from the root of the domain. This is the foundation on which this whole 100% uptime project is based.

A test query shows it is working perfectly:

$ host -t A www.cwik.ch
www.cwik.ch has address
www.cwik.ch has address
www.cwik.ch has address

On the left hand side of this page, under MySQL replication status, you should find a small indicator showing which server you are currently connected to. It’ll be one of the above IPs, depending on which one your browser picked.

The glue: Once I had all 3 nameservers set up correctly, I went to SWITCH, the Swiss domain registry, and changed the listed nameservers for cwik.ch to my new setup. The process to do this varies by the registry you use, but all registries offer the facility. There is an extra step involved which is not necessary if you are using someone else’s nameservers: setting up the glue record. This is the mapping of your DNS server names to their IPs, which must be done statically at your registrar. In my case I could do it via a simple web interface as depicted here:

DNS glue

Each of the 3 servers need to be defined and the IP hard-coded. This IP is stored in the .ch root so other resolvers know how to recurse to the cwik.ch domain. If I ever change the IP of one of my servers, I’ll also need to update it at the registry.

Propagating zone updates: I’ve already explained why I chose not to configure a master/slave setup, choosing instead to have 3 masters. However this means I need to update the zone file manually on all 3 servers in order to make any changes. That’s not very convenient, so I wrote a small shell script to do it for me. Feel free to copy/use it if you find it’s useful.

comments: 0 » tags: , , , , ,

Configuring Apache

March 23, 2011 under The 100% Uptime Challenge

In my scenario, the Apache server doesn’t need to be aware that it is only one server of many. Therefore I’ve customised my configuration file and copied the exact same file to all 3 servers. How much you customise is down to taste and requirements, but at a minimum I would suggest disabling any modules that you don’t need, configuring a ServerAdmin address and setting up logging to suit your needs.

I’ve posted my somewhat stripped down httpd.conf should you wish to use it as a starting point (it’s for Apache 2.2 on CentOS 5).

In addition to the customisations to httpd.conf, take a look in /etc/httpd/conf.d/ to see what other files are being loaded. I removed the welcome.conf file, but didn’t make any further changes. Most Apache modules will install a config file here, for example mod_ssl will create an ssl.conf file here.

Finally, enable the apache service:

chkconfig httpd on
service httpd start