(Slightly OT) Analysing Web Logs

March 31, 2011 under The 100% Uptime Challenge

A side effect of running multiple Web servers, whether behind a load balancer or using the multiple A records technique I’ve been discussing, is that your access logs end up spread across several machines, which complicates analysis. I really like awstats, a free log file analyser. It parses your Web server logs to extract useful visitor statistics and displays them in a pretty report format.

Generating individual reports for each Web server wouldn’t be very helpful, as each would only represent a portion of your overall traffic. I have therefore picked one server on which to generate my reports, and instructed the logrotate service to automatically copy archived web server logs over to it:

# Edit the logrotate config file for Apache:
[cwik@nl ~]$ sudo vi /etc/logrotate.d/httpd

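# I set mine to rotate daily (add 'daily'), and added an scp command to the postrotate
# section so the archived file gets copied over to the IE server. This is the NL server's
# config; on the US server the destination directory is us/ rather than nl/.
# scp -B runs in batch mode and never prompts, so it needs key-based SSH access to ie: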
/var/log/httpd/*log {
    daily
    missingok
    notifempty
    sharedscripts
    postrotate
        /sbin/service httpd reload > /dev/null 2>/dev/null || true
        scp -Bq /var/log/httpd/access_log.1 ie:/var/log/httpd/nl/
    endscript
}
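The scp destination is the only part of this stanza that differs between the sending servers. As a minimal sketch, assuming each directory on the stats server is named after the first label of the sender's host name, the destination could be derived rather than hard-coded. The function name `log_dest_for_host` is hypothetical and not part of the original setup:

```shell
#!/bin/sh
# Hypothetical helper (not from the original setup): build the scp
# destination for a sending server from its host name.
# Assumption: the directory on the stats server is named after the first
# label of the host name, e.g. nl.cwik.ch -> /var/log/httpd/nl/.
log_dest_for_host() {
    # ${1%%.*} strips everything from the first dot onwards
    printf 'ie:/var/log/httpd/%s/\n' "${1%%.*}"
}

log_dest_for_host "nl.cwik.ch"   # prints ie:/var/log/httpd/nl/
log_dest_for_host "us.cwik.ch"   # prints ie:/var/log/httpd/us/
```

With something like this in place, the same scp line could be reused on every sending server by passing it the output of `hostname`, instead of editing the path by hand on each one.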

I did this on the us.cwik.ch and nl.cwik.ch servers. On the ie.cwik.ch server, I created the directories that receive the copied logs and, to keep things simple later on, set up logrotate to copy that server’s own archived logs into the /var/log/httpd/ie/ directory:

[cwik@ie ~]$ sudo mkdir /var/log/httpd/ie /var/log/httpd/nl /var/log/httpd/us
[cwik@ie ~]$ sudo vi /etc/logrotate.d/httpd
/var/log/httpd/*log {
    daily
    missingok
    notifempty
    sharedscripts
    postrotate
        /sbin/service httpd reload > /dev/null 2>/dev/null || true
        cp /var/log/httpd/access_log.1 /var/log/httpd/ie/
    endscript
}
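Once a rotation has run on all three servers, each of the three directories should hold an access_log.1 file. The following is a rough sketch of a sanity check for that, using the directory and file names from the configs above. The function name `check_rotated_logs` is made up for this example:

```shell
#!/bin/sh
# Sketch: report which per-server directories lack a non-empty rotated log.
# The directory names (ie, nl, us) and the file name access_log.1 come from
# the setup above; the function itself is an illustration only.
check_rotated_logs() {
    base=$1
    missing=0
    for host in ie nl us; do
        # -s is true only if the file exists and is not empty
        if [ ! -s "$base/$host/access_log.1" ]; then
            echo "missing or empty: $base/$host/access_log.1"
            missing=1
        fi
    done
    # Exit status 0 if all three logs are present, 1 otherwise
    return "$missing"
}

# Example call on the stats server:
# check_rotated_logs /var/log/httpd
```

A non-zero exit status here would suggest that one of the copies failed, for instance because the scp step could not authenticate.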

The next step is to set up awstats on the Irish server (which I have nominated as the stats server):

[cwik@ie ~]$ sudo yum -y install awstats
[cwik@ie ~]$ cd /etc/awstats
[cwik@ie awstats]$ sudo cp awstats.model.conf awstats.ie.cwik.ch.conf
[cwik@ie awstats]$ sudo vi awstats.ie.cwik.ch.conf

For the most part the sample config file has sensible defaults. There are lots of things you might want to customise, but the most important for this setup are the LogFile and SiteDomain directives:

LogFile="/usr/share/awstats/tools/logresolvemerge.pl /var/log/httpd/*/access_log.1 |"
SiteDomain="www.cwik.ch"

The LogFile directive uses the logresolvemerge.pl script, included with awstats, to merge the three access logs and sort them by time. Note the pipe after the path to the logs: it tells awstats that LogFile is a command whose stdout it should read, rather than a file. I’ve also enabled DNS lookups (DNSLookup=1) and turned on the year view option (AllowFullYearView=3).
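To double-check the edits, a small sketch like the one below prints just the four directives mentioned here as they are currently set. It assumes each directive starts at the beginning of its line with no spaces before the equals sign, as in the sample config. The function name `show_awstats_directives` is hypothetical:

```shell
#!/bin/sh
# Sketch: print the directives this setup changes, as set in a config file.
# Assumption: directives are written as Name=value at the start of a line;
# commented-out lines (starting with #) are therefore skipped.
show_awstats_directives() {
    grep -E '^(LogFile|SiteDomain|DNSLookup|AllowFullYearView)=' "$1"
}

# Example call on the stats server:
# show_awstats_directives /etc/awstats/awstats.ie.cwik.ch.conf
```

If the LogFile line in the output does not end with a pipe inside the quotes, awstats will treat the value as a file name instead of a command.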

The last step is to set up some access controls so only authenticated users can access the stats reports (unless you don’t mind having the info publicly visible):

[cwik@ie ~]$ sudo vi /etc/httpd/conf.d/awstats.conf
# I removed the default access control and put in password based authentication:
<Directory "/usr/share/awstats/wwwroot">
    Options None
    AllowOverride None
    AuthType Basic
    AuthName awstats
    AuthUserFile /var/www/.htpasswd
    Require valid-user
</Directory>
:wq

# Create a htpasswd file, and reload the apache config:
[cwik@ie ~]$ sudo htpasswd -c /var/www/.htpasswd my_username
(enter a password, twice)
[cwik@ie ~]$ sudo /sbin/service httpd reload

And that’s all there is to it! My combined stats are now automatically updated every day and are accessible (with authentication) at http://ie.cwik.ch/awstats/awstats.pl?config=ie.cwik.ch
