Graphing the average http request execution time with SNMP

March 2, 2012 under Main

After implementing our new NAS based shared storage for our virtualization pool, we wanted to see what the real world impact on web server performance would be. We decided that the amount of time Apache takes to process http requests would give us a good indication as to whether performance had improved or degraded after moving the VM from local RAID storage to the NAS.

Apache already knows how long it takes to serve each request, but by default this information is not logged anywhere. Since the metric is already available, adding some logging and statistics collection is a nice way to graph this metric over time with minimal impact on server performance. Unlike measuring a page load time externally, this measurement does not involve any additional requests to the server and takes its data from the whole server, not just one test page.

The first step is to set up a new CustomLog on your server to log the %D variable for each request, see mod_log_config in the Apache docs. %D is “The time taken to serve the request, in microseconds.”

LogFormat "%D" resptime
CustomLog logs/resptime_log resptime

If you have multiple virtual hosts with per-vhost logging (ie. you have a CustomLog inside each virtual host), you will have to add this directive to each virtual host that you want to collect samples from. You can have multiple virtual hosts log to the same file, just beware that you may run into file descriptor limits depending on the number of virtual hosts and your OS. When we tried to add this directive to >500 virtual servers we hit this limit and got an “Unable to open log files” error and Apache refused to start. We solved the issue by sampling a selection of virtual hosts (~200) instead of all of them.

The next step is to get these values into a usable average that you can graph. Since we already use Cacti for graphing things like CPU, load average and network throughput, we decided to try to graph this value the same way. Cacti collects its statistics via SNMP using the net-snmp package on CentOS. This package can be extended to collect additional information, see Extending Net-SNMP on docs.redhat.com

I wrote a small Perl script to average the values in the response time log and clobber the file when done, as the data is of little interest once it has been averaged and logged. I saved the script as /usr/local/bin/avgresptime.pl


#!/usr/bin/perl

# Path to the log
my $log = “/etc/httpd/logs/resptime_log”;

# Open the log file
open(LOGFILE, $log) or die “Could not open log file: $!\n”;

# How many lines in the log? (= number of requests)
my $lines = `wc -l $log | cut -d ” ” -f 1`;
chomp($lines);

# Define $total and $avg here
my $total = 0;
my $avg = 0;

# Add up the total number of microseconds:
while() {
$total += $_;
}

# Calculate the average
$avg = int (($total / $lines) + .5);

# Print (debug)
### human-readable output:
#print sprintf(“Log file has %d lines with an average response time of %d \n”, $lines, $avg);
### lines + avg
#print sprintf(“%d %d”, $lines, $avg);
### only avg resp time:
#print sprintf(“%d”, $avg);

# Close log file
close(LOGFILE);

# Clobber the file
open(LOGFILE, “>”, $log) or die “Could not clobber log file: $! \n”;
close(LOGFILE);

# convert from microseconds to hundredths of seconds and round to nearest integer
my $out = int (($avg / 10000) + .5);

# the reason for this is:
# “Integer exit codes are limited to a range of 0-255. For values that are likely to exceed 256, either use the standard output of the script (which will be typed as a string) or a different method of extending the agent.”
# source: http://docs.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/6/html/Deployment_Guide/sect-System_Monitoring_Tools-Net-SNMP-Extending.html
#
# using an exit code is by far the simplest way to graph this data. 255 hundredths of a second = 2.55 seconds.
# hopefully no web server takes more than 2.55 seconds on average to process requests.
# after 12 hours graphing, our average on a busy web server is 19, max 136.
if( $out < 255) { exit $out; } else { exit 255; }

I then added a line to my /etc/snmp/snmpd.conf :

extend httpd_avg_exec_time /usr/local/bin/avgresptime.pl

I then spent a frustrating hour or so trying to figure out how to address the OID to get the value out of SNMP. Turned it to be:

# snmpwalk -v 1 -c -On .1.3.6.1.4.1.8072.1.3.2.3.1.4

Which returned:

.1.3.6.1.4.1.8072.1.3.2.3.1.4.19.104.116.116.112.100.95.97.118.103.95.101.120.101.99.95.116.105.109.101 = INTEGER: 14

That big long string is the OID you want Cacti to graph. To do that, go to your device (assuming you already have it set up in Cacti) and under “Associated Graph Templates” add “SNMP – Generic OID Template”. I gave mine a nice descriptive title and for the vertical axis, put “hundredths of a second” as the label. You then have to go to the Data Source and enter the OID. You’ll also want to set “Maximum Value” to 255 (being the maximum value an exit code can return) and set the data source type to Gauge.

Let it churn for a couple of polls (10-15 minutes) and you should start to see data!

Here’s what our graph looks like after running overnight:

average httpd request processing time

Now we wait. Once we’ve got a good week or two of sample data, we’ll migrate this virtual machine to our NAS storage and see what impact that has on request processing time.

Lasso Sites Manager module for Webmin

March 4, 2010 under Main
Create Site

Screenshot of Create Site window

This week I wrote a module for Webmin called Lasso Sites Manager.

Webmin is a really cool browser based server administration tool. It comes with dozens of modules for managing everything from user accounts to server software, package updates and disk quotas.

Lasso Server 9 is the latest release of the Lasso middleware application from LassoSoft. Lasso 9 lacks a key feature which the last release, 8.5, did have: an easy, graphical way to define multiple Lasso Sites.

Lasso 9 does support this feature, but with a very different implementation from Lasso 8. Instead of spawning its own site processes, Lasso 9 now integrates into Apache via FastCGI. FastCGI can now be configured to connect to a specific Lasso installation. Thus, the sites are much better integrated with Apache and the OS, which yields much more flexibility.

Inspired by discussions on the Lasso Talk mailing list, I decided to have a go at writing a Webmin module to manage multiple Lasso Sites in Lasso Server 9. I’m releasing the module here as free software with no license restrictions.

Feedback, improvements, and a Mac port would all be very welcome. If there is enough interest from other developers in maintaining and improving the module, I’ll probably put it up somewhere like Sourceforge or Google Code.

Lasso Sites

Lasso 9 supports multiple installations of Lasso Server on a single host. Each installation (“Site”) requires a unique installation directory and FastCGI port number. The lassoserver process needs to be told where its home is by being passed the LASSO9_HOME environment variable at startup.

Multiple Apache virtual hosts can connect to a single Lasso Site, or you can define 1 Lasso Site per virtual host.

Each Lasso Site can run under a different Unix user, providing much increased security, especially when implemented together with Apache’s suEXEC module. Lasso Sites Manager webmin module supports suEXEC and will even apply the correct Apache configuration and file permissions for you.

Module features in brief:

  • Create, update and delete Lasso Sites
  • Install from local copy of Lasso Server, a pre-configured template, or from SVN
  • Auto-selection of available FastCGI port
  • Configure Lasso Site to auto-start Lasso Server via Apache FastCGIServer
  • Or manually control stop/start of Lasso Server via the module
  • Automatic integration with Apache virtual hosts
  • Ability to run Lasso sites under any Unix user (also without suEXEC, by setting setuid bit)
  • Enable/disable Lasso Server error log (LASSO9_PRINT_FAILURES directed to a file)
  • Integration with suEXEC (off by default, enable via module config)
  • Open Lasso Admin for any Site directly from the module

Some currently known issues are:

  • Only tested on CentOS Linux 5.4 – anyone interested in contributing Mac OS X support please contact me
  • Auto-start not working in combination with suEXEC
  • May have issues when turning suEXEC support on or off after you have already defined sites. Saving each site config (just click on the name and hit Save) should fix this
  • Install from SVN at your own risk, may require some manual tweaking

An issue to be aware of that isn’t a bug: when using suEXEC on CentOS, the Lasso Site _must_ be located within /var/www. This path is hard-coded into the suEXEC module as the doc-root. If you want to change this, you need to recompile FastCGI with a different AP_DOC_ROOT.

On to To Do list:

  • Ability to update one/all sites from updated local installation or from SVN
  • Port to Mac OS X client and server. Anyone who can contribute to this port, or provide access to an OS X Server installation, please contact Chris Wik.

Download and Installation

Pre-requisites: a working installation of CentOS 5 with Webmin, Apache 2.2, Lasso Server 9, and subversion (‘yum -y install subversion’ on CentOS). Built/tested with Webmin 1.500.

Please install and test that Lasso Server 9 works BEFORE installing this module. Eg: turn off selinux, install Lasso Server RPM, install FastCGI (using the provided script at /usr/local/lib/lasso/Apache2Conf/install_mod_fastcgi)

You can download the module here. To install, go to Webmin -> Webmin Configuration -> Webmin Modules. Select “From uploaded file”, choose the file, and click Install Module. The module will now appear under the Servers category in Webmin.

This module comes with no guarantees, use it at your own risk.

comments: Closed tags: , , , ,
Subscribe