http://www.developer.com/


A Picture is Worth a Thousand Lines of Code...


August 25, 2005

Why RRD?

There are some awesome perl graphics libraries out there, both general and specific. The reason I've chosen to introduce you to the wonderfully reversed world of RRDtool (RRD standing for Round Robin Database) is that it works well for producing line graphs. I like line graphs because they help a network and/or system administrator get a good look at trends in their network and/or systems. By identifying trends, it becomes much easier to identify anomalies and quickly identify the existence of a problem.

RRDtool spun off of the MRTG (Multi-Router Traffic Grapher) project. As MRTG matured, the programmers modularized the backend of the data storage and display system into its own library. Thinking that the rest of the world might be able to put that library to good use, they (I mean mostly Tobias) spun off RRDtool and spent time documenting and supporting it and its often fledgling developers. MRTG has continued to be a driving force behind the development of RRDtool, but since its release and adoption by many open source developers, there have been many more people fanning the flames.

RRDtool is a powerful data storage and visualization tool, and I highly recommend that the reader visits the RRDtool home page to explore the vast possibilities.

RRD Setup

Designing the database

As with all programming projects, most of the real "hard" work should be done during the design process. To get clear, concise, and representative pictures of what you're monitoring, it's critical to analyze the data and the process you'll use to access and interpret the data.

  • What data points are critical?
  • How much detail is critical?
  • How often do I collect data?
  • Can my devices handle the data polling procedure?
  • How often am I going to view the graphs?
  • How much historical data do I need?

Answering all of these questions will help you begin to lay out the database. Usually, network graphs look at five-minute intervals, and graph those intervals over the course of 1 day, 2 days, 1 week, 1 month, and 1 year. Using those time frames is a good starting point, but may be under- or overkill in your particular application.

Creating RRDs using RRDs.pm

I cover how to create, update, and graph data using the RRDs.pm module distributed with the RRDtool distribution. There are perl modules that might be simpler, but because RRDs.pm uses similar syntax to the RRDtool command line tools, and forces the programmer to understand the concepts behind the tool, I feel it'll be worth the price of admission three times over.

Sample RRD Options for the RRDs.pm create() function:

my @opts = (
   '--step', 300,
   'DS:in:GAUGE:600:0:U',
   'DS:out:GAUGE:600:0:U',
   'RRA:AVERAGE:0.5:1:288',
   'RRA:AVERAGE:0.5:2:288',
   'RRA:AVERAGE:0.5:7:288',
   'RRA:AVERAGE:0.5:30:288',
   'RRA:AVERAGE:0.5:356:288'
);

RRDs are composed of three basic elements: Step, Data Sources, and Archives. The step determines the interval at which data can be entered in the database. The Data Source describes the data. The Archive stores the data.

Step

my @opts = (
   '--step', 300,                  # <-- the step
   'DS:in:GAUGE:600:0:U',
   'DS:out:GAUGE:600:0:U',
   'RRA:AVERAGE:0.5:1:288',
   'RRA:AVERAGE:0.5:2:288',
   'RRA:AVERAGE:0.5:7:288',
   'RRA:AVERAGE:0.5:30:288',
   'RRA:AVERAGE:0.5:356:288'
);

The first option specified to the RRD is the step. This number is given in seconds and sets the interval at which the RRD stores a data point. The default step is 300 seconds (five minutes). RRDtool rejects any update whose timestamp is not later than the previous update, and readings that arrive more often than the step are combined into a single data point for that interval. If you want to run the poller process every minute, use a step of 60 seconds. This value will be consistent for all data sources you identify in the same RRD.

Data Sources

my @opts = (
   '--step', 300,
   'DS:in:GAUGE:600:0:U',          # <-- data source "in"
   'DS:out:GAUGE:600:0:U',         # <-- data source "out"
   'RRA:AVERAGE:0.5:1:288',
   'RRA:AVERAGE:0.5:2:288',
   'RRA:AVERAGE:0.5:7:288',
   'RRA:AVERAGE:0.5:30:288',
   'RRA:AVERAGE:0.5:356:288'
);

Data sources are also known as Primary Data Points (PDPs). PDPs are the actual data that your poller program will send the RRD to store. The options that follow tell the RRD how to interpret your data. The designation of "DS" tells the RRDs::create() function that this option is a data source. Five colon-delimited options follow it: Name, Type, Heartbeat, Minimum Value, and Maximum Value.

Name

The first argument to a data source element is the name of the data source. This name will be used in all other RRD operations and calculations to distinguish the data source. Typical network interfaces have two to three data sources: traffic in, traffic out, and errors. This particular example just sets up the data sources for in and out.

Type

RRDtool has a robust set of data types. For the sake of this article, I'll cover the two simplest and most used: GAUGE and COUNTER. A GAUGE is used for something like temperature, humidity, or throughput. It's like a reading from your speedometer or tachometer: it can fluctuate up and down, and the value itself is what matters. A COUNTER is for things like "packets received" or "bytes received" in SNMP or kernel tables for individual interfaces, or the number of visits to a website. These numbers are constantly incrementing. For the sake of simplicity, you'll use a GAUGE in this example. A neat example of a GAUGE graph might be "forecasted" vs. "actual" temperatures for your area.
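To make the difference concrete, here's a minimal sketch of what a COUNTER implies: what gets stored is the rate of change between two readings, not the reading itself. The helper name counter_rate() and the single 32-bit wrap assumption are mine, for illustration only; this is not RRDtool's internal code.

```perl
use strict;
use warnings;

# Hypothetical helper: turn two raw counter readings into a per-second rate.
# Assumes a 32-bit counter that wraps at most once between readings.
sub counter_rate {
    my ($prev_val, $prev_time, $cur_val, $cur_time) = @_;
    my $elapsed = $cur_time - $prev_time;
    return if $elapsed <= 0;            # no usable interval: undef
    my $delta = $cur_val - $prev_val;
    $delta += 2**32 if $delta < 0;      # the counter wrapped around
    return $delta / $elapsed;
}

# 1,500,000 bytes counted over 300 seconds is 5,000 bytes per second.
printf "%.0f bytes/sec\n", counter_rate(1_000_000, 0, 2_500_000, 300);
```

With a GAUGE, none of this arithmetic applies: the reading you send is the value that gets stored.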

Heartbeat

The heartbeat is simply the number of seconds between updates before a value is determined to be "unknown." Unknown values in the graph are represented by gaps in the data. Unknown data will be represented internally as "Not a Number"—NaN. In this example, you'll assume that if you don't get information for 600 seconds (10 minutes), there's something wrong and the RRD should flag this value as NaN.
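The heartbeat rule can be sketched in a few lines. The helper name value_or_unknown() is hypothetical, and the sketch only models the idea described above (a gap longer than the heartbeat makes the value unknown), not RRDtool's actual implementation.

```perl
use strict;
use warnings;

# Hypothetical helper: decide whether a reading still counts as known,
# given the time of the previous update and the data source heartbeat.
sub value_or_unknown {
    my ($value, $last_update, $now, $heartbeat) = @_;
    return 'NaN' if $now - $last_update > $heartbeat;   # gap too long
    return $value;
}

print value_or_unknown(1234, 1000, 1300, 600), "\n";   # 300s gap: 1234
print value_or_unknown(1234, 1000, 1900, 600), "\n";   # 900s gap: NaN
```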

Minimum and Maximum Value

Minimum and Maximum values tell the RRD what range of readings is plausible. Common sense says that, for network traffic, you should never record less than 0 bytes transmitted in 300 seconds, so you can set the minimum to "0". If you know the line speed and the unit, you can calculate the maximum value. If you're using this for multiple interfaces that have different line speeds, or line speeds that might change, you can use a "U" to represent "Unknown". Used as a maximum, it represents positive infinity; as a minimum, the "U" represents negative infinity. Use the constraints that best suit your data. Values outside the min and max are regarded as "unknown" and effectively disregarded.
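As a rough illustration of that range check, here's a small sketch. The helper name check_range() is made up for this example; it only mirrors the rule described above, where "U" means no limit on that side and an out-of-range reading becomes unknown.

```perl
use strict;
use warnings;

# Hypothetical helper: apply a data source's min/max to one reading.
# A limit of 'U' means "no limit"; out-of-range readings come back as 'U'.
sub check_range {
    my ($value, $min, $max) = @_;
    return 'U' if $min ne 'U' && $value < $min;
    return 'U' if $max ne 'U' && $value > $max;
    return $value;
}

print check_range(42, 0, 'U'), "\n";   # inside 0..infinity: 42
print check_range(-5, 0, 'U'), "\n";   # below the minimum:  U
```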

Archives

Now that you've described the data, you need to tell RRDtool how to store it. This is where you want to break out a calculator. If you've chosen to do 1 day, 2 days, 1 week, 1 month, and 1 year graphs, you'll have to store the data in archives that make displaying data in those formats sensible. You don't want blocky graphs; you want enough data points to give you good detail, but you don't necessarily want to store too much data.

my @opts = (
   '--step', 300,
   'DS:in:GAUGE:600:0:U',
   'DS:out:GAUGE:600:0:U',
   'RRA:AVERAGE:0.5:1:288',        # <-- the archives
   'RRA:AVERAGE:0.5:2:288',
   'RRA:AVERAGE:0.5:7:288',
   'RRA:AVERAGE:0.5:30:288',
   'RRA:AVERAGE:0.5:356:288'
);

Archive definitions begin with an "RRA" and contain four critical pieces of information, all colon delimited: Type, "Xfiles Factor," Steps, and Rows.

Types

Archive types are different than Data Source types; there are quite a few simple and complex types. For the purposes of this article, you'll just take a look at the simple types: AVERAGE, MIN, MAX, and LAST. If you've worked with SQL before, you may notice a striking resemblance of these types to SQL Aggregate functions. That's because you're aggregating data from your Data Sources into RRAs or "Round Robin Archives."

"Xfiles Factor"

This is a number between 0 and 1 that tells RRDtool what percent of the aggregated data can be made up of "Unknown" while still remaining valid. I've used a factor of 50%, to say at least half the consolidated data must be valid. You can adjust this number higher or lower to eliminate gaps in aggregated data sets.
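Here's a minimal sketch of how an AVERAGE archive and the Xfiles Factor interact. The function consolidate_average() is a hypothetical name and a simplification; it assumes unknown data points are passed as undef and that the result is unknown once the unknown fraction exceeds the factor.

```perl
use strict;
use warnings;

# Hypothetical helper: average a list of data points (undef = unknown).
# If the unknown fraction exceeds $xff, the consolidated value is unknown.
sub consolidate_average {
    my ($xff, @pdps) = @_;
    my @known   = grep { defined($_) } @pdps;
    my $unknown = @pdps - @known;
    return if !@known || $unknown / @pdps > $xff;   # undef = unknown
    my $sum = 0;
    $sum += $_ for @known;
    return $sum / @known;
}

# One unknown out of four (25%) is within a factor of 0.5.
my $ok = consolidate_average(0.5, 100, undef, 300, 200);
print defined $ok ? $ok : 'unknown', "\n";

# Three unknown out of four (75%) is not.
my $bad = consolidate_average(0.5, 100, undef, undef, undef);
print defined $bad ? $bad : 'unknown', "\n";
```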

Steps

This number tells RRDtool how many steps (in this example, 300 second intervals) to consolidate for each Archive. As a simplified example, I'm using the number of days as the number of steps. You'll need to adjust this for making graphs of different time frames.

Rows

To get the number of rows, I divided the number of seconds in a day (86400) by the step (300), and got a result of 288. The number of rows tells RRDtool how many rows of aggregated data to store. Using the number of days as the step, I can keep 288 rows of data and cover the graphs I want to do.

There are other ways to adjust the steps and rows in the RRA to limit the number of RRAs that you have to create, but that can take up more space than you might need. If you take time to design the database, and meet the majority of your needs, you can limit the size of the RRD itself, thus making the access times a little bit quicker.
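The arithmetic behind steps and rows is easy to check with a few lines of code: the time an archive covers is the RRD step, times the archive's steps, times its rows. The helper rra_coverage() below is a hypothetical name used only for this sketch.

```perl
use strict;
use warnings;

# Hypothetical helper: seconds covered by one RRA definition, given the
# RRD step. Fields are RRA:type:xff:steps:rows.
sub rra_coverage {
    my ($step, $rra) = @_;
    my ($steps, $rows) = (split /:/, $rra)[3, 4];
    return $step * $steps * $rows;
}

for my $rra ('RRA:AVERAGE:0.5:1:288', 'RRA:AVERAGE:0.5:7:288') {
    printf "%-24s covers %g day(s)\n", $rra, rra_coverage(300, $rra) / 86_400;
}
```

Running the same calculation against your own step and archive definitions is a quick way to confirm that each graph you plan to draw has an archive long enough to back it.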

Code for Creation

The @opts array you've been working with is a list of arguments you'd pass to the RRDs::create() function to build an RRD on the local disk. Here's a sample snippet that I use to create my RRDs dynamically.

use strict;
use warnings;
use RRDs;

my $RRD_DIR = '/var/rrd/myapp';
sub create_rrd {
   my $hostname = shift;
   $hostname =~ s/[^\w.-]+//g;     # strip unsafe characters if the hostname
                                   # comes from an untrusted source.
   # Use a .rrd extension so draw_graphs() can find the file later.
   my $fn = join '/', $RRD_DIR, "$hostname.rrd";
   return $fn if -f $fn;
   my @opts = (
      '--step', 300,
      'DS:in:GAUGE:600:0:U',
      'DS:out:GAUGE:600:0:U',
      'RRA:AVERAGE:0.5:1:288',
      'RRA:AVERAGE:0.5:2:288',
      'RRA:AVERAGE:0.5:7:288',
      'RRA:AVERAGE:0.5:30:288',
      'RRA:AVERAGE:0.5:356:288'
   );
   RRDs::create $fn, @opts;
   #
   # Check for error
   my $err = RRDs::error;
   warn "Error creating $fn: $err\n" if $err;
   #
   # return the filename
   return $fn;
}

Adding Data to the RRDs

Now that the RRDs are set up, you need to fill them with your data. My RRD work usually involves networking, so the data is keyed by IP address or something of that nature. To facilitate speedy coding, I usually write a data updating function that looks like this:

sub update_rrd {
   my $host = shift;
   my $RRD = create_rrd($host);
   my @vals = @_;
   my $update = join ':', time,@vals;
   RRDs::update $RRD, $update;
   my $err = RRDs::error;
   warn "update_rrd: problem updating $RRD: $err"
      if $err;
}

To add data, you call the update_rrd() function, which automatically creates the RRD for you if one doesn't exist, thanks to the interaction of the create_rrd() and update_rrd() functions. You pass in the following arguments: the hostname, the in traffic, and the out traffic. If you had your data stored in a hash, it might look like this:

foreach my $host (keys %traffic) {
   update_rrd($host,$traffic{$host}{in},$traffic{$host}{out});
}
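If the poller sometimes fails to collect one of the readings, the update string can carry a "U" (unknown) in that position instead of a number. Here's a small sketch of building such a string; format_update() is a hypothetical helper, not part of the code above, and it only produces the "timestamp:value:value" string that would be handed to RRDs::update.

```perl
use strict;
use warnings;

# Hypothetical helper: build a "timestamp:value:value" update string,
# substituting 'U' (unknown) for any reading that is undef.
sub format_update {
    my ($when, @vals) = @_;
    return join ':', $when, map { defined($_) ? $_ : 'U' } @vals;
}

# The outbound reading is missing, so it is recorded as unknown.
print format_update(1124928000, 1500, undef), "\n";   # 1124928000:1500:U
```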

Drawing Pretty Graphs

With your data securely stored in the RRDs, you can draw pretty pictures. I use two functions to draw the graphs. The first actually calls the RRDs::graph() function with the colors and layout. The second calls the first function to draw the graphs for each host and time period.

Setting up the graphs

After spending time setting up the databases, it's always rewarding to spend some time laying out the graph's visual elements. A detailed explanation of the options to pass to the graph function is covered in the excellent documentation accompanying the RRDtool distribution. See the rrdgraph man page for details.

my $IMGDIR = '/var/rrd/myapp/img';
sub graph {
   my ($host,$type,$start,$rrd) = @_;
   my @opts = (
      '--color', 'BACK#CCCCCC',      # Background Color
      '--color', 'SHADEA#FFFFFF',    # Left and Top Border Color
      '--color', 'CANVAS#000000',    # Canvas (Grid Background)
      '--color', 'GRID#333333',      # Grid Line Color
      '--color', 'MGRID#CCCCCC',     # Major Grid Line Color
      '--color', 'FONT#000000',      # Font Color
      '--color', 'ARROW#FF0000',     # Arrow Color for X/Y Axis
      '--color', 'FRAME#000000',     # Canvas Frame Color
      #
      # Set the labels
      '--title', "traffic for $host [$type]",    # Top
      '--vertical-label', 'bytes',               # Y-Axis Label
      #
      # Tell the graphing function how far back to go
      '--start', $start,
      '--step', 150,
      #
      # Extract data from the RRD
      "DEF:in=$rrd:in:AVERAGE",
      "DEF:out=$rrd:out:AVERAGE",
      "CDEF:rin=in,-1,*",
      "AREA:out#6666ff:outbound",
      "AREA:rin#99ccff:inbound",
      "HRULE:0#0000FF"
   );
   my $image = "$IMGDIR/$host-$type.gif";
   RRDs::graph $image, @opts;
   my $err = RRDs::error;
   warn "graphing $host ($type) failed: $err\n" if $err;
}

An interesting problem that MRTG solves by using a combination of "AREA" and "LINE" plots occurs when two data points utilizing the "AREA" plotting end up swapping places. In other words, when the foreground "AREA" ends up with data points that trounce the background "AREA," it becomes impossible to read that background "AREA." A long time ago, I saw another excellent solution to the problem in a university-born NetFlow graphing utility. By inflecting the "inbound" traffic under the X-Axis, you can use two "AREA" plots without worrying about overlap, with some very interesting visual effects to boot!

How do you accomplish that? Well, obviously, you multiply the inbound traffic by -1. But what about all the data that's been collecting in your RRD with POSITIVE values for inbound traffic? Relax; RRDtool can use "CDEF"s to manipulate the data using "Reverse Polish Notation"—RPN—so you don't have to modify anything!

The "DEF" lines in the RRDgraph options allow you to pull data out of the RRD and use an aggregation function on those data points. This is used when you attempt to draw a graph where 1 pixel needs to represent more than 1 finite data point in the RRD. To modify the data, you have to extract it using a "DEF" line.

"DEF:in=$rrd:in:AVERAGE"

What this is doing is creating a variable called "in" that equals the average value of the "in" DS (data source) in the RRD "$rrd" (this variable is interpolated by perl to the filename). You do the same thing for the outbound traffic measurement. To inflect the in data point over the X-Axis, you have to multiply its value by -1. This is easily accomplished by using the "CDEF" and some RPN.

RPN

RPN works like a stack. You move from left to right, you get two values, then an operator, and that becomes a single value. Here's an example:

1,2,+,4,* = 12

Step By Step

Read Value   Operation                                      Stack After Operation
1            Put 1 onto the stack                           1
2            Put 2 onto the stack                           2, 1
+            Operator. Add the values on the stack          3
4            Put 4 onto the stack                           4, 3
*            Operator. Multiply the values on the stack     12
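If you'd like to experiment with RPN outside of RRDtool, the steps above can be sketched as a tiny evaluator. This is a simplified illustration that handles only the four arithmetic operators on comma-separated tokens; the function name rpn() is mine, and RRDtool's own CDEF language supports many more operators.

```perl
use strict;
use warnings;

# Minimal RPN evaluator for comma-separated expressions.
# Only +, -, * and / are handled in this sketch.
sub rpn {
    my ($expr) = @_;
    my %ops = (
        '+' => sub { $_[0] + $_[1] },
        '-' => sub { $_[0] - $_[1] },
        '*' => sub { $_[0] * $_[1] },
        '/' => sub { $_[0] / $_[1] },
    );
    my @stack;
    for my $token (split /,/, $expr) {
        if (my $op = $ops{$token}) {
            die "not enough values for '$token'\n" if @stack < 2;
            my $right = pop @stack;
            my $left  = pop @stack;
            push @stack, $op->($left, $right);
        }
        else {
            push @stack, $token;    # a plain number
        }
    }
    die "expression left " . scalar(@stack) . " values on the stack\n"
        if @stack != 1;
    return $stack[0];
}

print rpn('1,2,+,4,*'), "\n";   # 12
```

The same evaluator shows what a multiply-by-minus-one expression does to a single value: rpn('5,-1,*') returns -5.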

It's actually rather simple after you train your brain to slow down. So, you use a "CDEF" and some RPN to inflect the inbound traffic on the X-Axis and in so doing, create a new variable "rin". You'll use the "rin" variable later in your "AREA" definition as the source of its values.

"CDEF:rin=in,-1,*"

You use your variable "in" that you defined as coming from the "in" data source for a particular RRD. Then, you multiply its value at any given point by -1 by pushing a -1 onto the stack, immediately followed by a '*'.

The only thing left to decide is the type of graph element you want to use. For two simple data points using the inflection method for one of the values, it's fairly straightforward and visually appealing to use "AREA" definitions. RRDtool provides a host of other graph elements and excellent documentation on them. You may have to play with the aggregate functions for the "DEF"s and the order of the graph element declarations to get easy-to-read, non-overlapping visual data representations. Tuning the graphs is half the fun!

Once you're done, you've got the function set up to draw all your graphs!

Drawing the Graphs

The following function figures out how many seconds back each one of your time intervals is, and sends that and a few other key pieces of information to the graph() function.

my $VARDIR = '/var/rrd/myapp';
sub draw_graphs {
   #
   # draws all the pretty graphs
   opendir(my $ls, $VARDIR) or
      die "couldn't open $VARDIR: $!\n";
   my %STARTS = (
      day => time   - (3600*24),
      week => time  - (3600*24*7),
      month => time - (3600*24*30)
   );
   # test with defined() so a file named "0" doesn't end the loop early
   while (defined(my $file = readdir $ls)) {
      next unless $file =~ /(.+)\.rrd$/;
      my $host = $1;
      my $rrd = "$VARDIR/$file";
      foreach my $type (keys %STARTS) {
         graph($host,$type,$STARTS{$type},$rrd);
      }
   }
   closedir $ls;
}

No real magic there. This function gets a listing of the RRD directory and calls the graph() function for every file in that directory whose name ends in '.rrd'.

Pretty Pictures!

Day

[Image: day graph]

Week

[Image: week graph]

Month

[Image: month graph]

Where to Go from Here ...

I've attached the following archive that contains the code that I wrote some time ago that served as inspiration to this article. I'm not offering support for the code, but I thought it would assist the reader in getting a clearer picture of how a number of my favorite CPAN modules can work together. The code presented in the article is straight out of the code in the attached archive.

A picture is worth a thousand words. Thanks to perl and RRDtool, a picture could be worth quite a lot more.

About the Author

Brad Lhotsky is a Software Developer whose focus is primarily web-based applications in Perl and PHP. He has over five years' experience developing systems for end users and system and network administrators. Brad has been active on Perl beginner's mailing lists and forums for years, attempting to give something back to the community.
