
A Picture is Worth a Thousand Lines of Code...

  • August 25, 2005
  • By Brad Lhotsky

Why RRD?

There are some awesome Perl graphics libraries out there, both general and specific. The reason I've chosen to introduce you to the wonderfully reversed world of RRDtool (RRD standing for Round Robin Database) is that it works well for producing line graphs. I like line graphs because they help a network or system administrator get a good look at trends in their networks and systems. Once the trends are familiar, anomalies stand out, and it becomes much easier to spot a problem quickly.

RRDtool spun off from the MRTG (Multi Router Traffic Grapher) project. As MRTG matured, the programmers modularized the backend of the data storage and display system into its own library. Thinking that the rest of the world might be able to put that library to good use, they (I mean mostly Tobias Oetiker) spun off RRDtool and spent time documenting and supporting it and its often fledgling developers. MRTG has continued to be a driving force behind the development of RRDtool, but since its release and adoption by many open source developers, there have been many more people fanning the flames.

RRDtool is a powerful data storage and visualization tool, and I highly recommend visiting the RRDtool home page to explore the vast possibilities.

RRD Setup

Designing the database

As with all programming projects, most of the real "hard" work should be done during the design process. To get clear, concise, and representative pictures of what you're monitoring, it's critical to analyze the data and the process you'll use to access and interpret the data.

  • What data points are critical?
  • How much detail is critical?
  • How often do I collect data?
  • Can my devices handle the data polling procedure?
  • How often am I going to view the graphs?
  • How much historical data do I need?

Answering all of these questions will help you begin to lay out the database. Usually, network graphs look at five-minute intervals, and graph those intervals over the course of 1 day, 2 days, 1 week, 1 month, and 1 year. Using those time frames is a good starting point, but they may be under- or overkill in your particular application.

Creating RRDs using RRDs.pm

I'll cover how to create, update, and graph data using the RRDs.pm module distributed with RRDtool. There are Perl modules that might be simpler, but because RRDs.pm uses syntax similar to the RRDtool command line tools, and forces the programmer to understand the concepts behind the tool, I feel it'll be worth the price of admission three times over.

Sample RRD Options for the RRDs.pm create() function:

my @opts = (
   '--step', 300,
   'DS:in:GAUGE:600:0:U',
   'DS:out:GAUGE:600:0:U',
   'RRA:AVERAGE:0.5:1:288',
   'RRA:AVERAGE:0.5:2:288',
   'RRA:AVERAGE:0.5:7:288',
   'RRA:AVERAGE:0.5:30:288',
   'RRA:AVERAGE:0.5:365:288'
);
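
As a rough sketch, an option list like the one above might be handed to RRDs::create() as follows. RRDs::create() and RRDs::error() are the calls provided by RRDs.pm; the wrapper name create_rrd and the guarded loading of the module are assumptions made for this illustration only.

```perl
use strict;
use warnings;

# RRDs.pm ships with RRDtool. Load it at run time so that a missing
# module produces a readable message instead of a compile failure.
my $have_rrds = eval { require RRDs; 1 };

# Hypothetical wrapper: create an RRD file from a list of options.
# Returns a false value on success, or an error string on failure.
sub create_rrd {
    my ($file, @opts) = @_;
    return 'RRDs.pm is not installed' unless $have_rrds;
    RRDs::create($file, @opts);
    return RRDs::error();    # false when the call succeeded
}
```

A call such as create_rrd('traffic.rrd', @opts), where traffic.rrd is a made-up file name, would then either create the file or return the reason it could not.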

RRDs are composed of three basic elements: the Step, Data Sources, and Archives. The step determines the interval at which data can be entered in the database. The Data Source describes the data. The Archive stores the data.
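
To sanity-check a design, it can help to work out how much history one archive line holds. The sketch below assumes the standard archive layout RRA:CF:xff:steps:rows, where "steps" is the number of step intervals consolidated into one row. The helper name rra_span_seconds is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: seconds of history one RRA line can hold.
# $step is the RRD step in seconds; $rra is a string such as
# 'RRA:AVERAGE:0.5:7:288' (CF : xff : steps per row : rows).
sub rra_span_seconds {
    my ($step, $rra) = @_;
    my (undef, undef, undef, $steps_per_row, $rows) = split /:/, $rra;
    return $step * $steps_per_row * $rows;
}

for my $rra ('RRA:AVERAGE:0.5:1:288', 'RRA:AVERAGE:0.5:7:288') {
    printf "%-24s covers %d day(s)\n", $rra, rra_span_seconds(300, $rra) / 86400;
}
```

With a 300-second step, 288 rows of one step each cover one day, and 288 rows of seven steps each cover one week.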

Step

my @opts = (
   '--step', 300,
   'DS:in:GAUGE:600:0:U',
   'DS:out:GAUGE:600:0:U',
   'RRA:AVERAGE:0.5:1:288',
   'RRA:AVERAGE:0.5:2:288',
   'RRA:AVERAGE:0.5:7:288',
   'RRA:AVERAGE:0.5:30:288',
   'RRA:AVERAGE:0.5:365:288'
);

The first option specified to the RRD is the step. This number is expressed in seconds and tells the RRD the interval at which it expects data. The default step is 300 seconds (five minutes). Each step interval ends up holding a single value, so updating an RRD several times within one step does not give you finer detail: the readings are folded into the same interval, and an update whose timestamp is not later than the previous update's is rejected with an error. If you want to run the poller process every minute, use a step of 60 seconds. The step applies to every data source you define in the same RRD.
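
One way to picture the step is as a series of fixed time buckets. The sketch below shows which bucket a timestamp belongs to, assuming the buckets are aligned to multiples of the step counted from the Unix epoch. The name step_bucket and the sample timestamps are illustrative and not part of RRDs.pm.

```perl
use strict;
use warnings;

# Hypothetical helper: start of the step interval that contains $time.
sub step_bucket {
    my ($time, $step) = @_;
    return $time - ($time % $step);
}

my $step = 300;
for my $t (1_125_000_010, 1_125_000_250, 1_125_000_310) {
    printf "%d falls in the bucket starting at %d\n", $t, step_bucket($t, $step);
}
```

The first two timestamps land in the same five-minute bucket, and the third starts the next one.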

Data Sources

my @opts = (
   '--step', 300,
   'DS:in:GAUGE:600:0:U',
   'DS:out:GAUGE:600:0:U',
   'RRA:AVERAGE:0.5:1:288',
   'RRA:AVERAGE:0.5:2:288',
   'RRA:AVERAGE:0.5:7:288',
   'RRA:AVERAGE:0.5:30:288',
   'RRA:AVERAGE:0.5:365:288'
);

Data sources supply the Primary Data Points (PDPs). PDPs are the actual data that your poller program will send the RRD to store. The options that follow tell the RRD how to interpret your data. The designation "DS" tells the RRDs::create() function that this option is a data source. Five colon-delimited fields follow it: Name, Type, Heartbeat, Minimum Value, and Maximum Value.
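
To make the field order concrete, here is a small sketch that assembles a DS option from named fields and splits one back apart. Both helper names, ds_spec and parse_ds, are hypothetical. RRDs.pm itself only sees the finished string.

```perl
use strict;
use warnings;

# Hypothetical helper: assemble a DS option from its five fields.
sub ds_spec {
    my (%f) = @_;
    return join ':', 'DS', @f{qw(name type heartbeat min max)};
}

# Hypothetical helper: split a DS option back into named fields.
sub parse_ds {
    my ($spec) = @_;
    my @parts = split /:/, $spec;
    shift @parts;    # drop the leading 'DS' marker
    my %f;
    @f{qw(name type heartbeat min max)} = @parts;
    return \%f;
}

print ds_spec(name => 'in', type => 'GAUGE', heartbeat => 600,
              min => 0, max => 'U'), "\n";
```

The print statement reproduces the first data source line from the option list: DS:in:GAUGE:600:0:U.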

Name

The first argument to a data source element is the name of the data source. This name will be used in all other RRD operations and calculations to distinguish the data source. A network interface typically has two or three data sources: traffic in, traffic out, and errors. This example sets up data sources for in and out only.

Type

RRDtool has a robust set of data source types. For the sake of this article, I'll cover the two simplest and most used: GAUGE and COUNTER. A GAUGE is used for something like temperature, humidity, or throughput. It's like a reading from your speedometer or tachometer: it can fluctuate up and down, and the value is stored as reported. A COUNTER is for things like "packets received" or "bytes received" in SNMP or kernel tables for individual interfaces, or the number of visits to a website. These numbers are constantly incrementing, so RRDtool stores the rate of change between readings rather than the raw number. For the sake of simplicity, you'll use a GAUGE in this example. A neat example of a GAUGE graph might be "forecasted" vs. "actual" temperatures for your area.
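
The difference between the two types is easiest to see in code. A GAUGE reading is used as is, while two readings of a counter have to be turned into a rate. The sketch below is a simplified illustration of that idea and not RRDtool's own code: counter_rate is a made-up name, and it only allows for a single rollover of a 32-bit counter.

```perl
use strict;
use warnings;

# Hypothetical helper: per-second rate between two readings of an
# ever-increasing counter, allowing for one wrap of a 32-bit counter.
sub counter_rate {
    my ($prev, $curr, $seconds) = @_;
    my $delta = $curr - $prev;
    $delta += 2**32 if $delta < 0;    # the counter rolled over
    return $delta / $seconds;
}

# 150,000 more bytes counted over a 300-second step
printf "%.1f bytes/s\n", counter_rate(1_000_000, 1_150_000, 300);
```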

Heartbeat

The heartbeat is the maximum number of seconds allowed between updates before a value is considered "unknown." Unknown values are stored internally as "Not a Number" (NaN) and show up as gaps in the graph. In this example, if no update arrives for 600 seconds (10 minutes), you assume something is wrong and the RRD flags the value as NaN.
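
The heartbeat rule boils down to a single comparison. The sketch below uses a hypothetical helper, is_stale, and assumes that a gap of exactly the heartbeat length is still acceptable.

```perl
use strict;
use warnings;

# Hypothetical helper: true when the gap since the last update is
# longer than the heartbeat, so the data would count as unknown.
sub is_stale {
    my ($last_update, $now, $heartbeat) = @_;
    return ($now - $last_update) > $heartbeat ? 1 : 0;
}

my $last = 1_125_000_000;
for my $gap (300, 600, 900) {
    printf "gap of %d seconds: %s\n", $gap,
        is_stale($last, $last + $gap, 600) ? 'unknown' : 'ok';
}
```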

Minimum and Maximum Value

Minimum and Maximum values tell the RRD which readings are plausible. Common sense says that, for network traffic, you should never see fewer than 0 bytes transmitted in 300 seconds, so you can set the minimum at "0". If you know the line speed and the unit, you can calculate the maximum value. If you're using the same definition for multiple interfaces with different line speeds, or line speeds that might change, you can use a "U" to represent "Unknown". Used as a maximum, it represents positive infinity; as a minimum, the "U" represents negative infinity. Use the constraints that best suit your data. Values outside the min and max are regarded as "unknown" and effectively disregarded.
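
The range check can be sketched in a few lines. The helper accept_value is hypothetical: it returns the reading when it lies inside the limits and undef (standing in for "unknown") when it does not, treating "U" as "no limit on this side".

```perl
use strict;
use warnings;

# Hypothetical helper: return $value if it is within [$min, $max],
# or undef if it is out of range. A limit of 'U' means "no limit".
sub accept_value {
    my ($value, $min, $max) = @_;
    return undef if $min ne 'U' && $value < $min;
    return undef if $max ne 'U' && $value > $max;
    return $value;
}

for my $reading (-5, 1200) {
    my $kept = accept_value($reading, 0, 'U');
    printf "%d is %s\n", $reading, defined $kept ? 'stored' : 'unknown';
}
```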







