Cacti Setup for Dell Nodes
The Dell PE1950 and PE2950 nodes have a large number of fans and temperature probes that are not exposed via SNMP. This presents a problem for monitoring their status. The ipmitool utility can dump information on these components, so we need a way to "pass through" SNMP requests to IPMI, or to do the equivalent.
Getting the Info
For our Dell nodes the command 'ipmitool sdr' shows a lot of information:
root@c-1-29 ~# ipmitool sdr
Temp | -39 degrees C | ok
Temp | -31 degrees C | ok
Temp | 40 degrees C | ok
Temp | 40 degrees C | ok
Ambient Temp | 20 degrees C | ok
CMOS Battery | 0x00 | ok
ROMB Battery | 0x00 | ok
VCORE | 0x01 | ok
VCORE | 0x01 | ok
CPU VTT | 0x01 | ok
1.5V PG | 0x01 | ok
1.8V PG | 0x01 | ok
3.3V PG | 0x01 | ok
5V PG | 0x01 | ok
1.5V PXH PG | 0x01 | ok
5V Riser PG | 0x01 | ok
Backplane PG | 0x01 | ok
Linear PG | 0x01 | ok
0.9V PG | 0x01 | ok
0.9V Over Volt | 0x01 | ok
CPU Power Fault | 0x01 | ok
FAN MOD 1A RPM | 7350 RPM | ok
FAN MOD 1B RPM | 7425 RPM | ok
FAN MOD 1C RPM | 4725 RPM | ok
FAN MOD 1D RPM | 4650 RPM | ok
FAN MOD 2A RPM | 7350 RPM | ok
FAN MOD 2B RPM | 7500 RPM | ok
FAN MOD 2C RPM | 4650 RPM | ok
FAN MOD 2D RPM | 4725 RPM | ok
FAN MOD 3A RPM | 8100 RPM | ok
FAN MOD 3B RPM | 7500 RPM | ok
FAN MOD 3C RPM | 4650 RPM | ok
FAN MOD 3D RPM | 4725 RPM | ok
FAN MOD 4A RPM | 7500 RPM | ok
FAN MOD 4B RPM | 7875 RPM | ok
FAN MOD 4C RPM | 4800 RPM | ok
FAN MOD 4D RPM | 4725 RPM | ok
Presence | 0x01 | ok
Presence | 0x01 | ok
Presence | 0x01 | ok
Presence | 0x02 | ok
Presence | 0x01 | ok
Presence | 0x01 | ok
DRAC5 Conn 2 Cbl | 0x01 | ok
PFault Fail Safe | Not Readable | ns
Status | 0x80 | ok
Status | 0x80 | ok
Status | 0x01 | ok
Status | Not Readable | ns
Status | 0x01 | ok
RAC Status | 0x07 | ok
OS Watchdog | 0x00 | ok
SEL | Not Readable | ns
Intrusion | 0x00 | ok
PS Redundancy | Not Readable | ns
Fan Redundancy | 0x01 | ok
CPU Temp Interf | Not Readable | ns
Drive | 0x01 | ok
Cable SAS A | 0x01 | ok
ECC Corr Err | Not Readable | ns
ECC Uncorr Err | Not Readable | ns
I/O Channel Chk | Not Readable | ns
PCI Parity Err | Not Readable | ns
PCI System Err | Not Readable | ns
SBE Log Disabled | Not Readable | ns
Logging Disabled | Not Readable | ns
Unknown | 0xc0 | ok
CPU Protocol Err | Not Readable | ns
CPU Bus PERR | Not Readable | ns
CPU Init Err | Not Readable | ns
CPU Machine Chk | Not Readable | ns
Memory Spared | 0x00 | ok
Memory Mirrored | 0x01 | ok
Memory RAID | 0x01 | ok
Memory Added | Not Readable | ns
Memory Removed | Not Readable | ns
Memory Cfg Err | 0x01 | ok
Mem Redun Gain | 0x01 | ok
PCIE Fatal Err | 0x01 | ok
Chipset Err | 0x01 | ok
Err Reg Pointer | 0x01 | ok
Mem ECC Warning | 0x01 | ok
Mem CRC Err | 0x01 | ok
USB Over-current | 0x01 | ok
POST Err | Not Readable | ns
Hdwr version err | Not Readable | ns
Mem Overtemp | 0x01 | ok
Mem Fatal SB CRC | 0x01 | ok
Mem Fatal NB CRC | 0x01 | ok
From this list we want to track the fan and temperature information.
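Each line of 'ipmitool sdr' output has three '|'-separated fields: sensor name, reading and status. A minimal Perl sketch of splitting one line and keeping only the fan and temperature rows might look like this. The sub name parse_sdr_line and the returned hash layout are illustrative assumptions, not part of ipmitool.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of ipmitool): split one "ipmitool sdr" line
# into name, numeric reading and status. Returns nothing for lines that are
# not fan or temperature sensors, or whose reading has no number.
sub parse_sdr_line {
    my ($line) = @_;
    my ($name, $reading, $status) = map { s/^\s+|\s+$//gr } split /\|/, $line;
    return unless defined $status;
    return unless $name =~ /FAN|Temp/;          # "Fan Redundancy" and "Mem Overtemp" do not match
    return unless $reading =~ /^([-+]?\d+)/;    # "Not Readable" has no number
    return { name => $name, value => $1, status => $status };
}

# Example: keep only the interesting rows from a few sample lines
for my $line ("FAN MOD 1A RPM | 7350 RPM | ok", "VCORE | 0x01 | ok", "Temp | -39 degrees C | ok") {
    my $s = parse_sdr_line($line) or next;
    print "$s->{name} = $s->{value} ($s->{status})\n";
}
```

Matching on "FAN" and "Temp" is case-sensitive on purpose, which is what keeps the redundancy and memory-overtemperature status sensors out.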
Getting IPMI into SNMP
We want the ipmitool information to be accessible via SNMP; however, the tool takes a while to run:
root@c-3-20 /etc/snmp# time ipmitool sdr >/dev/null
real 0m3.869s
user 0m0.000s
sys 0m0.000s
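This can be sped up in two ways. The first is the 'ipmitool sdr dump dell_sdr.txt' command, which dumps the Sensor Data Repository (SDR) for the local node to a file; later runs can load the sensor definitions from that file with the -S option instead of reading them from the BMC. This significantly speeds up processing: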
root@c-3-20 /etc/snmp# time ipmitool sdr dump ./dell_sdr.txt
Dumping Sensor Data Repository to './dell_sdr.txt'
real 0m2.760s
user 0m0.000s
sys 0m0.000s
root@c-3-20 /etc/snmp# time ipmitool -S dell_sdr.txt sdr > /dev/null
real 0m2.382s
user 0m0.000s
sys 0m0.000s
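The second is to write the output to /dev/shm (a RAM-backed shared-memory area), so that anything reading the values uses this cached copy instead of running ipmitool itself: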
root@c-3-20 /etc/snmp# time ipmitool -S dell_sdr.txt sdr > /dev/shm/dell.ipmi
real 0m1.003s
user 0m0.000s
sys 0m0.000s
This is a light enough load to be able to run every minute.
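Since the collector below is a Perl script anyway, the timing comparisons above can also be repeated from Perl, for example before deciding how often cron should run the collector. This is a small sketch using the core Time::HiRes module; the time_command sub is a hypothetical helper, and it measures wall-clock time only (roughly the "real" line of the shell's time).

```perl
use strict;
use warnings;
use Time::HiRes qw(time);   # sub-second resolution for time()

# Hypothetical helper: run a shell command, discard its output (like
# "> /dev/null") and return the elapsed wall-clock time in seconds.
sub time_command {
    my ($cmd) = @_;
    my $start   = time;
    my $output  = qx($cmd);
    my $elapsed = time - $start;
    warn "command failed ($?): $cmd\n" if $? != 0;
    return $elapsed;
}

# Example (on a Dell node, as root); the cache path is the one used by dump_dell.pl:
# printf "%-45s %.3f s\n", $_, time_command($_)
#     for "/usr/bin/ipmitool sdr", "/usr/bin/ipmitool -S /etc/snmp/sdr.dmp sdr";
```

Single runs vary a lot on a busy BMC, so averaging a few repetitions gives a fairer comparison.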
Exposing Dell SDR Info via SNMP
The net-snmp package allows extensions to be added to the SNMP agent on the host. This is done by adding a line like:
extend .1.3.6.1.4.1.2021.8.5 1 /bin/cat /dev/shm/dell.ipmi
to the /etc/snmp/snmpd.conf file. This line makes the output of a command available under a new OID (.1.3.6.1.4.1.2021.8.5).
Since we only want the SDR info for the fans and temperatures of interest, I created a Perl script that runs the ipmitool command, parses its output and prints a single line in a format Cacti understands:
#!/usr/bin/env perl
#
# Uses ipmitool to "dump" Dell sensor info for the PE1950/PE2950
#
# Shawn McKee <smckee@umich.edu>
######################################################
use strict;
use warnings;

my $sdrcache = "/etc/snmp/sdr.dmp";
my @ipmitool = ("/usr/bin/ipmitool", "-S", $sdrcache, "sdr");

# Create the local SDR cache on the first run; later runs read it via -S
if ( ! -e $sdrcache ) {
    system("/usr/bin/ipmitool", "sdr", "dump", $sdrcache) == 0
        or die "ipmitool sdr dump failed: $?\n";
}

# Cacti field names, assigned in the order the sensors appear in the sdr output
my @tempname = ("TempCPU1delta","TempCPU2delta","TempChassis1","TempChassis2","TempAmbient","CPUTempInterf");
my @fanname  = ("FanMod1A","FanMod1B","FanMod1C","FanMod1D","FanMod2A","FanMod2B","FanMod2C","FanMod2D",
                "FanMod3A","FanMod3B","FanMod3C","FanMod3D","FanMod4A","FanMod4B","FanMod4C","FanMod4D");
my ($nfan, $ntemp) = (0, 0);
my %SDR;

# Parse ipmitool output for Dell SDR values
open(my $cs, "-|", @ipmitool) or die "cannot run @ipmitool: $!\n";
while (<$cs>) {
    my ($name, $value, $status) = split(/\|/);
    next unless defined $status;
    s/\s//g for $name, $value, $status;
    if ($name =~ /FAN/) {
        my $field = $fanname[$nfan++];
        # Keep only the RPM number; "Not Readable" sensors are skipped
        $SDR{$field} = $1 if defined $field && $value =~ /(\d+)/;
    } elsif ($name =~ /Temp/) {
        my $field = $tempname[$ntemp++];
        # CPU temps can be negative (they are relative to the critical temperature)
        $SDR{$field} = $1 if defined $field && $value =~ /([-+]?\d+)/;
    }
}
close($cs);

# Print all values on one line as "name:value" pairs for Cacti,
# skipping CPUTempInterf (not readable on these nodes)
foreach my $key (sort keys %SDR) {
    print "$key:$SDR{$key} " unless $key =~ /Inter/;
}
print "\n";
The script automatically creates the /etc/snmp/sdr.dmp cache file the first time it runs.
The output looks like:
root@c-3-20 /etc/snmp# perl dump_dell.pl
FanMod1A:7050 FanMod1B:7050 FanMod1C:4500 FanMod1D:4650 FanMod2A:7275 FanMod2B:7575 FanMod2C:4650 FanMod2D:4650 FanMod3A:7725 FanMod3B:7425 FanMod3C:4800 FanMod3D:4875 FanMod4A:7500 FanMod4B:7725 FanMod4C:4800 FanMod4D:4800 TempAmbient:19 TempCPU1delta:-43 TempCPU2delta:-34 TempChassis1:40 TempChassis2:40
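Cacti's script-based data input methods accept several values on one line as space-separated field:value pairs, which is why the script prints everything on a single line. Going the other way, for instance in a hypothetical poller-side wrapper that fetches this string over SNMP and hands individual values to Cacti, the line could be split back into named values as in this Perl sketch (the sub name is an assumption):

```perl
use strict;
use warnings;

# Hypothetical helper: parse "FanMod1A:7050 TempAmbient:19 ..." into a hash.
# Pairs whose value is not numeric are ignored.
sub parse_field_line {
    my ($line) = @_;
    my %values;
    for my $pair (split ' ', $line) {          # awk-style split, ignores extra spaces
        my ($field, $value) = split /:/, $pair, 2;
        next unless defined $value && $value =~ /^[-+]?\d+(?:\.\d+)?$/;
        $values{$field} = $value;
    }
    return \%values;
}

my $v = parse_field_line("FanMod1A:7050 TempAmbient:19 TempCPU1delta:-43 ");
printf "%s = %s\n", $_, $v->{$_} for sort keys %$v;
```

The trailing space the collector leaves at the end of the line is harmless here, because split ' ' discards leading and trailing whitespace.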
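To keep the values current we create a dell.cron file:
#!/bin/bash
# Write to a temporary file and rename it into place, so snmpd never
# reads a half-written /dev/shm/dell.ipmi
/etc/snmp/dump_dell.pl > /dev/shm/dell.ipmi.tmp && mv /dev/shm/dell.ipmi.tmp /dev/shm/dell.ipmi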
This can be added to the 'root' crontab to run every minute. The /dev/shm/dell.ipmi file will always contain the most recent measurements, and the SNMP extension uses a lightweight 'cat' to expose them.
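Because snmpd only cats whatever is in /dev/shm/dell.ipmi, a cron job that stops running would go unnoticed and the agent would keep serving the last values. A minimal Perl sketch of a staleness check based on the file's modification time follows; the sub name and the 180-second threshold (allowing a couple of missed one-minute runs) are assumptions.

```perl
use strict;
use warnings;

# Hypothetical check: was the cached sensor file written within $max_age seconds?
# A missing file (e.g. /dev/shm is emptied on reboot) counts as stale.
sub cache_is_fresh {
    my ($path, $max_age) = @_;
    my @st = stat $path or return 0;
    return (time - $st[9]) <= $max_age;   # $st[9] is the modification time
}

# Example (on a Dell node):
# warn "dell.ipmi is stale, is the dell.cron job running?\n"
#     unless cache_is_fresh('/dev/shm/dell.ipmi', 180);
```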
Setup on Dell Nodes
The following should be done on each Dell node:
- Copy the dell.cron and dump_dell.pl files to /etc/snmp/ on the node and make both executable.
- Edit the '/etc/snmp/snmpd.conf' file, add the line below, then restart snmpd so it reads the new configuration:
  extend .1.3.6.1.4.1.2021.8.5 1 /bin/cat /dev/shm/dell.ipmi
- Add a 'root' cron entry for /etc/snmp/dell.cron to run every minute.
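Since these steps are repeated on every Dell node, the snmpd.conf edit can be made idempotent so the setup can be re-run safely. A Perl sketch, assuming root access and the file path used above; the sub name ensure_extend_line is hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helper: append $line to $conf unless an identical line exists.
# Returns 1 if the line was added (snmpd must then be restarted), 0 otherwise.
sub ensure_extend_line {
    my ($conf, $line) = @_;
    my $text = '';
    if (open my $in, '<', $conf) {
        local $/;                        # slurp the whole file
        $text = <$in> // '';
        close $in;
    }
    return 0 if $text =~ /^\Q$line\E$/m;     # already configured
    # Start on a fresh line if the file does not end with a newline
    my $sep = ($text eq '' || $text =~ /\n\z/) ? '' : "\n";
    open my $out, '>>', $conf or die "cannot append to $conf: $!\n";
    print {$out} $sep, "$line\n";
    close $out or die "cannot close $conf: $!\n";
    return 1;
}

# Usage on a Dell node (as root):
# ensure_extend_line('/etc/snmp/snmpd.conf',
#     'extend .1.3.6.1.4.1.2021.8.5 1 /bin/cat /dev/shm/dell.ipmi');
```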
Testing
To test that things are working, try a "remote" SNMP command. From another node run the following, replacing <dell-node> with the Dell node's hostname:
  snmpwalk -v 2c -c usatlasgrid <dell-node> .1.3.6.1.4.1.2021.8.5
You should get something like:
[umopt1:~]# snmpwalk -v 2c -c usatlasgrid c-3-20 .1.3.6.1.4.1.2021.8.5
UCD-SNMP-MIB::extTable.5.1.0 = INTEGER: 1
UCD-SNMP-MIB::extTable.5.2.1.2.1.49 = STRING: "/bin/cat"
UCD-SNMP-MIB::extTable.5.2.1.3.1.49 = STRING: "/dev/shm/dell.ipmi"
UCD-SNMP-MIB::extTable.5.2.1.4.1.49 = ""
UCD-SNMP-MIB::extTable.5.2.1.5.1.49 = INTEGER: 5
UCD-SNMP-MIB::extTable.5.2.1.6.1.49 = INTEGER: 1
UCD-SNMP-MIB::extTable.5.2.1.7.1.49 = INTEGER: 1
UCD-SNMP-MIB::extTable.5.2.1.20.1.49 = INTEGER: 4
UCD-SNMP-MIB::extTable.5.2.1.21.1.49 = INTEGER: 1
UCD-SNMP-MIB::extTable.5.3.1.1.1.49 = STRING: "FanMod1A:7125 FanMod1B:7050 FanMod1C:4500 FanMod1D:4650 FanMod2A:7275 FanMod2B:7575 FanMod2C:4650 FanMod2D:4650 FanMod3A:7725 FanMod3B:7425 FanMod3C:4875 FanMod3D:4875 FanMod4A:7500 FanMod4B:7725 FanMod4C:4875 FanMod4D:4875 TempAmbient:19 TempCPU1delta:-42 TempCPU2delta:-34 TempChassis1:40 TempChassis2:40 "
UCD-SNMP-MIB::extTable.5.3.1.2.1.49 = STRING: "FanMod1A:7125 FanMod1B:7050 FanMod1C:4500 FanMod1D:4650 FanMod2A:7275 FanMod2B:7575 FanMod2C:4650 FanMod2D:4650 FanMod3A:7725 FanMod3B:7425 FanMod3C:4875 FanMod3D:4875 FanMod4A:7500 FanMod4B:7725 FanMod4C:4875 FanMod4D:4875 TempAmbient:19 TempCPU1delta:-42 TempCPU2delta:-34 TempChassis1:40 TempChassis2:40 "
UCD-SNMP-MIB::extTable.5.3.1.3.1.49 = INTEGER: 1
UCD-SNMP-MIB::extTable.5.3.1.4.1.49 = INTEGER: 0
UCD-SNMP-MIB::extTable.5.4.1.2.1.49.1 = STRING: "FanMod1A:7125 FanMod1B:7050 FanMod1C:4500 FanMod1D:4650 FanMod2A:7275 FanMod2B:7575 FanMod2C:4650 FanMod2D:4650 FanMod3A:7725 FanMod3B:7425 FanMod3C:4875 FanMod3D:4875 FanMod4A:7500 FanMod4B:7725 FanMod4C:4875 FanMod4D:4875 TempAmbient:19 TempCPU1delta:-42 TempCPU2delta:-34 TempChassis1:40 TempChassis2:40 "
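The .1.49 suffix in the walk output is the extend name "1" encoded as an SNMP string index: its length, then the ASCII code of each character. To poll only the one-line result (for example with snmpget from Cacti), the full OID can be derived as in this Perl sketch. The sub name is hypothetical; the .3.1.1 column (nsExtendOutput1Line) follows the NET-SNMP-EXTEND-MIB layout visible in the walk above.

```perl
use strict;
use warnings;

# Hypothetical helper: OID of the first output line (nsExtendOutput1Line)
# for an "extend" entry rooted at $base with extend name $name.
sub extend_output_oid {
    my ($base, $name) = @_;
    # String index = length, then one sub-identifier per character code
    my $index = join '.', length($name), map { ord } split //, $name;
    return "$base.3.1.1.$index";
}

print extend_output_oid('.1.3.6.1.4.1.2021.8.5', '1'), "\n";
# prints .1.3.6.1.4.1.2021.8.5.3.1.1.1.49
```

That OID can then be fetched directly, e.g. snmpget -v 2c -c usatlasgrid c-3-20 .1.3.6.1.4.1.2021.8.5.3.1.1.1.49.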
Interpreting the Values
There are 5 temps reported:
root@c-1-29 ~# ipmitool sdr
Temp | -39 degrees C | ok
Temp | -31 degrees C | ok
Temp | 40 degrees C | ok
Temp | 40 degrees C | ok
Ambient Temp | 20 degrees C | ok
The first two are CPU temperatures from sensors on the CPUs. They are reported relative to the CPU critical temperature, so -39 means 39 C below the critical point. The 3rd and 4th values (40 C) seem to be unused. The Ambient Temp appears to be a chassis temperature near the front of the chassis (20 C = 68 F); is this believable?
http://lists.us.dell.com/pipermail/linux-poweredge/2007-July/032172.html
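The two conversions used in this discussion can be written down as a small Perl sketch. The Fahrenheit conversion is standard; an absolute CPU temperature needs the CPU's critical temperature, which the sdr output above does not show, so the 100 C value below is purely a hypothetical placeholder.

```perl
use strict;
use warnings;

# Celsius to Fahrenheit (the 20 C ambient reading is 68 F)
sub c_to_f { my ($c) = @_; return $c * 9 / 5 + 32; }

# CPU readings are offsets from the critical temperature; $tcrit is an
# ASSUMED input that must come from the CPU's specification.
sub cpu_absolute_temp {
    my ($delta, $tcrit) = @_;
    return $tcrit + $delta;
}

printf "Ambient: %d F\n", c_to_f(20);
printf "CPU1 with a hypothetical 100 C critical point: %d C\n", cpu_absolute_temp(-39, 100);
```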
--
ShawnMcKee - 24 Sep 2007