Ganglia
See: MSU Plan
ROCKS 4
Wish to separate machines at MSU into different groups by function.
| *Group* | *IP* | *Description* |
| AGLT2 at MSU | 225.31.5.18 | This is the default from ROCKS |
| MSU T2 | 239.2.11.61 | Tier2 compute nodes at MSU |
| MSU T2 Storage | 239.2.11.63 | Tier2 storage nodes at MSU |
| MSU T3 | 239.2.11.67 | Tier3 compute nodes at MSU |
| MSU OSG | 239.2.11.65 | DZero OSG compute nodes at MSU |
| MSU Server | 239.2.11.69 | Servers at MSU that don't fall in other categories |
A gmetad is running on msurox; the groups don't run their own gmetad, just gmonds, with the setting "deaf = no" so that they listen to (and repeat) their peers.
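For reference, the relevant piece of each group member's gmond.conf presumably looks something like this (a sketch, assuming the ganglia 3.x syntax used elsewhere on this page; the cluster name varies per group):

globals {
  deaf = no    /* listen to (and repeat) peers' metrics */
  mute = no    /* and send our own */
}
cluster {
  name = "MSU T2"    /* the group this node belongs to */
}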
Gmetad on msurox needs to query a gmond in each group to fetch data, since the gmond on msurox is not in all the groups. This also lets us specify multiple gmonds per group, which is useful if a node is down.
=== root@msurox /home/install/extras/ganglia > more /etc/gmetad.conf
# gmetad.conf for msurox
# our grid
gridname "USATLAS"
# remote clusters
data_source "AGLT2 at UM" 10.10.1.3:8649
data_source "UM ATLAS Servers" 10.10.1.8:8649
data_source "UM ATLAS Storage" 10.10.1.27:8649 10.10.1.28:8649
# local clusters
#no nodes should be in the default (ROCKS) group AGLT2 at MSU
#data_source "AGLT2 at MSU" localhost:8649
data_source "MSU OSG Compute" cc-112-32.local:8649 cc-113-33:8649
data_source "MSU T2 Compute" cc-102-1.local:8649 cc-104-1.local:8649 cc-106-1.local:8649
data_source "MSU T2 Storage" msufs01.local:8649 msufs02.local:8649
data_source "MSU T3 Compute" cc-113-1.local:8649 cc-113-2.local:8649
data_source "MSU Server" msu2.local:8649
ROCKS 5
Still not ideal
Really want to have multiple clusters. There needs to be one (or more) gmonds per cluster that aggregate info (they must not be "deaf") and get queried by gmetad. It seems that a single gmond can only aggregate the information for one cluster, namely the one it is in. If other nodes talk on the same multicast channel or send unicast to that gmond, their info will be included even if they are nominally in different clusters.
ROCKS 5 has put unicast sending within the cluster into the default config.
So, we need multiple listening gmonds. They can be placed on different nodes, one per cluster, or perhaps all on the frontend, given different multicast addresses and different unicast ports. For now, the first option is used; see the table below.
Gmond
Will have the ROCKS 4 frontend running plus msurx (production) and msurxi (test) ROCKS 5 frontends. The plan is to keep the same group names in ROCKS 5, but put the groups on different (new) multicast addresses. The MSU T3 and MSU OSG groups will be combined into just MSU T3.
| *Group* | *Multicast IP* | *Unicast host* | *gmond.conf filename* | *Description* |
| AGLT2 at MSU ROCKS 5 | 224.0.0.4 | 10.10.128.11 | None, from ROCKS | This is the default from ROCKS on msurx |
| AGLT2 at MSU ROCKS 5 Test | 224.0.0.4 | 10.10.128.12 | None, from ROCKS | This is the default from ROCKS on msurxi |
| MSU T2 | 239.2.12.61 | cc-117-1.msulocal | gmond.conf-msut2 | Tier2 compute nodes at MSU |
| MSU T2 dCache Pool | 239.2.12.63 | msufs01.msulocal | gmond.conf-msut2pool | Tier2 dCache pool nodes at MSU |
| MSU T3 | 239.2.12.67 | cc-115-1.msulocal | gmond.conf-msut3 | Tier3 compute nodes at MSU |
| MSU Server | 239.2.12.69 | msurx.msulocal | gmond.conf-msuserv | Servers at MSU that don't fall in other categories |
| MSU Test | 239.2.12.71 | msurxi.msulocal | gmond.conf-test | Systems installed from msurxi |
The ROCKS 5 implementation of the above sets an attribute in the database named "gmond_conf" that holds the gmond.conf filename.
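As a sketch, a per-group file such as gmond.conf-msut2 presumably contains something like the following, based on the table above (assumed contents; the tcp_accept_channel only matters on the aggregating node that gmetad polls):

cluster {
  name = "MSU T2"
}
/* metrics are sent by unicast to the group's aggregating gmond */
udp_send_channel {
  host = cc-117-1.msulocal
  port = 8649
}
/* the group's multicast channel, per the table above */
udp_recv_channel {
  mcast_join = 239.2.12.61
  port = 8649
}
/* TCP channel that gmetad polls for the group's XML */
tcp_accept_channel {
  port = 8649
}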
Gmetad on msurx needs to query a gmond in each group to fetch data, since the gmond on msurx is not in all the groups. This also lets us specify multiple gmonds per group, which is useful if a node is down.
# gmetad.conf for msurx
# our grid
gridname "USATLAS"
# remote clusters
data_source "AGLT2 at UM" 10.10.1.42:8649
data_source "UM ATLAS Servers" 10.10.1.8:8649
data_source "UM ATLAS Storage" 10.10.1.27:8649 10.10.1.28:8649
# local clusters
#no nodes should be in the default (ROCKS) group AGLT2 at MSU
#data_source "AGLT2 at MSU" localhost:8649
data_source "MSU T2 Compute" cc-102-1.local:8649 cc-104-1.local:8649 cc-106-1.local:8649
data_source "MSU T2 Storage" msufs01.local:8649 msufs02.local:8649
data_source "MSU T3 Compute" cc-113-1.local:8649 cc-113-2.local:8649
data_source "MSU Server" msu2.local:8649
On msurxi we have these data sources:
Hmm, not sure how this will work on msurxi. There is the issue that we don't know a gmond to query, since nodes will come and go. Probably need to move the gmond on msurxi into the multicast group for "MSU Test".
data_source "AGLT2 at MSU ROCKS 5 Test" localhost:8649
udp_send_channel
In ROCKS 4, the default gmond setup had:
/* UDP Channels for Send and Recv */
udp_recv_channel {
mcast_join = 225.31.5.18
port = 8649
mcast_if = eth0
}
udp_send_channel {
mcast_join = 225.31.5.18
port = 8649
mcast_if = eth0
}
In ROCKS 5 this is altered to:
/* UDP Channels for Send and Recv */
udp_recv_channel {
mcast_join = 224.0.0.4
port = 8649
}
udp_send_channel {
host = 10.10.128.12
port = 8649
}
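For the gmond at 10.10.128.12 to hear those unicast packets, it presumably also needs a plain UDP receive channel (one without mcast_join), something like:

/* on the receiving (frontend/aggregator) gmond only */
udp_recv_channel {
  port = 8649
}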
Removing Spikes
Had an issue with the Broadcom NIC driver giving impossibly large numbers from its counters, which resulted in spikes in the ganglia (rrdtool) graphs. Try googling "rrdtool remove spikes". There is a perl script on the rrdtool website that automates removing spikes from the rrdtool databases. See this blog: http://acktomic.com/?p=6
Anyway, tried this by hand: rrdtool dump, then editing the XML by hand, then rrdtool restore. Did it work???
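For reference, the by-hand procedure is roughly the following (the file name is just an example; rrdtool restore refuses to overwrite an existing file, hence the rename):

rrdtool dump bytes_in.rrd > bytes_in.xml
# edit bytes_in.xml, replacing the spike values with NaN
rrdtool restore bytes_in.xml bytes_in-fixed.rrd
mv bytes_in-fixed.rrd bytes_in.rrd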
Have removespikes.pl from http://oss.oetiker.ch/rrdtool/pub/contrib/. Needed to modify it to handle filenames with spaces. We use it with a threshold level instead of a percentage heuristic. The modified copy is in the tools directory.
Alternate and More Permanent Solution
Another solution for the network rate spikes is to set the min and max parameters in the rrd database files (Ganglia doesn't set values for these when it creates the rrd files). Then you can dump and restore the data. The restore process respects the max values. This will also prevent spikes in the future.
Example of doing it
[root@msurxi ~]# rrdtool tune /var/lib/ganglia/rrds/AGLT2\ at\ MSU\ ROCKS\ 5\ Test/cc-117-1.msulocal/bytes_in.rrd --maximum sum:1000000000
[root@msurxi ~]# rrdtool info /var/lib/ganglia/rrds/AGLT2\ at\ MSU\ ROCKS\ 5\ Test/cc-117-1.msulocal/bytes_in.rrd
filename = "/var/lib/ganglia/rrds/AGLT2 at MSU ROCKS 5 Test/cc-117-1.msulocal/bytes_in.rrd"
rrd_version = "0003"
step = 15
last_update = 1259094985
ds[sum].type = "GAUGE"
ds[sum].minimal_heartbeat = 120
ds[sum].min = NaN
ds[sum].max = 1.0000000000e+09
ds[sum].last_ds = "13513.02"
ds[sum].value = 1.3513020000e+05
ds[sum].unknown_sec = 0
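With the maximum in place, the existing spikes can then be purged with a dump and a range-checked restore; the -r (--range-check) flag makes rrdtool restore clamp stored values to the data-source limits:

rrdtool dump bytes_in.rrd > bytes_in.xml
rrdtool restore -r bytes_in.xml bytes_in-clamped.rrd
mv bytes_in-clamped.rrd bytes_in.rrd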
Seems fixed in ROCKS 5.2.2 - Nope
Have seen spikes in ROCKS 5.2.2, so the code mentioned below must not be enabled.
There is code in gmond from ganglia 3.1.2 that rejects network counts above certain thresholds; it was put in to work around this issue. It seemed that the gmond in ROCKS 5.2 service pack 5.2.2 had this code enabled, so we expected not to see these spikes anymore.
Private Network
The options in gmond.conf don't seem to be effective in making gmond use eth0. Adding a route to the system for the multicast addresses on eth0 is effective.
# route
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
224.0.0.0       *               240.0.0.0       U     0      0        0 eth0
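The route can be added with something like:

route add -net 224.0.0.0 netmask 240.0.0.0 dev eth0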
For bonded interfaces, presumably the same route is added on the bond device (e.g. bond0).
Nodes dropping in MSUROX
Have a continuing problem with nodes not being included in collection.
Have switched to using unicast for metric sending, which seems to resolve the issue.
Note that the problem may be iptables/igmp/kernel-version related; have not investigated this though.
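The change amounts to replacing the multicast udp_send_channel with a unicast one pointing at the group's aggregating gmond, along the lines of (the host here is just an example):

udp_send_channel {
  host = cc-102-1.local    /* the group's aggregating gmond */
  port = 8649
}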
--
TomRockwell - 01 May 2008