Setting Up OMD on AGLT2 Systems
Monitoring for AGLT2 has relied on many different tools: Ganglia, Syslog-ng, Cacti, Nagios, Shinken, Rancid, Monit/MMonit, OpenManage and various home-made scripts and applications. Recently I became aware of the Open Monitoring Distribution (OMD), an open-source project that bundles a number of Nagios-based monitoring components into a single, easy-to-install distribution. Details at
http://omdistro.org/start
Components are summarized (from a 'Check_MK' perspective) in the following diagram. Each box describes the component's primary functions:
We have used Nagios for DYNES monitoring and have played with Shinken/PNP4Nagios as a possible replacement for Cacti. To try OMD out at AGLT2, I decided to repurpose the VM shinken-ha.aglt2.org and try an install.
Installation Details
On December 3, 2013 I renamed shinken-ha.aglt2.org to omd.aglt2.org. I requested the forward/reverse DNS changes via the Merit portal and uninstalled Shinken from the VM.
To install OMD I just needed to copy the current RPM, omd-1.00-rh61-30.x86_64.rpm, from
http://files.omdistro.org/releases/centos_rhel/
I also did some work on the VM (updated the tools and the VM hardware, and ran 'yum update').
Then I ran 'yum install --nogpgcheck omd-1.00-rh61-30.x86_64.rpm'.
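The download and install steps above can be sketched as a small script. This is only a sketch: the run() helper and the DRY_RUN variable are illustrative additions (not part of OMD), and by default the script only prints the commands instead of executing them. The URL is simply the release directory joined with the RPM file name.

```shell
#!/bin/sh
# Sketch: fetch the OMD RPM and install it with yum.
# DRY_RUN and run() are illustrative helpers, not part of OMD.
DRY_RUN=${DRY_RUN:-1}    # default: only print the commands

RPM=omd-1.00-rh61-30.x86_64.rpm
BASE_URL=http://files.omdistro.org/releases/centos_rhel

run() {
    # Print the command in dry-run mode, otherwise execute it.
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

install_omd() {
    run curl -O "$BASE_URL/$RPM"
    # --nogpgcheck skips the GPG signature check on the downloaded package
    run yum install --nogpgcheck "$RPM"
}

install_omd
```

Running it with DRY_RUN=0 as root would perform the actual download and install.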
I found a problem with OMD on CentOS 6.4, documented here:
http://blog.christian-stankowic.de/?p=5312&lang=en After patching the binary, things worked fine.
Configuration Details
OMD supports more than one site per installation. I wanted to set up a new site named 'aglt2', but the setup tries to create a new user and group with the site name, and the group aglt2 already existed. So I set up the site 'atlas' via 'omd create atlas'.
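Because 'omd create' wants a user and group named after the site, it may help to check candidate names first. The following is a minimal sketch: name_in_use is a hypothetical helper, and it assumes accounts are visible through getent (falling back to the local /etc/passwd and /etc/group files if getent is missing).

```shell
#!/bin/sh
# Sketch: check whether a user or group with the proposed site name exists.
# name_in_use is an illustrative helper, not an OMD command.
name_in_use() {
    if command -v getent >/dev/null 2>&1; then
        getent passwd "$1" >/dev/null 2>&1 || getent group "$1" >/dev/null 2>&1
    else
        # Fallback: only sees local accounts, not LDAP/NIS ones.
        grep -q "^$1:" /etc/passwd /etc/group
    fi
}

for site in aglt2 atlas; do
    if name_in_use "$site"; then
        echo "$site: user or group already exists, pick another site name"
    else
        echo "$site: free, could run: omd create $site"
    fi
done
```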
The default login on OMD is 'omdadmin' with password 'admin'. I set this to one of our admin pw ('S').
OMD can use different "core" processing elements depending upon how you configure it (see the 'check_mk' diagram below). The diagram also gives a pretty good overview of the components in OMD.
Reconfigured to use Shinken instead of Nagios (or Icinga): 'su - atlas; omd stop; omd config; (respond to prompts/menu options) omd start'
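A non-interactive variant of the core switch might look like the sketch below. It assumes this OMD version accepts 'omd config set CORE shinken' in place of the menu (if it does not, use the interactive 'omd config' as above). The run() helper and DRY_RUN variable are illustrative, and by default the script only prints the command.

```shell
#!/bin/sh
# Sketch: switch the monitoring core of an OMD site without the menu.
# Assumption: 'omd config set CORE <name>' is available in this OMD version.
DRY_RUN=${DRY_RUN:-1}    # default: only print the command
SITE=atlas

run() {
    # Print the command in dry-run mode, otherwise execute it.
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

switch_core() {
    # $1 = core name (e.g. shinken); the site must be stopped while reconfiguring
    run su - "$SITE" -c "omd stop && omd config set CORE $1 && omd start"
}

switch_core shinken
```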
Lots of configuration information is at
http://mathias-kettner.com/checkmk.html
Installed the 'check_openmanage' plugin on omd.aglt2.org (see
http://folk.uio.no/trondham/software/check_openmanage.html )
Our site is protected against non-root use of cron, so I needed to add any new 'site' users to /etc/cron.allow. Since we created a site called 'atlas' we need to add 'atlas' to /etc/cron.allow. This is now in CFengine3.
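For a host not yet covered by CFengine3, the cron.allow change could be made by hand along these lines. This is a sketch only: allow_cron is a hypothetical helper, and the demonstration runs against a scratch file rather than the live /etc/cron.allow.

```shell
#!/bin/sh
# Sketch: add a site user to cron.allow only if it is not already listed.
# allow_cron is an illustrative helper; the file defaults to /etc/cron.allow.
allow_cron() {
    user=$1
    file=${2:-/etc/cron.allow}
    # -x matches the whole line, -F treats the user name as a fixed string
    grep -qxF -- "$user" "$file" 2>/dev/null || echo "$user" >> "$file"
}

# Demonstrate on a scratch file instead of the live one.
tmp=$(mktemp)
allow_cron atlas "$tmp"
allow_cron atlas "$tmp"    # second call is a no-op
cat "$tmp"
rm -f "$tmp"
```

On the real system, 'allow_cron atlas' run as root would edit /etc/cron.allow directly.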
We need to open two ports for remote access: 443 (https) and 57767 (shinken). (Also added to CFengine3.)
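The two ports could be expressed as firewall rules roughly as follows. This sketch assumes the host firewall is plain iptables (the CentOS 6 default); the actual rules at AGLT2 are managed through CFengine3 and may look different. port_rules is a hypothetical helper that only prints the commands for review.

```shell
#!/bin/sh
# Sketch: print iptables commands that would accept inbound TCP on the given ports.
# Assumption: iptables is the firewall in use; nothing is changed by this script.
port_rules() {
    for port in "$@"; do
        echo "iptables -I INPUT -p tcp --dport $port -j ACCEPT"
    done
}

port_rules 443 57767
```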
Basic steps to configure starting as 'root':
- su - atlas (become the site user)
- cd /omd/sites/atlas (this is the 'root' of the site)
- cd etc/check_mk (this is the location of the configuration for check_mk)
- Files for check_mk configuration end in .mk. See /omd/sites/atlas/etc/check_mk/conf.d for examples.
The command to inventory is 'check_mk -I <hostname>' but <hostname> needs to be already defined in check_mk config files.
The command to reload is 'check_mk -O' (after config changes).
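Putting the steps together, defining a host and then running the inventory and reload might look like the sketch below. The host name and the file name aglt2_hosts.mk are hypothetical, and CONF_DIR defaults to a scratch directory so the sketch is safe to run anywhere; on the OMD server it would point at the site's check_mk configuration directory. The check_mk commands are only printed.

```shell
#!/bin/sh
# Sketch: define a host in a .mk file, then inventory it and reload.
# CONF_DIR and HOST are illustrative; adjust both for the real site.
CONF_DIR=${CONF_DIR:-$(mktemp -d)}
HOST=${HOST:-host1.aglt2.org}    # hypothetical host name

write_host_file() {
    # .mk files use Python syntax; all_hosts is check_mk's list of monitored hosts
    cat > "$CONF_DIR/aglt2_hosts.mk" <<EOF
all_hosts += [
    "$HOST",
]
EOF
}

write_host_file
cat "$CONF_DIR/aglt2_hosts.mk"

# Then, as the site user on the OMD server:
echo "check_mk -I $HOST    # inventory the newly defined host"
echo "check_mk -O          # reload after the config change"
```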
Agent Installation on AGLT2 Systems
The OMD configuration benefits strongly from Check_MK. The Check_MK component has a set of rules and agents that can inventory and set up test-monitoring for a number of different systems and applications. To benefit from this we need to install the check_mk agent RPMs on our systems. As of December 4th we have the following two RPMs installed cluster-wide:
check_mk-agent-logwatch-1.2.2p3-1.noarch.rpm
check_mk-agent-1.2.2p3-1.noarch.rpm
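To check whether a given node already has both agent packages, something like the following sketch could be used. pkg_name is a hypothetical helper that turns the RPM file name into the package name that 'rpm -q' expects; on a machine without rpm the script just reports that it cannot tell.

```shell
#!/bin/sh
# Sketch: report whether the two check_mk agent packages are installed locally.
AGENT_RPMS="check_mk-agent-1.2.2p3-1.noarch.rpm check_mk-agent-logwatch-1.2.2p3-1.noarch.rpm"

pkg_name() {
    # Strip the .rpm suffix: name-version-release.arch is what rpm -q accepts
    basename "$1" .rpm
}

for rpm_file in $AGENT_RPMS; do
    pkg=$(pkg_name "$rpm_file")
    if command -v rpm >/dev/null 2>&1 && rpm -q "$pkg" >/dev/null 2>&1; then
        echo "$pkg: installed"
    else
        echo "$pkg: not installed (or rpm not available on this machine)"
    fi
done
```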
We may need additional "local" agents installed to properly monitor various databases and other applications. See next section.
Issues for AGLT2 (Mis-configuration, False Problem Detection, Missing Functionality)
You can log in to the AGLT2 'atlas' site at
http://omd.aglt2.org/atlas This is a front page which lets you select which web interface you want to go to. There is also a Shinken page at
http://omd.aglt2.org:57767/
As we began to add hosts and tests we found a few issues that were NOT problems with the hosts or services at AGLT2. Some tests make wrong assumptions or have bad default checks that indicate problems when there are no real problems to address. We need to tune our site setup to fix these false positives. Please see the FixOMDFalsePositives page for the current list of issues and, when known, their solutions.
-- ShawnMcKee - 09 Dec 2013
- Check_mk components and their functions: