This module provides a full nagios monitoring solution, for setting up both servers and clients. There can be any number of each, though there is typically one server and many clients. The main limitation is that all server and client nodes need to have all of their nagios related puppet configuration on the same puppetmaster.
The server part is still very Red Hat Enterprise Linux specific. The client part is much more generic and has been tested on both RHEL and Gentoo.
Please note that this module is not for the faint of heart. Even I (the author) have my head hurt each time I have to make modificiations to it... but it is worth it, as it allows having monitoring automatically enabled by default on all nodes as well as for all standard services detected on them.
- Stored configurations enabled on the puppetmaster (mandatory)
apache_httpd
andphp
modules used for the nagios server node (optional)
The apache_httpd
and php
modules are required for the server part, though
optionally since it is also possible to use an existing non-puppet managed web
server or different puppet modules. For a new empty node, configuring it as a
nagios server will be much quicker if those modules can be used.
Stored configurations are essential on the puppetmaster for the module to work at all, since it relies on having all nodes create their own exported nagios host and service resources, which the nagios server node then realizes to build its configuration.
For RHEL, any packages which might be required but are not part of the official repositories or EPEL can be found on http://dl.marmotte.net/rpms/
When SELinux is enforcing, the selinux::audit2allow
definition is required
to allow some basic nagios/nrpe accesses, though it can also be disabled.
Nagios server instance (node specific, typically inside a node section) :
class { '::nagios::server':
apache_httpd_ssl => false,
apache_httpd_conf_content => template('my/httpd-nagios.conf.erb'),
apache_httpd_htpasswd_source => 'puppet:///modules/my/htpasswd',
cgi_authorized_for_system_information => '*',
cgi_authorized_for_configuration_information => '*',
cgi_authorized_for_system_commands => '*',
cgi_authorized_for_all_services => '*',
cgi_authorized_for_all_hosts => '*',
cgi_authorized_for_all_service_commands => '*',
cgi_authorized_for_all_host_commands => '*',
cgi_default_statusmap_layout => '3',
}
Nagios client instances (typically from site.pp
) :
class { '::nagios::client':
nrpe_allowed_hosts => '127.0.0.1,192.168.1.1',
}
Nagios client specific overrides. See client.pp
and check/*.pp
for all of
the variables which can be manipulated this way. The following :
nagios::client::config { 'host_address': value => $::ipaddress_eth2 }
Will result in having $::nagios_host_name
get $ipaddress_eth2
as its value
for the entire configuration of the client where it is applied.
Nagios client check override configuration examples :
nagios::client::config { 'check_ram_ensure': value => 'absent' }
nagios::client::config { 'check_cpu_args': value => '-w 50 -c 20' }
Configuring a default check (must be done from a scope where nagios::client
can inherit it) :
Nagios::Check::Swap { ensure => 'absent' }
if $::domain == 'example.com' {
Nagios::Check::Cpu { notification_period => 'workhours' }
}
To enable nagiosgraph on the server :
class { '::nagios::nagiosgraph':
# This is the default
perflog => '/var/log/nagios/service_perfdata.log',
# To enable the mouseover graphs
nagios_service_action_url => '/nagiosgraph/cgi-bin/show.cgi?host=$HOSTNAME$&service=$SERVICEDESC$\' onMouseOver=\'showGraphPopup(this)\' onMouseOut=\'hideGraphPopup()\' rel=\'/nagiosgraph/cgi-bin/showgraph.cgi?host=$HOSTNAME$&service=$SERVICEDESC$',
ssi => true,
}
# This is what needs to be changed/added for nagios::server
class { '::nagios::server':
process_performance_data => '1',
service_perfdata_file => '/var/log/nagios/service_perfdata.log',
service_perfdata_file_template => '$LASTSERVICECHECK$||$HOSTNAME$||$SERVICEDESC$||$SERVICEOUTPUT$||$SERVICEPERFDATA$',
service_perfdata_file_processing_interval => '30',
service_perfdata_file_processing_command => 'process-service-perfdata-nagiosgraph',
}
To enable nagiosgraph for the client's services in the server web interface :
class { '::nagios::client':
service_use => 'generic-service,nagiosgraph-service',
}
To override the parameters of a default template using hiera :
```yaml
---
# Remove default warning notifications for services
nagios::server::template_generic_service:
notification_options: 'u,c,r'
Debug any startup or configuration problems on the server with :
nagios -v /etc/nagios/nagios.cfg
A lot can be configured semi-dynamically for nagios::client
(ideally using
hiera's automatic class parameter lookup) :
class { '::nagios::client':
host_notification_period => $::domain ? {
/\.dev$/ => 'workhours',
default => '24x7',
}
# You will need to use the type "nagios_hostgroup" on the server for
# all of the possible domain values to create the hostgroups.
host_hostgroups => $::domain,
}
Client overrides should be done in this order :
- For all services on some nodes : Using
nagios::client service_*
parameters - For all services on a node : Using the
nagios::client host_*
parameters - For specific services on all nodes : Using
Nagios::Check::Foo { paramname => 'value' }
(the scope must be the same or lower than wherenagios::client
is called from) - For specific services on a node : Using
nagios::client::config { 'check_*': value => 'foo' }
overrides
Note : nagios::client::config
can also be used to override just about
anything, though you must take into account that since it's done using custom
facts, you'll need two puppet runs on the client node then one on the server
node for the change to be applied : One for the fact to be created, then
another for the client configuration to take the new fact into account, then
the server run to update the nagios configuration. This might take a little
while depending on how often puppet is run on the nodes.
The mysql_health part is more recent than the rest, and :
- Relies on using automatic hiera class parameter lookups.
- Requires the puppetlabs-stdlib module because it uses getvar().
You will need to create the MySQL user on your servers, allowed for localhost since we use nrpe for execution. Example :
# This could go in site.pp, the fact is true only if mysqld is found
if $::nagios_mysqld {
mysql_user { 'nagios@localhost':
ensure => 'present',
password_hash => mysql_password('mysupersecretpassword'),
}
mysql_grant { 'nagios@localhost/*.*':
user => 'nagios@localhost',
table => '*.*',
privileges => [ 'REPLICATION CLIENT' ],
require => Mysql_user['nagios@localhost'],
}
}
# In hieradata
nagios::check::mysql_health::args: '--username nagios --password mysupersecretpassword'
The single mysql_health script has many different 'modes', which are all enabled by default. Because hyphens shouldn't be used in puppet variable names, we use underscores instead in their names.
You can either selectively disable some :
# Disable some checks (modes)
nagios::check::mysql_health::modes_disabled:
- 'slave_io_running'
- 'slave_lag'
- 'slave_sql_running'
Or selectively enable some :
# Enable only the following checks (modes)
nagios::check::mysql_health::modes_enabled:
- 'connection_time'
- 'open_files'
- 'uptime'
Then for each mode, you can also pass some arguments, typically to change the warning and critical values as needed :
# Tweak some check values
nagios::check::mysql_health::args_connection_time: '--warning 5 --critical 10'
If you decommission a Nagios-monitored host a couple of manual steps are required to clean up.
# On the Puppet Master
puppet node deactivate <my_host>
# On the Nagios server
puppet agent -t
service nagios restart
See Issue #21 on why the service restart is required.