Nagios Monitoring
At INEX we use Nagios to monitor a number of production services including:
- peering LAN switches;
- member reachability (ping v4/v6);
- member route collector sessions;
- member route server sessions.
IXP Manager can generate configuration to monitor the above for you.
NB: IXP Manager will not install and configure Nagios from scratch. You need a working Nagios installation first and then IXP Manager will automate the above areas of the configuration.
Sample Scripts
You will find sample scripts for pulling Nagios configuration from IXP Manager and reloading Nagios at:
https://github.com/inex/IXP-Manager/tree/master/tools/runtime/nagios
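A common pattern for this kind of script is to replace a Nagios configuration file only when the freshly pulled content differs from what is already installed, so that Nagios is reloaded only on a real change. The following is a minimal sketch of that idea and not the bundled script itself; the function name, file paths and reload command are hypothetical.

```shell
# install_if_changed NEW CURRENT
# Copies NEW over CURRENT only if their contents differ.
# Returns 0 when CURRENT was updated (a reload is needed), 1 when nothing changed.
install_if_changed() {
    new=$1
    current=$2
    # cmp -s is silent and exits 0 only when both files are identical;
    # a missing CURRENT file is treated as "different".
    if [ -f "$current" ] && cmp -s "$new" "$current"; then
        return 1
    fi
    cp "$new" "$current"
}

# Hypothetical usage (paths and the reload command depend on your installation):
#   curl --fail --silent -H "X-IXP-Manager-API-Key: my-ixp-manager-api-key" \
#        -o /tmp/members.cfg https://ixp.example.com/api/v4/nagios/customers/2/4 \
#     && install_if_changed /tmp/members.cfg /etc/nagios/conf.d/members.cfg \
#     && systemctl reload nagios
```

Using curl's --fail option in the usage above means an HTTP error from the API stops the chain before any file is replaced.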
Monitoring Member Reachability
We monitor all member router interfaces (unless asked not to) via ICMP[v6] pings with Nagios. This is all controlled by the Nagios configuration created with this feature.
To enable / disable these checks, edit the VLAN interface configuration and set IPvX Can Ping appropriately. Note that when IPvX Can Ping is disabled, the host definition is created anyway as this would be used for other Nagios checks such as route collector sessions.
There is an additional option when editing a member's VLAN interface called Busy Host. This changes the Nagios ping fidelity from 250.0,20%!500.0,60% to 1000.0,80%!2000.0,90% (using the default object definitions which are configurable). This is useful for routers with slow / rate limited control planes.
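To make the effect of the two threshold pairs concrete, the sketch below classifies a ping result against a warning and a critical threshold written in the same form as above (round trip average in milliseconds, then packet loss percentage). It assumes the usual check_ping behaviour where reaching either the latency or the loss threshold triggers that state; the function name is hypothetical.

```shell
# ping_state RTA_MS LOSS_PCT WARN CRIT
# WARN and CRIT look like "250.0,20%": round trip average (ms), packet loss (%).
# Prints OK, WARNING or CRITICAL.
ping_state() {
    awk -v rta="$1" -v pl="$2" -v warn="$3" -v crit="$4" 'BEGIN {
        split(warn, w, ","); split(crit, c, ",")
        # Strip the trailing percent sign from the loss thresholds.
        sub(/%$/, "", w[2]); sub(/%$/, "", c[2])
        if (pl + 0 >= c[2] + 0 || rta + 0 >= c[1] + 0)      print "CRITICAL"
        else if (pl + 0 >= w[2] + 0 || rta + 0 >= w[1] + 0) print "WARNING"
        else                                                 print "OK"
    }'
}

# A router answering in 300ms with no loss, under each set of thresholds:
#   ping_state 300 0 '250.0,20%' '500.0,60%'
#   ping_state 300 0 '1000.0,80%' '2000.0,90%'
```

Under this reading, the first call reports a warning while the Busy Host thresholds in the second call leave the same router in an OK state.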
Members are added to a number of hostgroups also:
- a per-switch hostgroup;
- a per cabinet hostgroup;
- a per location / data centre hostgroup;
- an all members hostgroup.
These hostgroups are very useful to single out issues and for post-maintenance checks.
You can use the IXP Manager API to get the Nagios configuration for a given VLAN and protocol using the following endpoint format (both GET and POST requests work):
https://ixp.example.com/api/v4/nagios/customers/{vlanid}/{protocol}
where:
- vlanid is the database ID (DB ID) of the VLAN. You can find the DB ID in IXP Manager in the VLAN table (select VLANs from the left hand side menu);
- protocol is either 4 for IPv4 or 6 for IPv6.
If either of these is invalid, the API will return an HTTP 404 response.
An example of a target in the response is:
###############################################################################################
###
### Packet Clearing House DNS
###
### Equinix DB2 (Kilcarbery) / Packet Clearing House DNS / swi1-kcp1-1.
###
### Host: 185.6.36.60 / inex.woodynet.net / Peering VLAN #1.
define host {
use ixp-manager-member-host
host_name packet-clearing-house-dns-as42-ipv4-vlanid2-vliid109
alias Packet Clearing House DNS / swi1-kcp1-1 / Peering VLAN #1.
address 185.6.36.60
}
### Service: 185.6.36.60 / inex.woodynet.net / Peering VLAN #1.
define service {
use ixp-manager-member-ping-service
host_name packet-clearing-house-dns-as42-ipv4-vlanid2-vliid109
}
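If you script these requests, it can help to validate the two URL components locally before calling the API rather than relying on the 404 response. The helper below is a small sketch under that assumption; the function name is hypothetical and the base URL is the placeholder used throughout this page.

```shell
# nagios_customers_url BASE VLANID PROTOCOL
# Prints the member reachability endpoint URL.
# Returns 1 if PROTOCOL is not 4 or 6, or if VLANID is not a numeric DB ID.
nagios_customers_url() {
    case $3 in
        4|6) ;;
        *) echo "protocol must be 4 or 6" >&2; return 1 ;;
    esac
    case $2 in
        ''|*[!0-9]*) echo "vlanid must be a numeric DB ID" >&2; return 1 ;;
    esac
    printf '%s/api/v4/nagios/customers/%s/%s\n' "$1" "$2" "$3"
}

# Example:
#   nagios_customers_url https://ixp.example.com 2 4
```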
Configuring Nagios for Member Reachability
You will notice that the above configuration example is very light and is missing an awful lot of Nagios required configuration directives. This is intentional so that IXP Manager is not too prescriptive and allows you to define your own Nagios objects without having to resort to skinning IXP Manager.
Two of the most important elements of Nagios configuration which you need to understand are object definitions and object inheritance.
You can pass four optional parameters to the API via GET/POST and these are:
- host_definition; defaults to: ixp-manager-member-host;
- service_definition; defaults to: ixp-manager-member-service;
- ping_service_definition; defaults to: ixp-manager-member-ping-service;
- ping_busy_service_definition; defaults to: ixp-manager-member-ping-busy-service.
An example of changing two of these parameters is:
curl --data "host_definition=my-host-def&service_definition=my-service-def" -X POST \
-H "Content-Type: application/x-www-form-urlencoded" \
-H "X-IXP-Manager-API-Key: my-ixp-manager-api-key" \
https://ixp.example.com/api/v4/nagios/customers/2/4
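When only some definitions need overriding, the request body can be assembled from whichever names are set. The sketch below builds the form-encoded string for curl's --data option; the function name is hypothetical, and it assumes object names contain only characters that need no URL encoding (letters, digits, hyphens and underscores).

```shell
# definition_params HOST SERVICE PING PING_BUSY
# Prints a form-encoded body; pass "" (or omit trailing arguments)
# for any definition that should keep its default.
definition_params() {
    out=
    for name in host_definition service_definition \
                ping_service_definition ping_busy_service_definition; do
        value=${1:-}
        if [ $# -gt 0 ]; then shift; fi
        if [ -n "$value" ]; then
            # Join pairs with "&" once the first pair has been added.
            out="${out:+$out&}$name=$value"
        fi
    done
    printf '%s\n' "$out"
}

# Example:
#   curl --data "$(definition_params my-host-def my-service-def)" -X POST ...
```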
Examples of the objects that INEX uses for this are:
define host {
name ixp-manager-member-host
check_command check-host-alive
check_period 24x7
max_check_attempts 10
notification_interval 120
notification_period 24x7
notification_options d,u,r
contact_groups admins
register 0
}
define service {
name ixp-manager-member-service
check_period 24x7
max_check_attempts 10
check_interval 5
retry_check_interval 1
contact_groups admins
notification_interval 120
notification_period 24x7
notification_options w,u,c,r
register 0
}
define service {
name ixp-manager-member-ping-service
use ixp-manager-member-service
service_description PING
check_command check_ping!250.0,20%!500.0,60%
register 0
}
define service {
name ixp-manager-member-ping-busy-service
use ixp-manager-member-service
service_description PING-Busy
check_command check_ping!1000.0,80%!2000.0,90%
register 0
}
Templates / Skinning
You can use skinning to make changes to the bundled default template or, preferably, add your own.
Let's say you wanted to add your own template called mytemplate1 and your skin is named myskin. The best way to proceed is to copy the bundled example:
cd $IXPROOT
mkdir -p resources/skins/myskin/api/v4/nagios/customers
cp resources/views/api/v4/nagios/customers/default.foil.php resources/skins/myskin/api/v4/nagios/customers/mytemplate1.foil.php
You can now edit this template as required. The only constraint on the template name is that it can only contain characters from the classes a-z, 0-9 and -. NB: do not use uppercase characters.
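The naming constraint is easy to check before copying a template into place. The following is a minimal sketch of such a check; the function name is hypothetical and it only tests the character classes stated above.

```shell
# valid_template_name NAME
# Succeeds only if NAME is non-empty and made up solely of a-z, 0-9 and "-".
valid_template_name() (
    LC_ALL=C   # keep the a-z range from matching uppercase letters in some locales
    case $1 in
        ''|*[!a-z0-9-]*) exit 1 ;;
        *) exit 0 ;;
    esac
)

# Example:
#   valid_template_name mytemplate1 && echo "name is acceptable"
```

The function body runs in a subshell, so the locale override does not leak into the calling script.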
You can then elect to use this template by tacking the name onto the API request:
https://ixp.example.com/api/v4/nagios/customers/{vlanid}/{protocol}/{template}
where, in this example, {template} would be: mytemplate1.
As a policy, INEX tends to use the bundled templates and as such they should be fit for general purpose.
Switch Monitoring
We monitor all production peering LAN switches for a number of different services (see below).
IXP Manager produces a host configuration for each production switch such as:
#
# swi2-dc1-1 - DUB01.XX.YY.ZZ, Data Centre DUB1.
#
define host {
use ixp-manager-production-switch
host_name swi2-dc1-1.mgmt.inex.ie
alias swi2-dc1-1
address 192.0.2.4
_DBID 74
}
Switches are added to a number of hostgroups also:
- switches per location / data centre;
- all switches in the requested infrastructure;
- grouped by vendor name (the vendor's shortname as defined in IXP Manager);
- grouped by vendor model (as discovered by SNMP).
These hostgroups are very useful when defining service checks.
You can use the IXP Manager API to get the Nagios configuration for a given infrastructure using the following endpoint format (both GET and POST requests work):
https://ixp.example.com/api/v4/nagios/switches/{infraid}
where:
infraid
is the database ID (DB ID) of the infrastructure. You can find the DB ID in IXP Manager in the infrastructures table (select Infrastructures from the left hand side menu).
You can use skinning to make changes to the bundled default template or, preferably, add your own.
Let's say you wanted to add your own template called myswtemplate1 and your skin is named myskin. The best way to proceed is to copy the bundled example:
cd $IXPROOT
mkdir -p resources/skins/myskin/api/v4/nagios/switches
cp resources/views/api/v4/nagios/switches/default.foil.php resources/skins/myskin/api/v4/nagios/switches/myswtemplate1.foil.php
You can then elect to use this template by tacking the name onto the API request:
https://ixp.example.com/api/v4/nagios/switches/{infraid}/{template}
where, in this example, {template} would be: myswtemplate1.
You can pass one optional parameter to the API via GET/POST, which is the host definition to inherit from (see customer reachability testing above for full details and examples):
curl --data "host_definition=my-sw-host-def" -X POST \
-H "Content-Type: application/x-www-form-urlencoded" \
-H "X-IXP-Manager-API-Key: my-ixp-manager-api-key" \
https://ixp.example.com/api/v4/nagios/switches/2
Service Checking
The recommended way to check various services on your production switches is to use the hostgroups created by the above switch API call. Examples of the hostgroups produced include:
- ixp-production-switches-infraid-2: all switches on an infrastructure with DB ID 2;
- ixp-switches-infraid-2-dc-dub1: all switches in location dc-dub1;
- ixp-switches-infraid-2-extreme: all Extreme switches on an infrastructure with DB ID 2; and
- ixp-switches-infraid-2-extreme-x670g2-48x-4q: all Extreme switches of model X670G2-48x-4q on an infrastructure with DB ID 2.
Using these, you can create generic service definitions to apply to all hosts such as:
define service{
use my-ixp-production-switch-service
hostgroup_name ixp-production-switches-infraid-1, ixp-production-switches-infraid-2
service_description ping - IPv4
check_command check_ping_ipv4!10!100.0,10%!200.0,20%
}
define service {
use my-ixp-production-switch-service
hostgroup_name ixp-production-switches-infraid-1, ixp-production-switches-infraid-2
service_description SSH
check_command check_ssh
}
You can target vendor / model specific checks as appropriate:
define service{
use my-ixp-production-switch-service
hostgroup_name ixp-switches-infraid-1-extreme, ixp-switches-infraid-2-extreme
service_description Chassis
check_command check_extreme_chassis
}
The one thing you'll need to keep an eye on is adding hostgroups to service checks as you create new infrastructures / add new switch vendors / models.
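One way to reduce that upkeep is to generate the hostgroup_name value from a list of infrastructure DB IDs instead of typing it by hand. The helper below is a sketch of that idea only; the function name is hypothetical and it simply follows the hostgroup naming shown in the examples above.

```shell
# switch_hostgroups PREFIX SUFFIX INFRA_ID...
# Prints a comma separated list suitable for a hostgroup_name directive.
# SUFFIX may be "" for the per-infrastructure groups.
switch_hostgroups() {
    prefix=$1
    suffix=$2
    shift 2
    list=
    for id in "$@"; do
        list="${list:+$list, }${prefix}${id}${suffix}"
    done
    printf '%s\n' "$list"
}

# Examples:
#   switch_hostgroups ixp-production-switches-infraid- "" 1 2
#   switch_hostgroups ixp-switches-infraid- -extreme 1 2
```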
Hint: over the years, we at INEX have written a number of switch chassis check scripts and these can be found on GitHub at barryo/nagios-plugins.
For example the Extreme version checks and returns something like:
OK - CPU: 5sec - 10%. Uptime: 62.8 days. PSUs: 1 - presentOK: 2 - presentOK:. Overall system power state: redundant power available. Fans: [101 - OK (4311 RPM)]: [102 - OK (9273 RPM)]: [103 - OK (4468 RPM)]: [104 - OK (9637 RPM)]: [105 - OK (4165 RPM)]: [106 - OK (9273 RPM)]:. Temp: 34'C. Memory (slot:usage%): 1:29%.
Birdseye Daemon Monitoring
We monitor our Bird instances at INEX directly through Birdseye, the software we use for our looking glass. This means it is currently tightly coupled to Bird and Birdseye until such time as we look at a second router software.
IXP Manager produces a host and service configuration for each router such as:
define host {
use ixp-manager-host-birdseye-daemon
host_name bird-rc1q-cork-ipv4
alias INEX Cork - Quarantine Route Collector - IPv4
address 10.40.5.134
_api_url http://rc1q-ipv4.cork.inex.ie/api
}
define service {
use ixp-manager-service-birdseye-daemon
host_name bird-rc1q-cork-ipv4
}
You can use the IXP Manager API to get the Nagios configuration for all VLANs or a given VLAN using the following endpoint formats (both GET and POST requests work):
https://ixp.example.com/api/v4/nagios/birdseye-daemons
https://ixp.example.com/api/v4/nagios/birdseye-daemons/{template}
https://ixp.example.com/api/v4/nagios/birdseye-daemons/default/{vlanid}
https://ixp.example.com/api/v4/nagios/birdseye-daemons/{template}/{vlanid}
where:
- {template} is the optional skin (see below);
- {vlanid} is the VLAN id to limit the results to. If setting this, you need to provide a template also (or default for the standard template).
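Because a VLAN id can only follow a template segment, a script that builds these URLs needs to fall back to default when only a VLAN is given. The sketch below shows one way to encode that rule; the function name is hypothetical.

```shell
# birdseye_daemons_url BASE [TEMPLATE] [VLANID]
# Prints the endpoint URL; pass "" as TEMPLATE to use the standard one.
birdseye_daemons_url() {
    base=$1
    template=${2:-}
    vlanid=${3:-}
    # A VLAN filter is only valid after a template segment,
    # so use "default" when no template was named.
    if [ -n "$vlanid" ] && [ -z "$template" ]; then
        template=default
    fi
    printf '%s/api/v4/nagios/birdseye-daemons%s%s\n' \
        "$base" "${template:+/$template}" "${vlanid:+/$vlanid}"
}

# Examples:
#   birdseye_daemons_url https://ixp.example.com
#   birdseye_daemons_url https://ixp.example.com "" 2
```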
You can use skinning to make changes to the bundled default template or, preferably, add your own.
Let's say you wanted to add your own template called mybetemplate1 and your skin is named myskin. The best way to proceed is to copy the bundled example:
cd $IXPROOT
mkdir -p resources/skins/myskin/api/v4/nagios/birdseye-daemons
cp resources/views/api/v4/nagios/birdseye-daemons/default.foil.php resources/skins/myskin/api/v4/nagios/birdseye-daemons/mybetemplate1.foil.php
You can then elect to use this template by tacking the name onto the API request:
https://ixp.example.com/api/v4/nagios/birdseye-daemons/{template}
where, in this example, {template} would be: mybetemplate1.
You can pass two optional parameters to the API via GET/POST, which are the host and service definitions to inherit from (see customer reachability testing above for full details and examples):
curl --data "host_definition=my-be-host-def&service_definition=my-be-srv-def" -X POST \
-H "Content-Type: application/x-www-form-urlencoded" \
-H "X-IXP-Manager-API-Key: my-ixp-manager-api-key" \
https://ixp.example.com/api/v4/nagios/birdseye-daemons
The default values for the host and service definitions are ixp-manager-host-birdseye-daemon and ixp-manager-service-birdseye-daemon respectively.
Service Checking
You will need to create a parent host and service definition for the generated configuration such as:
define host {
name ixp-manager-host-birdseye-daemon
check_command check-host-alive
check_period 24x7
max_check_attempts 10
notification_interval 120
notification_period 24x7
notification_options d,u,r
contact_groups admins
register 0
}
define service {
name ixp-manager-service-birdseye-daemon
service_description Bird BGP Service
check_command check_birdseye_daemon!$_HOSTAPI_URL$
check_period 24x7
max_check_attempts 10
check_interval 5
retry_check_interval 1
contact_groups admins
notification_interval 120
notification_period 24x7
notification_options w,u,c,r
register 0
}
define command{
command_name check_birdseye_daemon
command_line /usr/local/nagios-plugins-other/nagios-check-birdseye.php -a $ARG1$
}
The Nagios script we use is bundled with the inex/birdseye repository.
Typical Nagios state output:
OK: Bird 1.6.2. Bird's Eye 1.0.4. Router ID 192.0.2.126. Uptime: 235 days. Last Reconfigure: 2017-07-17 16:00:04.26 BGP sessions up of 28.
Birdseye BGP Session Monitoring
We monitor our Bird route collector, route server and AS112 Bird BGP sessions at INEX directly through Birdseye, the software we use for our looking glass. This means it is currently tightly coupled to Bird and Birdseye until such time as we look at a second router software.
IXP Manager produces a host and service configuration for each router type such as:
### Router: INEX LAN1 - Route Collector - IPv4 / 192.0.2.126.
define service {
use ixp-manager-member-bgp-session-service
host_name as112-reverse-dns-as112-ipv4-vlanid2-vliid99
service_description BGP session to rc1-lan1-ipv4 (INEX LAN1 - Route Collector - IPv4)
_api_url http://www.example.com/api
_protocol pb_0099_as112
}
The configuration also includes hostgroups for the given VLAN, protocol and type for:
- per-router;
- all sessions.
You can use the IXP Manager API to get the Nagios configuration for a given protocol, VLAN and router type using the following endpoint formats:
https://ixp.example.com/api/v4/nagios/birdseye-bgp-sessions/{vlanid}/{protocol}/{type}
https://ixp.example.com/api/v4/nagios/birdseye-bgp-sessions/{vlanid}/{protocol}/{type}/{template}
where:
- {vlanid} is the VLAN id to generate the configuration for;
- {protocol} is either 4 (ipv4) or 6 (ipv6);
- {type} is one of (these are defined in Entities\Router):
  - 1 for route servers;
  - 2 for route collectors;
  - 3 for AS112.
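Since the router type is a bare number in the URL, a small lookup keeps scripts readable. The sketch below maps short labels to the type numbers listed above and builds the endpoint URL; the labels rs, rc and as112 and both function names are our own shorthand rather than anything defined by IXP Manager.

```shell
# router_type_id LABEL
# Prints 1, 2 or 3 for the (hypothetical) labels rs, rc and as112.
router_type_id() {
    case $1 in
        rs)    echo 1 ;;   # route servers
        rc)    echo 2 ;;   # route collectors
        as112) echo 3 ;;   # AS112
        *)     echo "unknown router type: $1" >&2; return 1 ;;
    esac
}

# bgp_sessions_url BASE VLANID PROTOCOL LABEL
# Prints the BGP session endpoint URL, or returns 1 for an unknown label.
bgp_sessions_url() {
    type_id=$(router_type_id "$4") || return 1
    printf '%s/api/v4/nagios/birdseye-bgp-sessions/%s/%s/%s\n' \
        "$1" "$2" "$3" "$type_id"
}

# Example:
#   bgp_sessions_url https://ixp.example.com 2 4 rs
```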
You can use skinning to make changes to the bundled default template or, preferably, add your own.
Let's say you wanted to add your own template called myrstemplate1 and your skin is named myskin. The best way to proceed is to copy the bundled example:
cd $IXPROOT
mkdir -p resources/skins/myskin/api/v4/nagios/birdseye-bgp-sessions
cp resources/views/api/v4/nagios/birdseye-bgp-sessions/default.foil.php resources/skins/myskin/api/v4/nagios/birdseye-bgp-sessions/myrstemplate1.foil.php
You can then elect to use this template by tacking the name onto the API request:
https://ixp.example.com/api/v4/nagios/birdseye-bgp-sessions/{vlanid}/{protocol}/{type}/{template}
where, in this example, {template} would be: myrstemplate1.
You can pass one optional parameter to the API via GET/POST, which is the service definition to inherit from (see customer reachability testing above for full details and examples):
curl --data "service_definition=my-rs-srv-def" -X POST \
-H "Content-Type: application/x-www-form-urlencoded" \
-H "X-IXP-Manager-API-Key: my-ixp-manager-api-key" \
https://ixp.example.com/api/v4/nagios/birdseye-bgp-sessions/2/4/1
The default value for the service definition is ixp-manager-member-bgp-session-service.
Service Checking
You will need to create a parent service definition and a check command for the generated configuration such as:
define service {
name ixp-manager-member-bgp-session-service
service_description Member Bird BGP Sessions
check_period 24x7
max_check_attempts 10
check_interval 5
retry_check_interval 1
contact_groups admins
notification_interval 120
notification_period 24x7
notification_options w,u,c,r
register 0
check_command check_birdseye_bgp_session!$_SERVICEAPI_URL$!$_SERVICEPROTOCOL$
}
define command{
command_name check_birdseye_bgp_session
command_line /path/to/nagios-check-birdseye-bgp-sessions.php -a $ARG1$ -p $ARG2$ -n
}
The Nagios script we use is bundled with the inex/birdseye repository.