Archive for May 2007
The general idea is that there are two kinds of configuration tasks:
1. General day-to-day configuration, usually involving the addition, deletion and change of servers and network devices. This work makes up more than 90% of the configuration load.
2. Creation of new graph types, configuration of profiles, etc. This kind of configuration needs a deeper understanding of how the SNMP poller works and can be configured. It does not happen very often.
The day-to-day configuration (type 1) is to be made easy and convenient, and doable by the operation team without involvement from the tools team. All configurations in this category can be made simply by adding or removing profiles to and from servers and network devices. The monitor and graph provisioning tool provides a web-based user interface to do just that.
It is important to note that, other than the addition and removal of profiles, no poller-related configuration can be done. So, what can be controlled by adding a profile to a server? It turns out to be quite a lot:
A profile can control the following:
– Instances of which classes are to be polled for the device. For example, a profile “server” may indicate that the classes “system”, “cpu”, “disk”, “memory” and “interface” are to be polled. The discovery process takes this information and finds out which CPUs and disks are on the server (these are called instances).
– Which measurements are made for each class
– The port at which measurements are carried out
– The poll frequency
– How many polls can be combined
– SNMP parameters, like community string and version
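As an illustration, a “server” profile covering the settings above might look something like the following. This is a hypothetical sketch: the post does not show the actual profile format, and all keys and values here are invented.

```yaml
# Hypothetical "server" profile (illustrative only; not the tool's real schema)
profile: server
classes:                 # which classes to poll; discovery finds the instances
  - system
  - cpu
  - disk
  - memory
  - interface
measurements:            # which measurements to make per class
  cpu: [utilization]
  disk: [used, free]
snmp:                    # SNMP parameters
  version: 2c
  community: public
  port: 161
poll:
  interval: 300          # poll frequency in seconds
  max_combined: 10       # how many polls may be combined
```

The point of such a file is that the operation team never needs to see it: they only attach the profile name to a device.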
All in all, configuring servers with the right set of parameters for polling can be quite complex. By hiding this complexity behind easy-to-understand, reusable profiles, the day-to-day configuration work is made easy.
Of course, somebody still has to set up the profiles in the first place. This is work for a specialist with in-depth knowledge of the SNMP poller. However, setting up profiles is not a day-to-day job; it happens only rarely. Once profiles have been defined, they can be used and re-used by the operation team.
In order to stabilize and foolproof the polling process, I separated the SNMP poller into two programs: Discovery and Polling. Discovery is a difficult and error-prone task, but it can stand some downtime, whereas polling is simple but needs to be rock solid.
The idea is simple. The poller loads all settings for the SNMP agents it is polling from so-called “instance files”. Each agent has its own instance file. Some other settings come from other configuration files, but everything concerning an agent comes only from that instance file. Most importantly, the instances, for example “CPU 1” or “Disk /”, are in this file, along with which parameters to measure and other settings. The only way to make the SNMP poller poll something else is by changing the agent instance files.
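A per-agent instance file could look roughly like this. Again a sketch only: the actual layout is not shown in the post, and every name below is invented.

```yaml
# Hypothetical instance file for one agent (layout invented for illustration)
agent: web01.example.com
snmp:
  version: 2c
  community: public
instances:               # written by discovery, read by the poller
  cpu:
    - "CPU 1"
    - "CPU 2"
  disk:
    - "Disk /"
    - "Disk /var"
measurements:
  cpu: [utilization]
  disk: [used, free]
```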
The discovery process, then, concerns itself with keeping the agent instance files up to date. It does this by monitoring the configuration files and discovering instances on the configured nodes.
There are multiple YAML configuration files. One, ‘nodes.conf’, is continuously monitored for changes. The others are read and analyzed when the discovery process receives a HUP signal. The ‘nodes.conf’ file contains the nodes to be polled and their profiles, which govern what is polled on those nodes. This file is updated automatically when the corresponding data in the database changes.
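A minimal ‘nodes.conf’ in the spirit of the above might look like this. The format is invented for illustration; only the file's name and role come from the post.

```yaml
# Hypothetical nodes.conf: nodes to poll and the profiles attached to them
nodes:
  web01.example.com:
    profiles: [server]
  core-sw-1.example.com:
    profiles: [switch, uplinks]
```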
Discovery happens in the following cases:
1. At regular intervals, as configured in ‘poller.conf’
2. When the sysUpTime OID of an agent jumps backwards. This indicates that either the agent or the device was restarted. A restart means there is a higher-than-normal likelihood that something on the node changed, so it may be a good time to kick off a rediscovery.
3. When an agent is requested to be rediscovered by sending the discovery process a command
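The three triggers above can be sketched as a single decision function. This is a minimal illustration, not the poller's actual code; the state dictionary and all names are assumptions.

```python
# Sketch of the three discovery triggers described above.
# All names are invented for illustration; the real poller's
# internals are not shown in the post.

def needs_discovery(agent, now, interval, hup_requested=False):
    """Return True if the agent should be (re)discovered.

    agent is a dict tracking per-agent state:
      last_discovered -- timestamp of the last discovery run
      last_sysuptime  -- sysUpTime seen on the previous poll
      sysuptime       -- sysUpTime seen on the current poll
    """
    # 1. Regular interval, as configured in poller.conf
    if now - agent["last_discovered"] >= interval:
        return True
    # 2. sysUpTime went backwards: the agent or device restarted,
    #    so the node's configuration may well have changed.
    if agent["sysuptime"] < agent["last_sysuptime"]:
        return True
    # 3. An explicit rediscovery request was sent to the process.
    return hup_requested
```

Checking for a backwards jump rather than any change is deliberate: sysUpTime normally only counts up, so a decrease is the reliable restart signal.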
When the discovered information for an agent changes, the discovery process writes a new YAML instance file for the agent and informs the poller that the agent changed. It also publishes external notifications and XML and HTML instance files for use by other tools (Data Collector, etc.).
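The write-only-on-change behaviour might look like the sketch below. Everything here is an assumption: the function, paths, and notification hook are invented, and JSON is used instead of YAML purely to keep the sketch dependency-free.

```python
# Sketch: publish an updated per-agent instance file only when the
# discovered data actually changed, then notify the poller. All names
# are invented; JSON stands in for the post's YAML files.
import json
import os

def publish_instances(agent_name, new_data, instance_dir, notify):
    path = os.path.join(instance_dir, agent_name + ".json")
    old_data = None
    if os.path.exists(path):
        with open(path) as f:
            old_data = json.load(f)     # what the poller currently sees
    if new_data == old_data:
        return False                    # nothing changed; leave poller alone
    with open(path, "w") as f:
        json.dump(new_data, f)          # rewrite the per-agent instance file
    notify(agent_name)                  # tell the poller this agent changed
    return True
```

The comparison against the previously written file is what keeps the poller stable: it is only disturbed when discovery has genuinely found something new.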