Archive for January 2006
I installed Gregarius on mgmt1. It provides a river of news view of all our worklogs. It is an easy way to see which worklogs have been updated, similar to the default view in projectweb.
It is not multiuser, but that doesn’t really matter, as it can display read and unread posts.
The first multi-agent discovery was completed successfully on the Cus X machines in production today. In addition to the normal SSM agent stuff, graphs on the Cus X internal agent (running on port 2024) were discovered as well. The Cus X machines were configured with two profiles, server and cusx. Each profile has associated classes, where each class knows by which agent and on which port it should be discovered.
In another way it was a first too. The Cus X agent stores its instances as strings, which translate to SNMP as the string length followed by the ASCII values of the individual characters, separated by dots. For example, “DBC” becomes “3.68.66.67”. This is then tacked onto the generic part of the OID. This kind of SNMP instance resembles the UID-based mapping that Weblogic uses. For those who have a soft spot for this kind of wackiness, Weblogic replaces the string length with a fixed space (32).
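The string-to-instance encoding is easy to get wrong by hand, so here is a small sketch of it in Python. The length-prefixed form is the same one SMIv2 uses for variable-length string indexes. The Weblogic variant simply follows the description above (a fixed 32 in place of the length). `BASE_OID` is a made-up placeholder, not a real Cus X OID.

```python
def string_to_instance(name: str, weblogic: bool = False) -> str:
    """Encode a string instance as a dotted SNMP instance suffix.

    Cus X style: the string length, then the ASCII code of each character.
    Weblogic style (as described above): a fixed 32 in place of the length.
    """
    prefix = "32" if weblogic else str(len(name))
    return ".".join([prefix] + [str(ord(ch)) for ch in name])


def instance_to_string(suffix: str) -> str:
    """Decode a suffix back into the string, ignoring the leading length/32 field."""
    codes = [int(part) for part in suffix.split(".")]
    return "".join(chr(code) for code in codes[1:])


# Hypothetical generic OID part; the real Cus X OIDs are not given here.
BASE_OID = "1.3.6.1.4.1.99999.1.2"

print(string_to_instance("DBC"))                   # 3.68.66.67
print(string_to_instance("DBC", weblogic=True))    # 32.68.66.67
print(f"{BASE_OID}.{string_to_instance('DBC')}")   # 1.3.6.1.4.1.99999.1.2.3.68.66.67
```

The decoder is what discovery needs: a walk returns full OIDs, and stripping the generic part and decoding the rest gives back the human-readable instance name.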
Currently, @Polls, @SnmpParams, and @SampleCollections are filled directly from Measurements.conf, since the instances are known. This will be impossible in the new poller, because instances need to be discovered. Instead of a single list, there need to be per-agent lists of polls, since an agent (IP + port) is the unit of discovery.
From the list of configured servers + profiles, the classes for each server can be deduced, and from the ports on the classes the agents can be as well. The list of agents can then be initialized.
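A rough sketch of that deduction, assuming the server/profile/class configuration has already been loaded into plain dictionaries. The server IPs, class names and the `AgentKey` type are hypothetical; only the profile names and the Cus X port 2024 come from the worklog, and 161 is just the standard SNMP port.

```python
from dataclasses import dataclass

# Hypothetical configuration: servers have profiles, profiles have classes,
# and each class names the agent port it is discovered on.
SERVERS = {
    "10.0.0.1": ["server", "cusx"],
    "10.0.0.2": ["server"],
}
PROFILES = {
    "server": ["cpu", "disk"],
    "cusx": ["cusx_queue"],
}
CLASS_PORTS = {
    "cpu": 161,
    "disk": 161,
    "cusx_queue": 2024,
}


@dataclass(frozen=True)
class AgentKey:
    """An agent is identified by ip + port (the unit of discovery)."""
    ip: str
    port: int


def derive_agents(servers, profiles, class_ports):
    """Map each agent to the set of classes that are discovered through it."""
    agents = {}
    for ip, server_profiles in servers.items():
        for profile in server_profiles:
            for cls in profiles[profile]:
                key = AgentKey(ip, class_ports[cls])
                agents.setdefault(key, set()).add(cls)
    return agents


agents = derive_agents(SERVERS, PROFILES, CLASS_PORTS)
for agent, classes in sorted(agents.items(), key=lambda kv: (kv[0].ip, kv[0].port)):
    print(agent.ip, agent.port, sorted(classes))
```

Because `AgentKey` is frozen (hashable), two classes on the same ip + port automatically end up on the same agent.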
Each agent then has a list of polls, consisting of all the measurements that can be done simultaneously, similar to how this is done globally for the current poller.
Settings that were held in each @Polls record but are common to all polls of an agent are moved out of the poll record and into the agent. Only the poll modus (frequency) needs to be stored in the poll; IP, community, and version go to the agent.
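The resulting split between agent and poll could look roughly like this. It is a sketch, not the actual poller code: the class and field names are made up, the modus is assumed to be in seconds, and `max_combined` is a hypothetical stand-in for the per-device cap on combined gets that v1 keeps in %snmpVersionMap.

```python
from dataclasses import dataclass, field


@dataclass
class Poll:
    """A timed request; only the poll modus (frequency) is stored per poll."""
    modus: int                                  # assumed: interval in seconds
    measurements: list = field(default_factory=list)


@dataclass
class Agent:
    """Unit of discovery (ip + port) holding the SNMP settings shared by its polls."""
    ip: str
    port: int
    community: str
    version: str
    max_combined: int = 10                      # hypothetical cap on gets per request
    polls: dict = field(default_factory=dict)   # modus -> Poll

    def add_measurement(self, name: str, modus: int) -> None:
        """Combine measurements with the same modus into one poll."""
        self.polls.setdefault(modus, Poll(modus)).measurements.append(name)

    def requests(self, modus: int) -> list:
        """Split one poll into batches no larger than max_combined."""
        names = self.polls[modus].measurements
        return [names[i:i + self.max_combined]
                for i in range(0, len(names), self.max_combined)]


agent = Agent("10.0.0.1", 2024, "public", "2c", max_combined=2)
for name in ("queueDepth", "queueAge", "msgRate"):  # hypothetical measurement names
    agent.add_measurement(name, 300)
agent.add_measurement("uptime", 60)
print(agent.requests(300))  # [['queueDepth', 'queueAge'], ['msgRate']]
```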
Scheduling code that was scattered in various places in the current poller is factored into a Scheduler class. This removes the need to keep a “PollConst”, as this kind of housekeeping is done by the scheduler.
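One way such a Scheduler could work is sketched below with a priority queue (`heapq`). The staggering of first runs is my own hypothetical replacement for the poll constant, and the `spread` of 2 just mirrors the 2-second callback; time is in abstract ticks.

```python
import heapq
import itertools


class Scheduler:
    """Tracks when each poll is next due and staggers their first runs."""

    def __init__(self, spread: int = 2):
        self._heap = []                # entries: (due_time, seq, poll_id, interval)
        self._seq = itertools.count()  # tie-breaker so poll ids are never compared
        self._spread = spread          # hypothetical stagger between new polls
        self._offset = 0

    def add(self, poll_id, interval: int, now: int = 0) -> None:
        """Schedule a poll; its first run is offset so polls don't all fire at once."""
        first_due = now + (self._offset % interval)
        self._offset += self._spread
        heapq.heappush(self._heap, (first_due, next(self._seq), poll_id, interval))

    def due(self, now: int) -> list:
        """Return the ids of polls due at `now` and reschedule them."""
        popped = []
        while self._heap and self._heap[0][0] <= now:
            popped.append(heapq.heappop(self._heap))
        for due_time, _, poll_id, interval in popped:
            next_due = due_time + interval
            while next_due <= now:     # skip missed slots instead of firing a burst
                next_due += interval
            heapq.heappush(self._heap, (next_due, next(self._seq), poll_id, interval))
        return [poll_id for _, _, poll_id, _ in popped]


scheduler = Scheduler(spread=2)
scheduler.add("a", 10)   # first due at 0
scheduler.add("b", 10)   # first due at 2
print(scheduler.due(0), scheduler.due(1), scheduler.due(2))  # ['a'] [] ['b']
```

The poll records then only need their modus; when they run is the scheduler's business.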
The settings file that links servers to profiles is a YAML file. Its format is identical to the one used by the Provisioner application, which currently discovers instances and generates the config files.
The SNMP poller is stable and works OK, so why do we need to change it?
The answer is that it is tedious to configure, and configuration changes are difficult. This is not so much because changes are difficult to make in the poller itself, but because other components have to be configured as well, in sync with the poller. The metadata in the cm database has components, measurement types and datastreams that have to match the corresponding entities in the poller. In addition, RRD files need to be created and maintained for historical performance graphs. We are also having problems with servers that are reprovisioned or replaced with new hardware. There are no tools for deletion or change, so all of this now has to be done by hand. Clearly, that will not work in the long run.
So, what will be done:
- Currently all measurements are configured one by one in the SNMP Poller’s Measurements.conf file. In the future, the poller will only be configured with IP addresses and profiles, and will work out by itself what to poll through the profile setup and discovery. This paves the way to a fully automatic setup of performance graphs.
- The poller will detect changes by rediscovering, both periodically and whenever an agent restarts, and comparing the results with the previous values. Changes will be reported back to the datacollectors, which can act on them by updating RRD files and metadata.
- The poller will handle configuration changes more gracefully. Instead of always reloading the entire configuration, it will try to just handle the changes.
- The poller will be able to support polling for purposes other than historical data collection, by providing a means of making subsets of the polled data available separately.
- Separate instances of the polling engine will be callable through a web interface, in order to obtain live polls for small subsets of the measurements. This information can be used in the portals to show latest values, or even live graphs.
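The change detection in the second point comes down to diffing two discovery results. A minimal sketch follows, assuming each discovery yields a dict from human-readable instance name to SNMP instance suffix. How the diff is reported to the datacollectors is left out, and the sample data is made up.

```python
def diff_instances(previous: dict, current: dict) -> dict:
    """Compare two discovery results (instance name -> SNMP instance suffix)."""
    added = sorted(current.keys() - previous.keys())
    removed = sorted(previous.keys() - current.keys())
    # Same name, different SNMP instance: e.g. an interface that got a new index.
    changed = sorted(k for k in current.keys() & previous.keys()
                     if current[k] != previous[k])
    return {"added": added, "removed": removed, "changed": changed}


previous = {"eth0": "2", "C:": "1"}
current = {"eth0": "3", "D:": "4"}
print(diff_instances(previous, current))
# {'added': ['D:'], 'removed': ['C:'], 'changed': ['eth0']}
```

The same diff serves the third point: on a configuration change, only the added, removed and changed entries need handling instead of a full reload.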
[Update] OK, that description was too careful. Our goal is nothing less than making configuration and maintenance easy and mostly automatic. A stable but difficult-to-configure poller is good, but not good enough.
I will write up the redesign of the poller on the worklog, for future reference. To understand it, it is important to have an idea of how the SNMP poller version 1 works. Here is a short description.
The existing SNMP Poller keeps 3 main collections:
1. @Polls: Each poll describes a separate timed request to an agent (IP address + optional port) that combines 1 or more measurements (SNMP gets). SNMP gets are only combined if they have the same frequency (poll modus). Each poll record contains a list of indexes into collections 2 and 3 (below), plus the various parameters it needs to make the SNMP request, such as IP, community, version, poll modus (frequency) and poll constant (to spread the polls). A separate hash index is kept from IP + poll modus to the poll record, to easily find the poll record for a given address and modus.
2. @SampleCollections: Stores the address, address space, class, instance and measurementtype, plus all collected samples, for each individual measurement.
3. @SnmpParams: Stores what is needed for each individual SNMP get: OID, SNMP instance (iid) and eval expression.
Collections 2 and 3 are kept in sync, so the same index refers to the same measurement in both. That index is stored in each record of the @Polls collection as well.
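To make the bookkeeping concrete, here is a Python approximation of how the three v1 collections and the IP + modus index fit together. Python lists and dicts stand in for the Perl arrays and hashes; the collection names come from the worklog, but the field names, defaults and sample data are hypothetical, and the poll constant is left out.

```python
polls = []               # @Polls: one record per (ip, modus) combination
sample_collections = []  # @SampleCollections: what is measured, plus collected samples
snmp_params = []         # @SnmpParams: oid, iid and eval expression per SNMP get
poll_index = {}          # (ip, modus) -> position in polls


def add_measurement(ip, modus, sample, params, community="public", version="1"):
    """Register one measurement, combining it with others for the same ip + modus."""
    # Collections 2 and 3 are always appended together, so they share one index.
    idx = len(sample_collections)
    sample_collections.append({**sample, "samples": []})
    snmp_params.append(params)

    key = (ip, modus)
    if key not in poll_index:
        poll_index[key] = len(polls)
        polls.append({"ip": ip, "community": community, "version": version,
                      "modus": modus, "indexes": []})
    polls[poll_index[key]]["indexes"].append(idx)


add_measurement("10.0.0.1", 300, {"instance": "C:"}, {"iid": "1"})
add_measurement("10.0.0.1", 300, {"instance": "D:"}, {"iid": "2"})
add_measurement("10.0.0.1", 60, {"instance": "C:"}, {"iid": "1"})
print([p["indexes"] for p in polls])  # [[0, 1], [2]]
```

This also shows why the new design moves ip, community and version to the agent: they are repeated in every poll record for the same address.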
In addition to these collections, a couple of hashes exist to give access to configuration data:
- %oidMap. Contains the oid for each class, measurement type, mib combination.
- %evalMap. Contains the evaluation expression using the same key as %oidMap.
- %snmpVersionMap. Contains SNMP version and the maximum number of combined polls by IP address.
- %communityMap. Contains the SNMP Community string by IP address
Instance mapping has two levels. First, maps exist that find the configuration parameters based on the OID. These maps are called ConfigurationMaps. Second, maps exist that map from the Smart24 human-readable instance value to the dotted numerical SNMP instance value. These maps are called InstanceMaps. The InstanceMaps are read directly from the servers (“dynamic”, “index” and “ipaddress” instance mapping) or from a static file (“static” instance mapping). The configuration is kept in memory because the instance maps are refreshed every once in a while.
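Roughly, the two levels combine like this to produce the full OID for one get. It is a sketch under assumptions: the `OID_MAP` key and OID are invented, and the instance map is built from (SNMP instance, name) pairs as they might come back from an agent. It does not model the specific dynamic/index/ipaddress/static mapping types.

```python
# Level 1 (ConfigurationMap-style): OID per (class, measurement type, MIB),
# as in %oidMap. The key and OID below are hypothetical.
OID_MAP = {("disk", "diskUsed", "HYPOTHETICAL-MIB"): "1.3.6.1.4.1.99999.1.1"}


def build_instance_map(rows):
    """Level 2 (InstanceMap-style): human-readable instance -> dotted SNMP instance.

    `rows` is assumed to be (snmp_instance, name) pairs, e.g. read from the agent.
    """
    return {name: iid for iid, name in rows}


def resolve_oid(cls, mtype, mib, instance, instance_map):
    """Combine the configured OID with the mapped instance into a full OID."""
    return f"{OID_MAP[(cls, mtype, mib)]}.{instance_map[instance]}"


instance_map = build_instance_map([("1", "C:"), ("2", "D:")])
print(resolve_oid("disk", "diskUsed", "HYPOTHETICAL-MIB", "D:", instance_map))
# 1.3.6.1.4.1.99999.1.1.2
```

A refresh only rebuilds `instance_map`; the configuration side stays as it is, which is why it has to be kept around.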
The poll cycle
Net-SNMP calls back to the SNMP Poller’s poll method every 2 seconds. The poll method then checks whether it has to take certain actions, based on whether signals were received and on a poll counter it maintains. It checks the following:
- Was a KILL signal received? If yes, write the collected data to disk and quit.
- Was a HUP signal received? If yes, write the collected data to disk and reread the configuration.
- Is it time for the regular write to disk? If yes, write out the data.
- Is it time to create a polling report? If yes, write the stats to the log file.
- Is it time for an instance refresh for certain IPs? If yes, do so.
- Check which polls are due, and carry them out.
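The checks above map onto a small tick-driven method. Here is a Python sketch in which the signal handlers only set flags and the poll method acts on them. Real work is replaced by entries in an `actions` list, and all intervals are hypothetical tick counts. It uses SIGTERM for the quit case, since a true SIGKILL cannot be caught by a process; `install_handlers` is POSIX only.

```python
import signal


class PollCycle:
    """Sketch of the callback run every 2 seconds; intervals are counted in ticks."""

    def __init__(self, write_every=150, report_every=1800, refresh_every=None):
        self.write_every = write_every            # hypothetical: 150 ticks = 5 minutes
        self.report_every = report_every          # hypothetical: 1800 ticks = 1 hour
        self.refresh_every = refresh_every or {}  # ip -> ticks between instance refreshes
        self.tick = 0
        self.stop = False
        self.reload = False
        self.actions = []                         # stand-in for real work, for illustration

    def install_handlers(self):
        """POSIX only; handlers just set flags that poll() checks on the next tick."""
        signal.signal(signal.SIGTERM, lambda signum, frame: setattr(self, "stop", True))
        signal.signal(signal.SIGHUP, lambda signum, frame: setattr(self, "reload", True))

    def poll(self):
        """One tick of the cycle; returns False when the poller should exit."""
        self.tick += 1
        if self.stop:
            self.actions.append("write")          # flush data, then quit
            return False
        if self.reload:
            self.reload = False
            self.actions += ["write", "reread"]   # flush data, reread configuration
        if self.tick % self.write_every == 0:
            self.actions.append("write")
        if self.tick % self.report_every == 0:
            self.actions.append("report")
        for ip, every in self.refresh_every.items():
            if self.tick % every == 0:
                self.actions.append(f"refresh {ip}")
        self.actions.append("polls")              # run whichever polls are due
        return True


cycle = PollCycle(write_every=2, report_every=3, refresh_every={"10.0.0.1": 3})
for _ in range(3):
    cycle.poll()
print(cycle.actions)
```

Keeping the handlers down to setting a flag avoids doing file I/O inside a signal handler; the work happens at a known point in the cycle.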
I got this from the server team a while ago, to give a new permanent IP address to a Windows server that was provisioned by Opsware.
Please execute the following commands from the command prompt
after changing the IP address (DHCP -> fixed) and the hostname.
This re-registers the OPSW agent with the Opsware core server.
1. "C:\Program Files\Loudcloud\blackshadow\cog\bs_hardware.bat"
2. "C:\Program Files\Loudcloud\blackshadow\cog\bs_software.bat"
To add DNS suffixes in Windows, open the properties of the network adapter and go to TCP/IP settings / Advanced / DNS.
Use reasonable defaults wherever possible to reduce the configuration complexity.
– SNMP Poller: Use the mibname as the measurementtype name by default
Where configuration is necessary, do it in one place and at the broadest possible entity.
– SNMP Poller: Configure non-default SNMP version mappings for device types, not for individual devices
– SNMP Poller: Configure pollgroups and collgroups for address ranges, not for individual devices
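Applied to the poller, these rules amount to lookups that fall back to a default and are keyed by the broadest entity. Here is a sketch with Python's `ipaddress` module. The default version, device type, ranges and group names are all hypothetical. The "most specific range wins" tie-break is my own choice, not something the worklog prescribes. Collgroups would work the same way as pollgroups.

```python
import ipaddress

DEFAULT_SNMP_VERSION = "1"                 # hypothetical default
VERSION_BY_DEVICE_TYPE = {"cusx": "2c"}    # only the exceptions are listed

# Pollgroups per address range (hypothetical ranges and names).
POLLGROUPS = [
    (ipaddress.ip_network("10.0.0.0/8"), "default"),
    (ipaddress.ip_network("10.1.0.0/16"), "core"),
]


def snmp_version(device_type: str) -> str:
    """Per device type, falling back to the default when not listed."""
    return VERSION_BY_DEVICE_TYPE.get(device_type, DEFAULT_SNMP_VERSION)


def pollgroup(ip: str):
    """Return the pollgroup of the most specific matching range, or None."""
    addr = ipaddress.ip_address(ip)
    matches = [(net.prefixlen, group) for net, group in POLLGROUPS if addr in net]
    return max(matches)[1] if matches else None


def measurementtype(mibname: str, override=None) -> str:
    """Default the measurement type name to the MIB name unless overridden."""
    return override or mibname


print(snmp_version("cusx"), pollgroup("10.1.2.3"), measurementtype("ifInOctets"))
```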