The Secret Diary of Han, Aged 0x29

twitter.com/h4n

Archive for January 2006

Server based RSS aggregator (Gregarius) installed on 10.0.16.50

I installed Gregarius on mgmt1. It provides a river-of-news view of all our worklogs. It is an easy way to see which worklogs have been updated, similar to the default view in projectweb.

It is not multiuser, but that doesn’t really matter, as it can display read and unread posts.

Written by Han

January 30, 2006 at 19:16

Posted in General

First multi-agent discovery in production

The first multi-agent discovery was completed successfully on the Cus X machines in production today. In addition to the normal SSM agent stuff, graphs on the Cus X internal agent (running on port 2024) were discovered as well. The Cus X machines were configured with two profiles, server and cusx. Each profile has associated classes, where each class knows by which agent and on which port it should be discovered.

It was a first in another way too. The Cus X agent stores its instances as strings, which translate to SNMP instances by taking the ASCII values of the individual characters, separated by dots and preceded by the string length. For example, “DBC” becomes “3.68.66.67”. This is then tacked onto the generic part of the OID. This kind of SNMP instance resembles the UID-based mapping that Weblogic uses. For those who have a soft spot for this kind of wackiness, Weblogic replaces the string length number with a fixed space (32).
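The encoding described above can be sketched in a few lines of Python. The function names are made up for illustration, and the sketch assumes the instance strings are plain ASCII; it is not the poller's actual code.

```python
def encode_instance(name: str, weblogic_style: bool = False) -> str:
    """Turn a string instance into a dotted SNMP instance suffix.

    Default: string length, then the ASCII value of each character.
    weblogic_style: a fixed 32 (space) replaces the length, as described above.
    """
    prefix = 32 if weblogic_style else len(name)
    return ".".join(str(n) for n in [prefix, *(ord(c) for c in name)])


def decode_instance(suffix: str) -> str:
    """Reverse the encoding: drop the leading number, map the rest to chars."""
    parts = [int(p) for p in suffix.split(".")]
    return "".join(chr(n) for n in parts[1:])


print(encode_instance("DBC"))        # 3.68.66.67
print(encode_instance("DBC", True))  # 32.68.66.67
print(decode_instance("3.68.66.67")) # DBC
```

The resulting suffix is what gets appended to the generic part of the OID.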

Written by Han

January 30, 2006 at 16:12

Posted in Tools

SNMP Poller changes, Phase 1: configuration using server profiles

Currently @Polls, @SnmpParams, and @SampleCollections are filled directly from Measurements.conf, since the instances are known. This will be impossible in the new poller, since instances need to be discovered. Instead of a single list, there need to be per-agent lists of polls, since an agent (IP + port) is the unit of discovery.

From the list of configured servers + profiles, the classes for each server can be deduced, and from the ports on the classes the agents can be as well. The list of agents can then be initialized.
Each agent then has a list of polls, consisting of all the measurements that can be done simultaneously, similar to how this is done globally for the current poller.

Since settings that were held in each @Poll record are common to all polls in an agent, they are extracted from the @Poll record and put in the agent. Only pollmodus (frequency) will need to be stored inside the poll. Ip, community, and version go to the agent.
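A rough sketch of how the agent list could be derived from servers and profiles. All names and values here are assumptions for illustration: the class names, the IP addresses, port 161 for the regular agent, and the default community and version. Only port 2024 for the Cus X agent comes from the worklog.

```python
from dataclasses import dataclass, field


@dataclass
class Poll:
    pollmodus: int                                   # frequency: the only per-poll setting
    measurements: list = field(default_factory=list)


@dataclass
class Agent:
    ip: str
    port: int
    community: str                                   # moved here from the @Poll record
    version: str                                     # moved here from the @Poll record
    polls: dict = field(default_factory=dict)        # pollmodus -> Poll


# Hypothetical configuration: profile -> classes, class -> port, server -> profiles.
PROFILES = {"server": ["cpu", "disk"], "cusx": ["cusxgraph"]}
CLASS_PORTS = {"cpu": 161, "disk": 161, "cusxgraph": 2024}
SERVERS = {"10.0.0.1": ["server", "cusx"], "10.0.0.2": ["server"]}


def build_agents(servers, profiles, class_ports, community="public", version="2c"):
    """Deduce the classes per server, then one agent per distinct (ip, port)."""
    agents = {}
    for ip, profile_names in servers.items():
        for profile in profile_names:
            for cls in profiles[profile]:
                port = class_ports[cls]
                if (ip, port) not in agents:
                    agents[(ip, port)] = Agent(ip, port, community, version)
    return agents


agents = build_agents(SERVERS, PROFILES, CLASS_PORTS)
print(sorted(agents))
```

Each agent's polls would only be filled in after discovery, since the instances are not known up front.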

Scheduling code that was scattered in various places in the current poller is factored into a Scheduler class. This removes the necessity to keep a “PollConst” as this kind of housekeeping is done by the scheduler.

The settings file for servers that links them with profiles is a YAML file. Its format is identical to the one used for the Provisioner application that currently discovers instances and generates config files.

Written by Han

January 27, 2006 at 16:20

Posted in Tools

What will change for the SNMP poller

The SNMP poller is stable and works OK, so why do we need to change it?

The answer is that it is tedious to configure and difficult to make configuration changes. This is not so much because it is difficult to do in the poller. It is more that there are other components that have to be configured as well, and kept in sync with the poller. The metadata in the cm database has components, measurement types and datastreams that have to match the corresponding entities in the poller. In addition, RRD files need to be created and maintained for historical performance graphs. We are also having problems with servers that are reprovisioned, or replaced with new hardware. There are no tools for deletion or change, so all that stuff now has to happen by hand. Clearly that is not a workable solution for the future.

So, what will be done:

  1. Currently all measurements are configured one by one in the SNMP Poller’s Measurements.conf file. The poller will merely be configured with IP addresses and profiles in the future, and through profile setup and discovery figure out by itself what to poll. This paves the way to a fully automatic setup of performance graphs.
  2. The poller will detect changes by rediscovering upon an agent restart, in addition to rediscovering on a timer, and comparing the results with the previous values. Changes will be reported back to the datacollectors, which can act upon them with respect to RRD files and metadata.
  3. The poller will handle configuration changes more gracefully. Instead of always reloading the entire configuration, it will try to just handle the changes.
  4. The poller will be able to support polling for other purposes than historical data collection, by providing a means of making subsets of the polled data separately.
  5. Separate instances of the polling engine will be callable through a web interface, in order to obtain live polls for small subsets of the measurements. This information can be used in the portals to show latest values, or even live graphs.
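The change detection in point 2 boils down to comparing two discovery results. A minimal sketch, assuming each discovered item can be represented as a (class, instance) pair; the function name and the sample data are hypothetical:

```python
def diff_discovery(previous: set, current: set) -> dict:
    """Compare two discovery results and report what appeared and what vanished."""
    return {
        "added": sorted(current - previous),
        "removed": sorted(previous - current),
    }


before = {("disk", "C:"), ("disk", "D:"), ("cpu", "0")}
after = {("disk", "C:"), ("disk", "E:"), ("cpu", "0")}

changes = diff_discovery(before, after)
print(changes)
```

A datacollector receiving such a report could create RRD files for the added items and retire the removed ones.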

[Update] OK, that description was too careful. Our goal is no less than to make configuration and maintenance easy and mostly automatic. Having a stable but difficult to configure Poller is good, but not good enough.

Written by Han

January 27, 2006 at 14:33

Posted in Tools

Existing SNMP poller structure

I will write up the redesign of the poller on the worklog, for future reference. To understand it, it is important to have an idea of how the SNMP poller version 1 works. Here is a short description.

Main structures

The existing SNMP Poller keeps 3 main collections:

1. @Polls.
Each poll describes a separate timed request to an agent (IP address + optional port) that comprises one or more measurements (SNMP gets) that are combined. SNMP gets are only combined if they have the same frequency (poll modus). Each poll record contains a list of indexes into tables 2 and 3 (below), plus the various parameters it needs to make the SNMP request, such as IP, community, version, poll modus (frequency) and poll constant (to spread the polls). A separate hash index is kept from IP + poll modus to the poll record, to easily find the poll record from the address + modus.

2. @SampleCollections
Stores address, address space, class, instance and measurementtype, plus all collected samples for each individual measurement

3. @SNMPParams
Stores things needed for each individual SNMP get: OID, SNMP Instance (iid), eval expression.

Collections 2 and 3 are in sync. The index of each is stored in each record in the @Polls collection as well.
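To make the relationship between the three collections concrete, here is a small Python model of it. This is only an illustration of the layout described above, not the poller's code; the class name, method name and defaults are invented, and address space and poll constant are left out for brevity.

```python
class PollerState:
    """Toy model of @Polls, @SampleCollections and @SNMPParams."""

    def __init__(self):
        self.polls = []               # @Polls
        self.sample_collections = []  # @SampleCollections
        self.snmp_params = []         # @SNMPParams
        self.poll_index = {}          # (ip, pollmodus) -> position in self.polls

    def add_measurement(self, ip, pollmodus, oid, iid, cls, instance, mtype,
                        community="public", version="2c", eval_expr=None):
        # Collections 2 and 3 grow together, so one index addresses both.
        self.sample_collections.append({
            "address": ip, "class": cls, "instance": instance,
            "measurementtype": mtype, "samples": [],
        })
        self.snmp_params.append({"oid": oid, "iid": iid, "eval": eval_expr})
        idx = len(self.snmp_params) - 1

        # Gets are combined into one poll only for the same IP and poll modus.
        key = (ip, pollmodus)
        if key not in self.poll_index:
            self.polls.append({"ip": ip, "community": community,
                               "version": version, "pollmodus": pollmodus,
                               "indexes": []})
            self.poll_index[key] = len(self.polls) - 1
        self.polls[self.poll_index[key]]["indexes"].append(idx)
        return idx


state = PollerState()
state.add_measurement("10.0.0.1", 60, ".1.3.6.1.2.1.2.2.1.10", "1", "interface", "eth0", "ifInOctets")
state.add_measurement("10.0.0.1", 60, ".1.3.6.1.2.1.2.2.1.16", "1", "interface", "eth0", "ifOutOctets")
state.add_measurement("10.0.0.1", 300, ".1.3.6.1.2.1.2.2.1.10", "2", "interface", "eth1", "ifInOctets")
print(state.polls)
```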

Configuration maps

In addition to these collections, a couple of hashes exist to give access to configuration data:

  1. %oidMap. Contains the oid for each class, measurement type, mib combination.
  2. %evalMap. Contains the evaluation expression using the same key as %oidMap.
  3. %snmpVersionMap. Contains SNMP version and the maximum number of combined polls by IP address.
  4. %communityMap. Contains the SNMP Community string by IP address

Instance mapping

Instance mapping has two levels. First, maps exist that find the configuration parameters based on the OID. These maps are called ConfigurationMaps. Second, maps exist that map from the Smart24 human-readable instance value to the dotted numerical SNMP instance value. These maps are called InstanceMaps. The InstanceMaps are read directly from the servers (“dynamic”, “index” and “ipaddress” instance mapping) or from a static file (“static” instance mapping). The configuration is kept around, because instance maps are refreshed every once in a while.

The poll cycle

Net-SNMP calls back to the SNMP Poller’s poll method every 2 seconds. The poll method then checks whether it has to do certain actions, based on whether signals were received and on a poll counter it maintains. It checks for the following:

  • Was a KILL signal received? If yes, write the collected data to disk and quit.
  • Was a HUP signal received? If yes, write the collected data to disk and reread the configuration.
  • Is it time to do the regular write to disk? If yes, write out the data.
  • Is it time to create a polling report? If yes, write the stats to the log file.
  • Is it time to do an instance refresh for certain IPs? If yes, do so.
  • Check which polls are up, and carry those out.
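The checks above can be sketched as a single tick function that is called once per callback. This is a simplified Python model, not the actual poll method: the state layout, the action names and the interval values are all assumptions, and real signal handling is replaced by two flags.

```python
def new_state(write_every=150, report_every=1800, refresh_every=900):
    """Poll counter plus signal flags; intervals are counted in ticks (assumed values)."""
    return {"counter": 0, "kill": False, "hup": False,
            "write_every": write_every, "report_every": report_every,
            "refresh_every": refresh_every}


def tick(state, due_polls=()):
    """One callback: return the actions to take, in the order listed above."""
    state["counter"] += 1
    n = state["counter"]
    if state["kill"]:
        return ["write_data", "quit"]
    actions = []
    if state["hup"]:
        state["hup"] = False
        actions += ["write_data", "reread_config"]
    if n % state["write_every"] == 0 and "write_data" not in actions:
        actions.append("write_data")
    if n % state["report_every"] == 0:
        actions.append("write_report")
    if n % state["refresh_every"] == 0:
        actions.append("refresh_instances")
    actions += [f"poll:{p}" for p in due_polls]
    return actions


state = new_state(write_every=2, report_every=3, refresh_every=100)
for _ in range(3):
    print(tick(state, due_polls=["10.0.0.1/60"]))
```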

Written by Han

January 27, 2006 at 12:53

Posted in Tools

Procedure for reip-ing Windows servers that were provisioned by Opsware

I got this from the server team a while ago. It is the procedure for giving a new permanent IP address to a Windows server that was provisioned by Opsware.

Please execute the following commands from a command prompt
after changing the IP (DHCP -> fixed) and hostname.
This re-registers the OPSW agent with the Opsware core server.

1. C:\Program Files\Loudcloud\blackshadow\cog\bs_hardware.bat

2. C:\Program Files\Loudcloud\blackshadow\cog\bs_software.bat

** Assumption **
The server needs to resolve the DNS names of the OPSW core components (SPIN, WAY, THEWORD).
The Windows Opsware agent uses the short name to resolve the DNS name.
>>> Must add DNS suffix = dns.name.com

To add DNS suffixes in Windows, go to the properties of the network adapter and open up TCP/IP settings / Advanced / DNS.

Written by Han

January 27, 2006 at 11:49

Posted in Tools

To minimize configuration

Principle 1:
Use reasonable defaults wherever possible to reduce the configuration complexity.

Instances:
– SNMP Poller: Use the mibname as the measurementtype name by default

Principle 2:
Where configuration is necessary, do it in one place and at the broadest possible entity.

Instances:
– SNMP Poller: Configure SNMP version mappings that are other than default for device types, not for individual devices
– SNMP Poller: Configure pollgroups and collgroups for address ranges, not for individual devices
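Both principles can be illustrated with a few lookup helpers. This is a sketch only: the device types, address ranges, group names and the default SNMP version are invented for the example.

```python
import ipaddress

DEFAULT_SNMP_VERSION = "2c"                  # assumed default
VERSION_BY_DEVICE_TYPE = {"oldswitch": "1"}  # hypothetical exception, per device type
POLLGROUP_BY_RANGE = [                       # hypothetical ranges and group names
    ("10.0.16.0/24", "mgmt"),
    ("172.18.0.0/16", "customer"),
]


def measurementtype_for(mibname, overrides=None):
    """Principle 1: the mibname is the measurementtype unless overridden."""
    return (overrides or {}).get(mibname, mibname)


def snmp_version_for(device_type):
    """Principle 2: exceptions are configured per device type, not per device."""
    return VERSION_BY_DEVICE_TYPE.get(device_type, DEFAULT_SNMP_VERSION)


def pollgroup_for(ip, default="default"):
    """Principle 2: pollgroups are configured per address range."""
    addr = ipaddress.ip_address(ip)
    for cidr, group in POLLGROUP_BY_RANGE:
        if addr in ipaddress.ip_network(cidr):
            return group
    return default


print(measurementtype_for("ifInOctets"))
print(snmp_version_for("oldswitch"), snmp_version_for("server"))
print(pollgroup_for("172.18.2.37"), pollgroup_for("192.168.1.1"))
```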

Written by Han

January 25, 2006 at 17:53

Posted in General

Cycle 15 deployed and tagged

Cycle 15 deployment was completed last night.

I checked the cycle 15 code by doing a virgin checkout and build. Two projects don’t build, but both are SSM related and not deployed. Therefore the build was considered OK.

I tagged the cycle 15 code with the following branch tag:

Cycle15-branch

Any modifications to cycle 15 code, bugfixes, etc, should go on this branch. Cycle 16 work goes on the head.

Written by Han

January 25, 2006 at 14:49

Posted in Tools

OID mapping web service

I whipped together a simple web service that provides access to the mapping between OIDs and classes / measurement types. Try this:

getoids?classes=cpu

getoids?oids=.1.3.6.1.4.1.1977.9.2.

getoids

The URL has the following format:

http://my.server/getoids

?classes=a,b,c

&measurementtypes=ee,ff,gg

&oids=h,i,j

All parameters are optional. If all are omitted, all OID records are returned. The service does a reverse partial match, so that searching for a complete OID will find generic OIDs that omit the instance part. In addition, OIDs can be either numeric or symbolic.
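A sketch of what the reverse match could look like, together with a helper that builds the request URL. The records and function names are made up for the example, only numeric OIDs are handled, and only the direction described above (a stored generic OID that is a prefix of the search OID) is covered.

```python
from urllib.parse import urlencode

# Hypothetical records: generic OIDs without the instance part.
RECORDS = [
    {"oid": ".1.3.6.1.2.1.2.2.1.10", "class": "interface", "measurementtype": "ifInOctets"},
    {"oid": ".1.3.6.1.2.1.2.2.1.16", "class": "interface", "measurementtype": "ifOutOctets"},
]


def components(oid):
    """Split a dotted OID into its numbers, ignoring leading/trailing dots."""
    return tuple(p for p in oid.split(".") if p)


def reverse_partial_match(search_oid, records=RECORDS):
    """Return records whose generic OID is a whole-component prefix of search_oid."""
    wanted = components(search_oid)
    result = []
    for record in records:
        generic = components(record["oid"])
        if wanted[:len(generic)] == generic:
            result.append(record)
    return result


def getoids_url(base, **params):
    """Build the query string; every parameter is optional and comma-separated."""
    query = {name: ",".join(values) for name, values in params.items() if values}
    return base + ("?" + urlencode(query, safe=",") if query else "")


print(reverse_partial_match(".1.3.6.1.2.1.2.2.1.10.3"))
print(getoids_url("http://my.server/getoids", classes=["a", "b", "c"], oids=["h", "i", "j"]))
```

Comparing whole components rather than raw strings keeps `.1.10` from matching `.1.100`.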

Written by Han

January 24, 2006 at 18:13

Posted in Tools

For an Alteon SSLVPN an ifIndex is not an ifIndex

The ifTable in MIB-2 is indexed by ifIndex. ifIndex itself is also the first column of the ifTable, so you would expect the values of this column to always be the same as the index in the table. And up to now this has proven true for all devices discovered. Today, however, when I tried to discover the interfaces for an Alteon SSL-VPN device for a customer, this law of nature was broken:

bash-2.05# snmpwalk -v2c -cpublic 172.18.2.37 ifIndex
RFC1213-MIB::ifIndex.1 = INTEGER: 5
RFC1213-MIB::ifIndex.2 = INTEGER: 4
RFC1213-MIB::ifIndex.3 = INTEGER: 2
RFC1213-MIB::ifIndex.4 = INTEGER: 3

I had to change the discovery of interfaces from “dynamic” to “index” to cope with this. In “dynamic” mode it takes the instance from the value of the table column, whereas in “index” mode it just uses the table index, which is what we want in this case.
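The difference between the two modes can be shown on the walk output above. This Python sketch reflects my reading of the two modes, with the map going from instance name to SNMP instance; the function names are invented and the parsing only handles INTEGER lines like the ones shown.

```python
import re

WALK = """\
RFC1213-MIB::ifIndex.1 = INTEGER: 5
RFC1213-MIB::ifIndex.2 = INTEGER: 4
RFC1213-MIB::ifIndex.3 = INTEGER: 2
RFC1213-MIB::ifIndex.4 = INTEGER: 3
"""

LINE = re.compile(r"^\S+\.(\d+) = INTEGER: (\d+)$")


def parse_walk(text):
    """Return (table_index, column_value) pairs from snmpwalk output."""
    rows = []
    for line in text.splitlines():
        match = LINE.match(line.strip())
        if match:
            rows.append((match.group(1), match.group(2)))
    return rows


def instance_map(rows, mode):
    """Map instance name -> SNMP instance (the table index) for the given mode."""
    if mode == "dynamic":   # name taken from the column value
        return {value: index for index, value in rows}
    if mode == "index":     # name is simply the table index
        return {index: index for index, _ in rows}
    raise ValueError(f"unsupported mode: {mode}")


rows = parse_walk(WALK)
print(instance_map(rows, "dynamic"))
print(instance_map(rows, "index"))
```

On a well-behaved device both modes give the same map; on this one only “index” keeps the instance name equal to the table index.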

Written by Han

January 24, 2006 at 15:49

Posted in General