Archive for April 2004
I had a hard time getting the SnmpPoller to work in the production environment. Exactly the same code that worked fine in the dev environment did not work on the production servers. It finally turned out that IIS 6 does not cope with a simple socket send of the HTTP message. IIS 5 was OK, of course, and I have no idea what caused this to malfunction on IIS 6. The symptom was that IIS 6 hard-closed the socket after receiving about 1300 bytes, which made the SnmpPoller fail with a “broken pipe” message. On the dev machine, however, the SnmpPoller did not even detect the error and pretended everything was fine. This initially led me to believe that it worked from the dev machine pointing to the production datacollector. Pointing the production poller at the dev datacollector also worked, adding to the confusion.
This was finally solved by using libwww in Perl instead of a simple socket send or print. I’ll do a post-mortem later to find out what libwww does differently. Additionally, the RRD files would not be updated, even though the message was delivered and handled correctly. This turned out to be due to a configuration error on the Datacollector machine: the Reporting site did not have the .NET mappings defined. It still sort of worked since, for URL rewriting, the .NET framework DLL was defined as a wildcard application map, but that apparently led to subtle errors later on.
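For comparison, here is a minimal sketch in Python (not the Perl libwww code that was actually used) of sending the measurements through an HTTP client library instead of writing the request to a raw socket. The function name `post_measurements`, the URL path and the content type are hypothetical. The only point is that the library builds the request headers, including Content-Length, and reads the response back, so the caller gets either the server's status code or an exception rather than a silent failure.

```python
import http.client
from urllib.parse import urlsplit


def post_measurements(url, payload, timeout=10.0):
    """POST payload (bytes) to url and return the HTTP status code.

    Hypothetical helper: the content type below is an assumption,
    not something taken from the real poller.
    """
    parts = urlsplit(url)
    conn = http.client.HTTPConnection(parts.hostname, parts.port or 80,
                                      timeout=timeout)
    try:
        # http.client adds the Content-Length header for a bytes body.
        conn.request("POST", parts.path or "/", body=payload,
                     headers={"Content-Type": "application/octet-stream"})
        response = conn.getresponse()
        response.read()  # drain the body so the exchange completes cleanly
        return response.status
    finally:
        conn.close()
```

A connection that the server closes halfway through surfaces here as an exception from `request` or `getresponse`, which is the behaviour the raw socket version on the dev machine was missing.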
If necessary we can set up a telephone conference with D and R. But it might be good to discuss the issues here (on the worklog) first.
I had already tried that. It did not solve the problem.
The SNMP Poller on the production system did not work even after the right SNMP Perl library was installed. It turned out that Net-SNMP was not configured correctly. The right mib files need to be present in /usr/local/share/snmp/mib. In our case, these are the SSM mibs.
In addition, in /usr/local/share/snmp, snmp.conf needs to be configured correctly. All it needs is a line that says
Without it, it tries to load its default mibs which it cannot find anymore.
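A quick sanity check of that layout can be scripted. The sketch below is only an illustration in Python: the function name `check_net_snmp_setup` is made up, and it checks only that the mib directory has files in it and that snmp.conf exists and is not empty. It does not validate the contents of snmp.conf.

```python
from pathlib import Path


def check_net_snmp_setup(base_dir="/usr/local/share/snmp"):
    """Return a list of problems found; an empty list means the layout looks right.

    Hypothetical helper: only the presence of files is checked,
    not what snmp.conf actually says.
    """
    base = Path(base_dir)
    problems = []

    mib_dir = base / "mib"
    if not mib_dir.is_dir():
        problems.append(f"missing mib directory: {mib_dir}")
    elif not any(p.is_file() for p in mib_dir.iterdir()):
        problems.append(f"no mib files in {mib_dir}")

    conf = base / "snmp.conf"
    if not conf.is_file():
        problems.append(f"missing config file: {conf}")
    elif not conf.read_text().strip():
        problems.append(f"config file is empty: {conf}")

    return problems
```

Running `check_net_snmp_setup()` on the poller machine and printing the result would have pointed at the missing pieces straight away.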
After this, the poller basically works, except that it dies on a broken pipe when sending to the production DataCollector. The peculiar thing is that it works with a smaller Measurements file. It also works when sending to the dev Datacollector. This must be due to a maximum message size on the Windows 2003 IIS Datacollector.
Update. IIS6 apparently has a default request limit of 200k. I tried changing it in the metabase, but that did not immediately work. If someone can figure out how to change that default, that would be appreciated.
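If a request size cap really is the cause, one workaround that needs no server change would be to send the Measurements file in several smaller requests. The following is a rough sketch in Python under two assumptions that are not confirmed anywhere above: that the file is line-oriented, so it can be cut between lines, and that the cap applies to the request body. The function name `split_into_batches` and the default of 200,000 bytes are illustrative only.

```python
def split_into_batches(lines, max_bytes=200_000):
    """Group lines into batches whose total encoded size is at most max_bytes.

    Assumes each line already carries its own line terminator, as returned
    by readlines(). The default limit is an illustrative guess.
    """
    batches = []
    current = []
    current_size = 0
    for line in lines:
        size = len(line.encode("utf-8"))
        if size > max_bytes:
            raise ValueError("a single line exceeds max_bytes")
        # Start a new batch when adding this line would cross the limit.
        if current and current_size + size > max_bytes:
            batches.append(current)
            current, current_size = [], 0
        current.append(line)
        current_size += size
    if current:
        batches.append(current)
    return batches
```

Each batch can then be joined and sent as its own request. HTTP headers are not counted here, so in practice the limit should be set somewhat below the real cap.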
I’ve been at the Micromuse conference in Berlin. They have some interesting stuff coming up. Here are some impressions:
- Objectserver 7 (they dropped the “3.”).
  - Will be released in 2 weeks
  - Supports triggers and signals in addition to the current temporal automations. The triggers allow you to intercept anything just before it hits the database. The signals allow you to react to “internal” objectserver events, for instance our beloved gateway disconnections.
  - Automations now look like stored procedures, complete with if statements and foreach loops. The three generic clear automations (AA-AC), for instance, are recoded in a single very readable procedure
  - Automations can be triggered on all tables, not just alerts.status. This enables a lot of things that you needed Impact for in 3.x
  - Columns can be added on the fly (without a restart)
  - Supports 64 bit ints and reals. Would solve the time problem, I suppose
  - Integrates with LDAP for user accounts
  - New Java based configuration GUI. Also has a preview pane in the SQL builder mode and syntax highlighting for queries
  - Is multithreaded, but the development lead told me that you should not expect wonders from that. All writes are still serialized and go through the same thread. Readers can use multiple threads
  - I have been assured it still works with JDBC Sybase drivers
All in all a very interesting update. A good time for us to start testing this is probably after the June release.
- Impact 3.1 has DSAs for webservices (two way) and for instant messaging (Jabber, and through Jabber all other clients). You can thus send XML messages to Impact. This could theoretically be used to send events into objectserver.
- SLAM is now called RAD, and the SLA part becomes optional. Turns out most people didn’t want that but really liked the pretty dashboard pictures. They now do dependencies in the way we do them, and allow importing them from external databases
- They offer a portal builder now (Netcool Portal), which is just an OEM-ed third party product. Makes for nice marketing brochure-style pictures, but I was not impressed.
- Precision 3.4 comes out soon, and it is pretty much an incremental release. They added new RCA rules that handle redundant networks properly, like the one we have in decafe. The example I saw could have been taken straight out of our environment. I was not aware the current version could not do that. Since these are only RCA rules, you don’t need 3.4; they can be implemented in 3.3 too. The biggest change is that topoviz is now separate from webtop. Webtop is modularized so that other components can be added to it (not sure whether you need portal for that).
- they are building a new thick client (project “melody”) that will let you plug in UIs for all products. Still at least a year off
- “soon” they will release functionality in Precision that will let you autodiscover server parts and pieces through SSMs
There are tons of Ping failures in the event list, always to the same IP. Can someone (Michael) figure out why they are there? Obviously Precision autodiscovers something, but then fails to ping it. It would be really nice if we could diagnose this and clean the events up before the demo on Monday.