Archive for May 2004
I modified the primary view today to filter out certain events:
- Resolution events
- Cleared events, unless they have an open ticket
- Indeterminate events, since they will be used for harmless notifications, etc.
- Not yet done: we should also filter out the internal Micromuse events (security watch, probe / gateway (dis)connected), which do not deduplicate in the first place
A couple of tools were added or modified:
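As a rough illustration, the filter rules listed above can be written as a single predicate. This is only a sketch: the real filter lives in the Netcool view definition, and the field names (`Type`, `Severity`, `TicketNumber`) and their string values are assumptions made for the example, not the actual column values.

```python
def in_primary_view(event):
    """Return True if the event should remain visible in the primary view.

    Field names and values are hypothetical stand-ins for the real columns.
    """
    # Resolution events are always hidden.
    if event.get("Type") == "Resolution":
        return False
    # Indeterminate events are reserved for harmless notifications.
    if event.get("Severity") == "Indeterminate":
        return False
    # Cleared events stay visible only while a ticket is attached.
    if event.get("Severity") == "Clear" and not event.get("TicketNumber"):
        return False
    return True
```

For example, `in_primary_view({"Severity": "Clear", "TicketNumber": "T-1"})` keeps the event, while the same event without a ticket number is filtered out.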
- The Ping tool now checks the format of AlertKey using a regex. If it looks like an IP address, the tool pings it; if it doesn’t, the tool pings Node instead. This is useful for IPInterfaces, where the IP to be pinged resides in the AlertKey, not the Node
- A topoviz tool was added that opens a topoviz window for the current node. That is, it does that after logging in. Don’t forget to log out, since topoviz licenses are easy to exhaust. Since there is no logout from the topoviz window, this is not as easy as it sounds
- David added a process table tool. Works great, but for now only for Windows machines
- A simple (mockup-level) HTTP Get was added. This should really go through a portal tool, so that the result gets archived
In addition, all tools were limited to work on a single event only, in order to prevent hundreds of windows being opened simultaneously.
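The target selection in the Ping tool described above can be sketched roughly as follows. The actual regex used by the tool is not shown here, so the pattern below is an assumption, as is the function name `ping_target`; it only checks that the value looks like a dotted-quad IPv4 address, without validating octet ranges.

```python
import re

# Assumed pattern: four groups of 1-3 digits separated by dots.
# It tests whether a value "looks like" an IP, not whether it is a valid one.
IP_LIKE = re.compile(r"^(?:\d{1,3}\.){3}\d{1,3}$")


def ping_target(alert_key, node):
    """Pick the address to ping: AlertKey if it looks like an IP, else Node."""
    candidate = alert_key.strip()
    if IP_LIKE.match(candidate):
        return candidate
    return node
```

With this sketch, an IPInterface event with an AlertKey of `10.1.2.3` would be pinged at that address, while an event whose AlertKey is something like `eth0` falls back to the Node value.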
I changed the netcool right-click and alert menu around so that it more closely reflects the workflow. All unused menu items that were configurable were moved to a submenu called “unused”. The tools submenu now only contains diagnostic tools. Open Ticket, Remove, and Move to Pending / Primary were moved to the top-level menu.
Today we got far enough with our integration that manual ticket creation now works. Lots of small problems were solved all over (from Siebel to Portal to Netcool) to make this work. What is more, creating a single ticket from multiple alerts also works flawlessly.
Check it out!
I checked in the integration code changes described in the last couple of posts. Now that the changes have been made and the build is working again, we can move ahead and start getting it to work on Monday. The tickettool has not yet been updated for these changes, but the amount of work needed for that should be minimal. Since none of this could be tested in isolation, it will take a bit of time and coordination to get it all to work properly.
Due to external requirements, the event flow has changed somewhat. Fatal alerts in Netcool are still automatically forwarded to the dispatcher, as before. The dispatcher then sends them only to the service correlator, not to the helpdesk. In case of a service outage, the correlator injects new events, but these are now routed back to Netcool. Therefore no tickets are created, even though events are autoforwarded.
When operators create tickets, they can still be forwarded using the usual socket gateway / socket gateway listener mechanism. However, the usual route would now be through the tickettool. These events are then not routed to the service correlator, but only to the helpdesk. Events from Siebel are routed back to the original Netcool instance.
These changes led to a renaming of the TicketAction element in events to Intent. The autoforwarded Fatal down event from Netcool has an Intent value of “ComponentDown”. The Dispatcher then routes based on that value to the ServiceCorrelator. Autoforwarded Fatal resolution events have a value of “ComponentUp”. Events injected by the Service Correlator have values of “ServiceDown” or “ServiceUp” and are routed back to the correct Netcool instance based on the value of “MonitoringSystem”. Manually forwarded events intended to create a ticket have, as before, a value of “Open”, and are routed to Siebel based on that value. Siebel responds with an “Opened” notification event, as before, which goes to Netcool based on the MonitoringSystem element. Finally, a resolution event correlated with a problem event that has an open ticket results in an event with an Intent of “Close”, which is routed to Siebel based on that value. Siebel does not actually close the ticket at this point, but takes the event as a notification that an alert related to the ticket has closed. When someone closes a Siebel ticket, Siebel injects a “Closed” event, as before, which is routed back to the correct Netcool instance, again based on the value of MonitoringSystem.
Here are the new rules:
<listeners>
  <listener url="http://ip.ad.dr.ess/services" rule="Event[MonitoringSystem='Micromuse']" modifier="false" listenerType="SoapLite" />
  <listener url="http://ip.ad.dr.ess/services" rule="Event[MonitoringSystem='Tools']" modifier="false" listenerType="SoapLite" />
  <listener url="http://localhost/ServiceCorrelator/servicecorrelator.asmx" rule="Event[Severity='Fatal' and (Intent='ComponentDown' or Intent='ComponentUp')]" modifier="false" />
  <listener url="http://localhost/stubhelpdesk/stubhelpdesk.asmx" rule="Event[Intent='Open' or Intent='Close']" modifier="false" />
</listeners>
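To make the effect of these rules easier to follow, here is a small Python sketch that mirrors the four listener rules as plain predicates over an event dictionary. It is not the dispatcher's implementation: the listener labels are made up for the example, and it assumes that an event is delivered to every listener whose rule matches, which the rules themselves do not state.

```python
# Each entry pairs a hypothetical listener label with a predicate that
# mirrors the corresponding rule attribute in the listener configuration.
LISTENERS = [
    ("netcool (Micromuse)",
     lambda e: e.get("MonitoringSystem") == "Micromuse"),
    ("netcool (Tools)",
     lambda e: e.get("MonitoringSystem") == "Tools"),
    ("ServiceCorrelator",
     lambda e: e.get("Severity") == "Fatal"
     and e.get("Intent") in ("ComponentDown", "ComponentUp")),
    ("helpdesk",
     lambda e: e.get("Intent") in ("Open", "Close")),
]


def route(event):
    """Return the labels of all listeners whose rule matches the event."""
    return [name for name, rule in LISTENERS if rule(event)]
```

Under these assumptions, `route({"Severity": "Fatal", "Intent": "ComponentDown"})` yields only the service correlator, and `route({"Intent": "Open"})` yields only the helpdesk.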
By adding rules back in for the AddCustomer service and autoforwarding all events to it, we could, at a slightly later stage, populate the incoming events with server and customer data from Opsware.
Smart24 Events were changed a bit due to the integration update after implementing the EDS requirements. Here are the most important changes:
- TicketAction changed to Intent. TicketAction values were “Open”, “Close”, “Opened” and “Closed”. In addition to these the following are now also supported: “ComponentDown”, “ComponentUp”, “ServiceDown” and “ServiceUp”. Since the intent of an event is now not always opening a ticket, the name was changed.
- CreatedBy was changed to MonitoringSystem. This change was made because the sole purpose of CreatedBy is to route events back to the correct Netcool instance when there are multiple Netcool instances, for example across multiple datacenters. This is now reflected in the name
- CustomerId became Customer. This is more logical, since the value is the URL to an XML representation of the customer, not the ID of the customer
- ServiceId was removed, since it overlapped exactly with CustomerService. CustomerService also contains a URL, now to an XML rep of the customer service
- Subsource was removed
- Status was removed
- TicketReason was removed
- DataCenter was added
- ServerLifeCycle was added. This is the server life cycle status according to Opsware. Not populated yet.
- Group was added. This indicates the Group (in terms of human personnel) that a ticket is assigned to. Since this assignment is made in Netcool, according to EDS reqs, it is carried in events
- TicketUrl was added. This carries the URL of the ticket. The TicketNumber element now carries the ticket number, not the URL
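The element changes listed above can be summarized as a mapping from the old event layout to the new one. The sketch below is purely illustrative: no such conversion function is known to exist in the integration code, events are represented as flat dictionaries for simplicity, and new elements are given empty strings as placeholder values.

```python
# Element names are taken from the list above; the function is hypothetical.
RENAMED = {
    "TicketAction": "Intent",
    "CreatedBy": "MonitoringSystem",
    "CustomerId": "Customer",
}
REMOVED = {"ServiceId", "Subsource", "Status", "TicketReason"}
ADDED = ("DataCenter", "ServerLifeCycle", "Group", "TicketUrl")


def upgrade_event(old):
    """Return a copy of an old-format event using the new element names."""
    new = {}
    for key, value in old.items():
        if key in REMOVED:
            continue  # dropped elements are not carried over
        new[RENAMED.get(key, key)] = value
    for key in ADDED:
        new.setdefault(key, "")  # placeholder until the element is populated
    return new
```

Note that the sketch does not move an old URL-valued TicketNumber into TicketUrl; that step would need knowledge of the ticket number format, which is not described here.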