The Secret Diary of Han, Aged 0x29

twitter.com/h4n

Archive for December 2003

Resolution Events 0 or 1

W. is right. Resolution events should not have Severity 0, since that would interfere with some of the clean-up rules. The up/down correlation automations actually set the severity of a resolution event to the severity of the matching problem event, so that it gets escalated and handled correctly by the service correlator. When the problem is closed and its status is set to 4, the severity is set to 0, so that closed events are cleared out after a while.

Resolution events that do not match any problem event are cleared out after a little while too.
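The lifecycle described above can be modelled in a few lines of Python. This is only a sketch of the behaviour, not the actual automations. The matching fields (`node`, `alert_key`), the `age` field and the clean-up thresholds are hypothetical.

```python
from dataclasses import dataclass

CLOSED = 4  # Status value for a closed problem, as described in the post


@dataclass
class Event:
    node: str             # hypothetical matching fields
    alert_key: str
    severity: int
    status: int = 0
    resolution: bool = False
    age: int = 0          # hypothetical: seconds since last update


def match_key(ev):
    return (ev.node, ev.alert_key)


def correlate(events):
    """Copy each problem's severity onto its matching resolution event."""
    problems = {match_key(e): e for e in events if not e.resolution}
    for e in events:
        if e.resolution and match_key(e) in problems:
            e.severity = problems[match_key(e)].severity


def close(ev):
    """Closing sets Status=4 and Severity=0 so clean-up can remove the event later."""
    ev.status = CLOSED
    ev.severity = 0


def cleanup(events, clear_after=600, unmatched_after=300):
    """Return the events that survive clean-up (both thresholds are hypothetical)."""
    problem_keys = {match_key(e) for e in events if not e.resolution}
    survivors = []
    for e in events:
        if e.severity == 0 and e.age >= clear_after:
            continue  # closed and old enough: clear it
        if e.resolution and match_key(e) not in problem_keys and e.age >= unmatched_after:
            continue  # resolution that never matched a problem: clear it
        survivors.append(e)
    return survivors
```

In this model, a resolution event created with Severity 0 would already look like a closed event to the first clean-up rule. That is the kind of interference described above.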

Written by Han

December 18, 2003 at 19:47

Posted in Uncategorized

Diskspace = 94%

Disk usage over a certain threshold should be a Severity=4 (Critical) problem. The distinction should be this: if something is not working (like a ping failure), the severity should be 5 (Fatal), and if something has a problem but is still working (like low disk space), it should be 4 (Critical). Is this something we should configure in the trapd probe rules file?
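The proposed rule is small enough to state as a decision function. It is written in Python only to pin down the logic. If it goes into the trapd probe rules file, it would have to be expressed in that file's own syntax. The 90% threshold and the function name are hypothetical, because the post does not name a threshold.

```python
from typing import Optional

FATAL = 5     # something is not working, e.g. a ping failure
CRITICAL = 4  # something has a problem but still works, e.g. low disk space

# Hypothetical threshold: the post does not say which percentage should trigger.
DISK_THRESHOLD_PCT = 90.0


def proposed_severity(reachable: bool,
                      disk_used_pct: Optional[float] = None,
                      threshold: float = DISK_THRESHOLD_PCT) -> Optional[int]:
    """Return the proposed severity, or None if nothing should be raised."""
    if not reachable:
        return FATAL
    if disk_used_pct is not None and disk_used_pct >= threshold:
        return CRITICAL
    return None


print(proposed_severity(True, 94.0))   # 4
print(proposed_severity(False))        # 5
print(proposed_severity(True, 50.0))   # None
```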

Written by Han

December 14, 2003 at 17:54

Posted in Uncategorized

Lost worklog entries

D made a backup of the worklog database last Friday evening. This backup does not include any additions made in TXS on Friday. Since the SAN is down (dead battery) after this weekend's power cycle, we had to revert to the backup. When the SAN comes back up, we can re-enter those entries.

Written by Han

December 8, 2003 at 07:38

Posted in Uncategorized

Sample Rate for AA and AB automations

The AA_Smart24Raise and AB_Smart24OpenedNotification automations used to have a sample rate of 5, but these have now been changed to 35 and 45, respectively. I understand that keeping everything at 5 may cause performance problems, but these timings seem quite long. Maybe we can shorten them somewhat (say to 11 and 13, to name just two prime numbers), so that forwarding events is still reasonably fast. Any ideas?
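Here is a rough way to compare these numbers. It assumes an event may wait up to one full AA_Smart24Raise period and then up to one full AB_Smart24OpenedNotification period before it is forwarded. The sketch also shows one plausible benefit of prime periods: two automations whose rates share no common factor fire in the same tick only once every a*b units. The function names are hypothetical.

```python
from math import gcd


def coincidence_period(a, b):
    """How often two automations with sample rates a and b fire in the same tick (their LCM)."""
    return a * b // gcd(a, b)


def worst_case_delay(*rates):
    """Upper bound on forwarding delay if each stage may wait one full sample period."""
    return sum(rates)


# (AA_Smart24Raise, AB_Smart24OpenedNotification) sample rates: old, current, proposed
for aa, ab in [(5, 5), (35, 45), (11, 13)]:
    print(f"AA={aa:>2} AB={ab:>2}  worst-case delay={worst_case_delay(aa, ab):>3}"
          f"  fire together every {coincidence_period(aa, ab)}")
```

Under these assumptions, the worst-case delays are 10, 80 and 24 (in the same units as the sample rate). The proposed pair fires together every 143 units, compared with every 315 units for the current pair.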

Written by Han

December 7, 2003 at 15:59

Posted in Uncategorized

Catchall_Smart24Cat

The CatchAll_Smart24Cat automation sets a default category on an alert if none is present. However, am I correct in assuming that an event may be raised before that automation runs, if the AA_Smart24Raise automation just happens to run earlier? In that case, the helpdesk won't see a category and will assume a default anyway. So I was wondering how useful the automation actually is.

But I may be missing something.
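The race in question can be modelled with two independent timers. This sketch makes three assumptions. Both automations fire on fixed periods (offsets are adjustable). An alert is picked up at the first run at or after its arrival. When both automations run in the same tick, the category automation goes first. The 35 is the AA_Smart24Raise sample rate from the sample-rate post; the 60 for the category automation and all function names are hypothetical.

```python
def next_run(t, period, offset=0):
    """First time >= t at which a timer firing at offset, offset + period, ... runs."""
    if t <= offset:
        return offset
    k = -(-(t - offset) // period)  # ceiling division
    return offset + k * period


def raised_without_category(arrival, raise_period, cat_period,
                            raise_offset=0, cat_offset=0):
    """True if the raise automation reaches the alert before the category automation.

    Ties count as safe, i.e. we assume the category automation runs first in a shared tick.
    """
    raise_at = next_run(arrival, raise_period, raise_offset)
    cat_at = next_run(arrival, cat_period, cat_offset)
    return raise_at < cat_at


def miss_fraction(raise_period, cat_period, horizon=10_000, **offsets):
    """Fraction of arrival times in [0, horizon) that get raised without a category."""
    misses = sum(raised_without_category(t, raise_period, cat_period, **offsets)
                 for t in range(horizon))
    return misses / horizon


# AA_Smart24Raise at 35 (from the sample-rate post); 60 for the category automation is hypothetical.
print(miss_fraction(35, 60))
```

Any non-zero result means some alerts reach the helpdesk without a category. The real question is whether that fraction is large enough to matter.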

Written by Han

December 7, 2003 at 15:56

Posted in Uncategorized

Re: Sourceforge

Hm… does this mean that once the keys are set up for passwordless CVS, there is no way to back out of that without manual intervention?

You can configure this on the client. If you don't want passwordless CVS, you can remove the SSH keys locally.

The user account on sf has a restricted shell that logs the person right back out (for me, at least).

If the need arises, ask C for the password of the admin account. Or ask him to do it…
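Removing the keys locally, as suggested above, can be sketched as follows. It assumes the keys use the default OpenSSH names `id_rsa` / `id_dsa` in `~/.ssh`, and that the server still accepts password logins. The function names are hypothetical. The keys are renamed rather than deleted, so the change is easy to undo.

```python
from pathlib import Path

# Assumption: the default OpenSSH private key names; adjust if your keys are named differently.
DEFAULT_KEYS = ("id_rsa", "id_dsa")


def renames_for(present, suffix=".disabled"):
    """Given the file names in ~/.ssh, return (old, new) pairs for the keys to move aside."""
    return [(name, name + suffix) for name in DEFAULT_KEYS if name in present]


def disable_local_keys(ssh_dir=None, suffix=".disabled"):
    """Rename local private keys so ssh (and thus cvs over ssh) stops using them.

    Renaming rather than deleting lets you undo this by renaming the files back.
    """
    ssh_dir = Path(ssh_dir) if ssh_dir else Path.home() / ".ssh"
    if not ssh_dir.is_dir():
        return []
    present = {p.name for p in ssh_dir.iterdir()}
    moved = []
    for old, new in renames_for(present, suffix):
        (ssh_dir / old).rename(ssh_dir / new)
        moved.append(ssh_dir / new)
    return moved
```

If an ssh-agent is running, it may still hold the key. `ssh-add -D` removes all identities from the agent.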

Written by Han

December 4, 2003 at 11:21

Posted in Uncategorized

Fixing Sourceforge problems

There are two common problems with Sourceforge that cause it to malfunction.

  1. The first problem occurs when someone checks in code and deletes the Tracker: line from the commit log message. This makes the project home page and the CVS commit logger page inaccessible. The fix is to log in to the database server as the oracle user (or su to it), set ORACLE_SID to sfeedata, start sqlplus sfee/sourceforge, and update the scm_int_commit table. Each record in this table represents a CVS commit, and the commit log message is stored as a CLOB in commit_msg. Find the offending record(s) by selecting the records around the time the error started occurring; each record has a timestamp field in Unix time that can be used for this. The offending records have a commit_msg that shows up as a single question mark. Update them by issuing “update scm_int_commit set commit_msg=NULL where timestamp=[timestamp of offending record]; commit;”. There is no need to restart Sourceforge after this. Sourceforge is working on a fix, so we should not need this workaround once they release it.
  2. The second problem is related to SSH keys. You can add an SSH key to Sourceforge to avoid having to type your password on every interaction with it; Sourceforge lets you paste the key into your account maintenance page. If you try to update the key with an empty string, for instance because the key does not work and you wish to delete it, the project home page and the tracker pages will malfunction. To fix this, the key needs to be deleted both from the Oracle database and from the filesystem on the Sourceforge server, since the two are synced with each other. To delete the files from the Sourceforge server, ssh into it and remove the .ssh directory from the user's home directory (/sourceforge/home/users/…). Then go into the database and update the corresponding record in the users table by issuing “update users set authorized_keys=EMPTY_CLOB() where user_name=[username]; commit;”. Finally, restart Sourceforge with “/sourceforge/etc/init.d/sfos.init stop” followed by “/sourceforge/etc/init.d/sfos.init start”.
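Both fixes come down to a few hand-typed SQL statements, which are easy to get wrong. The helpers below are a hypothetical sketch that only builds the statement text for pasting into sqlplus. They do not connect to the database. The table and column names come from the steps above. The following are assumptions: the function names, the UTC time zone in the example, the idea that broken messages read back as a single "?", and the assumption that user_name is a character column and so needs a quoted value.

```python
from datetime import datetime, timezone


def to_unix(dt):
    """Convert a timezone-aware datetime to Unix time, as stored in scm_int_commit.timestamp."""
    return int(dt.timestamp())


def select_window_sql(start, end):
    """Problem 1: list the commits around the time the error started."""
    return ("select timestamp, commit_msg from scm_int_commit "
            f"where timestamp between {start} and {end} order by timestamp;")


def find_offending(rows, start, end):
    """rows: (timestamp, commit_msg) pairs; return timestamps whose message shows as '?'."""
    return sorted(ts for ts, msg in rows if start <= ts <= end and msg == "?")


def null_commit_msg_sql(timestamps):
    """Problem 1: NULL out each offending commit_msg, then commit."""
    stmts = [f"update scm_int_commit set commit_msg=NULL where timestamp={ts};"
             for ts in timestamps]
    return stmts + ["commit;"]


def sql_literal(text):
    """Quote a string for Oracle SQL by doubling any embedded single quotes."""
    return "'" + text.replace("'", "''") + "'"


def clear_ssh_key_sql(user_name):
    """Problem 2, database half only: the .ssh directory and the restart are separate steps."""
    return ["update users set authorized_keys=EMPTY_CLOB() "
            f"where user_name={sql_literal(user_name)};",
            "commit;"]


# Example: a one-hour window starting at a (hypothetical) time the errors began.
start = to_unix(datetime(2003, 12, 3, 9, 0, tzinfo=timezone.utc))
print(select_window_sql(start, start + 3600))
```

The filesystem cleanup and the sfos.init restart for the second problem still have to be done by hand on the server.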

Written by Han

December 3, 2003 at 10:17

Posted in Uncategorized