Archive for April 2005
On Friday the QA team presented their test results. They do this in the form of a serious-looking report that includes statistics and a categorization of bugs. They then draw a lot of conclusions from those statistics and from the experiences they had.
The report looks polished. The quality of the content, on the other hand, is severely lacking.
What is most disturbing about this is that the QA team is not acting as a good team player. Every little detail, even things that are not bugs but translation or configuration problems, gets blown up and gains a lot of visibility outside the tools team, where people have a hard time seeing that it is not really a big issue. This makes the members of the tools team very wary of working with the QA team, when the two teams should be working together as one. The tools team has no qualms about the QA team tracking bugs, but the findings should be put in the right perspective so that people outside our teams can form a proper opinion.
Here is a partial list of criticisms:
- There were 15 registered bugs in SourceForge, but 27 in the test report. This may be because the QA team counts one bug in SF as multiple bugs. However, the report does not list any bug numbers, so it is impossible to analyze this further.
- The report complains about the slowness of bug fixing. However, most bugs were only entered in SourceForge on or after March 24th. That made it impossible for me to keep track of them, since I did not know they existed. More on this below.
- My own analysis of the 15 bugs in SourceForge reveals that there were no high-impact bugs and 5 low-impact bugs. The other 10 were either wrong environment settings, translations in the UI that had to be changed, or not bugs at all. This sheds a completely different light on the discovered bugs than reading the report suggests.
- The categorization of the “bugs” among the parts is not correct. Most of the “real” bugs (i.e. not translation errors etc.) that were attributed to the Operator Portal are really in the integration framework.
- Hilariously, the report said that the Operator Portal could not be deployed because there were 4 outstanding bugs. In reality there was only 1, which was open only because we had found it just before the meeting, and it was clear that it was a small thing that would be solved quickly. The other 3 were fixed, but their retest failed because of an environment problem (a proxy server setting in Siebel, combined with a hung proxy server). The QA team knew that status too.
- Even if all those bugs had still been open, their impact would have been very low. It was clear the QA team did not understand that.
- 10 bugs were said to have been introduced in earlier cycles. The QA team mentioned that to me a few hours before the meeting, and I went through those bugs with 2 of the QA team's members. All of them were environment problems; none of them were real bugs. Despite that, they never changed their report and presented it as if our discussion had never happened.
- Environment problems, translation preferences, etc. are all brought up as bugs. There is a classification of bugs, but since the details are absent from the report, I cannot analyze it. The report speaks of 10 bugs that are coding errors, whereas I find only 5 (and I am including doubtful cases). In addition, there is no gradation in the severity of bugs. At the very least there should be something like no impact, small impact, large impact.
- The report lists and tracks the number of bugs that “should be found in development”. Apart from the fact that this is about as useful as saying “no bugs should be introduced ever”, they also include environment errors in that category, which is very strange.
- The report is full of qualitative statements like “the quality of the Operator Portal is low”. Such strong statements cannot be made without defining what is considered low quality and without some pretty strong supporting evidence. It is also not clear to me how a component that has not caused any significant problems in 6 months in production can be said to be of “low” quality.
- There are the usual complaints about “bad version management” of the staging system. This is due to a single mistake: at one point a wrong version of a minor tool was deployed by accident, but this was quickly corrected. The QA team has to learn that what they are testing is not a simple word processor but a complex OSS.
Since this kind of report can take on a life of its own, and since the QA team is all too eager to present it to upper management (they already sent it to Upper Pointy Haired Boss-san, since Middle, even pointier haired boss-san is sick), it is important that we counter it with a report of our own.
The lateness of bug fixes in the Operator Portal was detailed in the report in a table with bug numbers. So here is my analysis:
|bug|entered|fixed|time (QA team)|SF entry|comment|
|---|---|---|---|---|---|
|905|3/7|3/29|16 days|3/24|translation miss (“first occurence” in English)|
|740|3/7|3/23|12 days| |this is a user story, not a bug!|
|906|3/15|3/29|11 days|3/24|Error message|
|900|3/3|3/29|18 days|3/24|translation miss (“Customer telephone number” in English)|
|901|3/14|4/1| |3/24|bug was fixed quickly, but it was found out on 4/1 that the bug fix fixed it in the wrong way. It really was a configuration problem in Siebel that was correct in Production but wrong in Staging|
|902|3/22|3/29|6 days|3/24|change of error message text|
|903|3/22|4/1| |3/24|fixed on 3/30, but check failed due to environment problem (proxy server) on 3/31|
|904|3/23|4/1| |3/24|fixed on 3/30, but check failed due to environment problem (proxy server) on 3/31|
|907|3/25|4/1| |3/25|not a bug. Old event data was present in Netcool|
|908|3/25|3/31|5 days|3/25|remove text box and button when no valid alerts present. Minor issue without impact|
It was not possible for us to track many of the bugs until 3/24 because they were not in SourceForge before that date, so follow-up on small issues like translation mistakes got delayed. In addition, there are some issues with this table, like the reopening of bug 901, the inclusion of a user story, and the failure of retests for an unrelated reason (the environment problem). These are not delays in getting bugs fixed.
The deployment of cycle 10 was presented in the CAB today, and it was approved. This means deployment will go ahead as scheduled on Thursday evening, starting at 8 pm.
For staging, we already told the QA team that we are going to wipe out all alerts when we deploy cycle 11.
For production, a combination of 1 and 2 is probably possible. At any stage the number of alerts is quite small, with the primary view empty or almost empty. Most alerts are repeat alerts that can be deleted without consequence.
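To put rough numbers on that, here is a small sketch that counts the calendar days between the SourceForge entry and the fix date for each bug in the table. It rests on a few assumptions of mine: the year is taken to be 2005 (from the archive heading), bug 740 is skipped because it has no SF entry, and the fix date for bug 901 is taken as 4/1, the date mentioned in its comment. The helper names are made up for this illustration, and the result is in plain calendar days, which is not necessarily how the QA team counted their "time" column.

```python
from datetime import date

YEAR = 2005  # assumption: taken from the "April 2005" archive heading


def parse_md(month_day: str) -> date:
    """Turn an m/d string such as '3/24' into a date in YEAR."""
    month, day = month_day.split("/")
    return date(YEAR, int(month), int(day))


# (bug, entered by QA, fixed, SF entry), copied from the table.
# Bug 740 is omitted: it has no SF entry.
# Bug 901's fix date is assumed to be 4/1, as its comment suggests.
BUGS = [
    (905, "3/7", "3/29", "3/24"),
    (906, "3/15", "3/29", "3/24"),
    (900, "3/3", "3/29", "3/24"),
    (901, "3/14", "4/1", "3/24"),
    (902, "3/22", "3/29", "3/24"),
    (903, "3/22", "4/1", "3/24"),
    (904, "3/23", "4/1", "3/24"),
    (907, "3/25", "4/1", "3/25"),
    (908, "3/25", "3/31", "3/25"),
]


def days_from_sf_entry_to_fix(bugs=BUGS) -> dict:
    """Calendar days between the SF entry date and the fix date, per bug."""
    return {
        bug: (parse_md(fixed) - parse_md(sf_entry)).days
        for bug, _entered, fixed, sf_entry in bugs
    }


if __name__ == "__main__":
    for bug, days in days_from_sf_entry_to_fix().items():
        print(f"bug {bug}: {days} calendar days from SF entry to fix")
```

Counted this way, every bug in the table was closed within 5 to 8 calendar days of appearing in SourceForge, including the ones whose retest was held up by the proxy server problem.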