KMF ADVANCE

I.T. Solutions - that put you ahead of the game!

 

KMF Advance
8 / 22 Fletcher Street
Essendon VIC 3040
Australia

Telephone:
(03) 9375 7765

Email:
info@kmfadvance.com

KMF Advance Publication

Published: itSMF Australia Bulletin June 2004 as commentary column:

"In My Opinion - Karen Ferris Speaks Out"


The Lost World of Problem Management


© 2004 Karen Ferris


As a colleague and I recently agreed: “They just don’t get it, do they?”

What I thought was a clear distinction, fairly well explained in the ITIL Service Support book and other supporting publications, seems to be eluding many organisations and individuals. May I say that this also includes some so-called “experts” in the IT Service Management field.

The confusion over the difference between Incident and Problem Management is pervading the IT industry. More and more of the organisations I encounter that declare they are implementing or doing ITIL Problem Management are simply not doing it! What they are doing is Major Incident Management, which they refer to as Problem Management. “Real” Problem Management therefore does not take place, because they believe they are already doing it.

 So let’s take a step back and refresh ourselves on what the difference really is and then explore the possible reasons for the confusion.

Susana Schwartz, quoting John Long (Tivoli technical strategist for IBM), wrote:

“… ITIL’s clarification of an ‘incident’ versus a ‘problem’ clears up confusion about what actions need to take place during which process. An incident occurs at the moment a service request or outage is called into a service center, explains Long. After that call, the company works to get that customer up and running, at which point you close the incident and deploy a separate team to handle the problem, which is defined once the team finds a series of incidents that can be tied together. ‘That’s when you have your root cause, which becomes a “known error.”’” [1]

Although I don’t 100% agree with the above definition (e.g. a Problem can be the result of one or more Incidents), the core differentiation is clear. Long stated that ITIL’s clarification cleared up the confusion. I would have agreed with Long. However, my experience is showing me that the message is not getting through.

Victor Capella, in “A Framework for Incident and Problem Management”, also acknowledged the issue [2].

 “Whilst most organisations develop processes and procedures around Incident Management, many fail to do the same for Problem Management. Often this is due to a lack of clear understanding of the characteristics of the two activities. Incident Management is the simplest activity to understand because it involves putting structure around the response to service interruptions. Because the “squeaky wheel always gets greased,” Incident Management discipline tends to develop quickly. However, there is often less insistence to develop discipline around Problem Management.”

So why don’t they get it?

I have a few thoughts on the issue, none of which, I should say, is necessarily the “root cause” of the problem (pardon the pun)!

Firstly, developing the process, procedures, tools, and organisational and cultural change needed to implement an effective and efficient Incident and Problem Management system can be a tremendous undertaking. The changes to organisational culture required to implement the Problem Management elements can, in themselves, be daunting.

It could be that organisations do not fully understand the undertaking and therefore shy away from full implementation of Problem Management in its true sense, implementing instead something that is far easier to deal with.

 Secondly, Problem Management (especially proactive Problem Management) relies heavily if not solely on quality Incident Management data. If the Incident Management process is not mature and sufficiently developed to be able to provide detailed and accurate historical data on Incidents, then proactive Problem Management is not going to be able to function. Therefore before embarking on proactive Problem Management organisations have to ensure that the Incident Management function is well established.
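To illustrate why proactive Problem Management depends on the quality of Incident data, here is a minimal Python sketch of a simple trend analysis over closed Incident records. The field names (`ci`, `category`), the threshold of three occurrences and the function name are assumptions made for this example only; they do not come from ITIL or from any particular service desk tool.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def problem_candidates(
    incidents: Iterable[Dict[str, str]], min_count: int = 3
) -> List[Tuple[Tuple[str, str], int]]:
    """Group Incident records by (configuration item, category) and return
    the groups that recur at least min_count times, most frequent first."""
    counts: Counter = Counter()
    for inc in incidents:
        ci = inc.get("ci", "").strip()
        category = inc.get("category", "").strip()
        # A record with no configuration item or category cannot be
        # attributed to anything, so it contributes nothing to the trend.
        if not ci or not category:
            continue
        counts[(ci, category)] += 1
    return [(key, n) for key, n in counts.most_common() if n >= min_count]


# Hypothetical Incident history, for illustration only.
history = [
    {"ci": "mail-01", "category": "disk full"},
    {"ci": "mail-01", "category": "disk full"},
    {"ci": "mail-01", "category": "disk full"},
    {"ci": "router-07", "category": "link down"},
    {"ci": "", "category": "disk full"},  # incomplete record, ignored
]
candidates = problem_candidates(history)
```

In this sketch the incomplete record is silently lost to the analysis, which is the point: the less complete the Incident data, the fewer recurring patterns proactive Problem Management can detect.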

 The lack of quality data from Incident Management does not mean that some elements of Problem Management cannot take place. Whilst the Incident Management process is maturing, reactive Problem Management can still take place to investigate the underlying cause of Major Incidents. However, herein lies the caution. If this is the approach taken, do not confuse the handling of Major Incidents whilst they are still within the Incident Management process with the Problem Management process.

Dealing with Major Incidents until the customer is back up and running is an Incident Management function. The objective is restoration of normal service as quickly as possible. Once this has been done, the Incident can be closed and a Problem record created. The Problem Management team (a separate body of people) then undertake investigation and root cause analysis to identify the Known Error (Problem Control). This is followed by elimination of the Known Error from the infrastructure (Error Control) to ensure that the Incident does not recur.

What seems to be happening in many organisations is that the handling of Major Incidents is passed over to the Problem Management team before the Incident is closed. The Problem Management team then becomes part of the Incident Management process and, in effect, performs the role of second-, third- or nth-line technical support.
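The hand-off described above can be sketched as a small Python model. This is only an illustration of the sequence set out in this column (restore service, close the Incident, raise a Problem, Problem Control, Error Control); the class, method and reference names are hypothetical and are not taken from ITIL or from any tool.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class IncidentStatus(Enum):
    OPEN = "open"
    CLOSED = "closed"


@dataclass
class Incident:
    ref: str
    summary: str
    status: IncidentStatus = IncidentStatus.OPEN

    def restore_service(self) -> None:
        # Incident Management's objective: restore normal service, then close.
        self.status = IncidentStatus.CLOSED


@dataclass
class Problem:
    ref: str
    incident_refs: List[str]
    root_cause: Optional[str] = None  # recorded by Problem Control
    resolved: bool = False            # set by Error Control

    @property
    def is_known_error(self) -> bool:
        return self.root_cause is not None

    def problem_control(self, root_cause: str) -> None:
        # Investigation and root cause analysis identify the Known Error.
        self.root_cause = root_cause

    def error_control(self) -> None:
        # The Known Error is eliminated so the Incident does not recur.
        if not self.is_known_error:
            raise ValueError("Error Control needs a Known Error first")
        self.resolved = True


def raise_problem(ref: str, incidents: List[Incident]) -> Problem:
    # Assumption of this sketch: a Problem is raised only once the related
    # Incidents have been closed by Incident Management.
    still_open = [i.ref for i in incidents if i.status is not IncidentStatus.CLOSED]
    if still_open:
        raise ValueError(f"Still with Incident Management: {still_open}")
    return Problem(ref=ref, incident_refs=[i.ref for i in incidents])


major = Incident("INC-1", "Mail service outage")
major.restore_service()
problem = raise_problem("PRB-1", [major])
problem.problem_control("Faulty disk controller")
problem.error_control()
```

The guard in `raise_problem` encodes the separation argued for here: while an Incident is still open it belongs to Incident Management, and the Problem Management team is not acting as another line of support.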

 Another factor adding to this scenario is the creation of a Problem Management team comprised of technical specialists. Incidents are therefore passed to them for more detailed “technical” investigation. In my opinion, this is the incorrect make-up of an effective Problem Management team. Problem Management staff should be technically aware but they do not have to be specialists.

As well as being technically aware, they should have a good knowledge of the business impact of Incidents and Problems. They should be able to facilitate and coordinate “virtual” Problem Management teams comprising technical specialists and business personnel, as well as third-party suppliers as appropriate. An ability to think “outside the square” is required in order to look at all the various solutions to a Problem, not just the technical ones. Good verbal and written communication skills are essential, in addition to excellent analytical and diagnostic skills. Problem Management staff should be able to utilise techniques such as Kepner-Tregoe analysis, Ishikawa diagrams and Pareto charts. They should be able to conduct trend analysis, run brainstorming sessions and effectively prioritise root cause efforts.

A further contribution to the confusion is the lack of management commitment to invest in the Problem Management function. The function is established, but the resources and investment needed to make it operate in accordance with the ITIL-defined process are not forthcoming. The Problem Management team are therefore pulled into the Incident Management process in order to justify their existence.

 As soon as Problem Management is established, it is imperative that Quick Wins are identified so that the return on investment of the establishment of the function can be demonstrated. This can be easily achieved by the identification of a couple of Problems that are costing the organisation a substantial loss in terms of dollars, and the subsequent removal of the underlying cause.

Another barrier to the establishment of true Problem Management is the failure to set aside time to build and maintain a knowledge base that both Incident and Problem Management can utilise in the resolution of Incidents and Problems. Where Problem Management identifies a work-around to an Incident during investigation, this should be recorded in the knowledge base so that the Service Desk and Incident Management staff can resolve the Incident without further recourse to other levels of support. The ability to demonstrate an increase in first-line resolution through the introduction of Problem Management is another quick win and a justification for investment in the function.

So to summarise, organisations have to be cautious of implementing Incident Management in the guise of Problem Management. Not only does this inhibit “real” Problem Management from being established, but it is also confusing to management and staff, as the terminology becomes intertwined and the definitions of Incident and Problem blurred. Organisations embarking on ITIL training and/or recruiting ITIL-accredited and experienced personnel will find that this adds to the confusion, as those people’s interpretation of Incident and Problem Management will be opposed to that of the organisation. Also, recruiting Problem Management staff into what is in fact an Incident Management role will not realise a return on the investment in that recruitment. “True” Problem Management staff, with the skills mentioned earlier, may not be content with a role (a) for which they did not apply and (b) in which their skills are not being utilised. They may soon become dissatisfied and leave the organisation.

The integration and close working that the Incident and Problem Management processes require should not be underestimated, but they have to be acknowledged as separate processes with distinct (and often conflicting) objectives.

My final comment – as I feel it needs airing – is that if you engage the “experts” to assist with Incident and Problem Management implementation, ensure that they can demonstrate a practical track record and really do understand the difference between the processes. Some of my colleagues and I agree that there are consultants out there who are confusing their clients because they themselves do not have a grasp of the distinction between the processes.

 


 

[1] Standards Watch: ITIL: Service Management Best Practices – Susana Schwartz – Billing World Today – April 2004

[2] A Framework for Incident and Problem Management – Victor Capella – Consulting Manager – INS – April 2003

Karen Ferris is an independent IT Service Management consultant and can be contacted via www.kmfadvance.com

(c) Copyright 2002 KMF Advance Melbourne, Australia