Published: itSMF Australia Bulletin, June 2004, as the commentary column
"In My Opinion - Karen Ferris Speaks Out"
The Lost World of Problem Management
© 2004 Karen Ferris
As a colleague and I recently agreed, “They just don’t get it, do they?”
What I thought was a clear distinction, fairly well explained in the ITIL
Service Support book and other supporting publications, seems to be eluding many
organisations and individuals. And, may I say, this includes some so-called
“experts” in the IT Service Management field.
The confusion over the difference between Incident and Problem Management is
pervading the IT industry. More and more of the organisations I encounter that
declare they are implementing, or already doing, ITIL Problem Management are
simply not doing it! What they are doing is Major Incident Management and calling
it Problem Management. As a result, “real” Problem Management is not taking
place, because they believe they are already doing it.
So let’s take a step back and refresh ourselves on what the difference really
is and then explore the possible reasons for the confusion.
Susana Schwartz, quoting John Long (Tivoli technical strategist for IBM), wrote [1]:
“… ITIL’s clarification of an ‘incident’ versus a ‘problem’ clears up
confusion about what actions need to take place during which process. An
incident occurs at the moment a service request or outage is called into a
service center, explains Long. After that call, the company works to get that
customer up and running, at which point you close the incident and deploy a
separate team to handle the problem, which is defined once the team finds a
series of incidents that can be tied together. ‘That’s when you have your root
cause, which becomes a “known error.”’”
Although I don’t agree 100% with the above definition (for example, a Problem
can be the result of a single Incident, not only of a series of them), the core
differentiation is clear.
Long stated that ITIL’s clarification clears up the confusion, and I would once
have agreed with him. However, my experience shows me that the message is not
getting through.
Victor Capella, in “A Framework for Incident and Problem Management”, also
acknowledges the issue:
“Whilst most organisations develop processes and procedures around Incident
Management, many fail to do the same for Problem Management. Often this is due
to a lack of clear understanding of the characteristics of the two activities.
Incident Management is the simplest activity to understand because it involves
putting structure around the response to service interruptions. Because the
“squeaky wheel always gets greased,” Incident Management discipline tends to
develop quickly. However, there is often less insistence to develop discipline
around Problem Management.”
So why don’t they get it?
I have a few thoughts on the issue, none of which, I hasten to add, is the
“root cause” of the problem (pardon the pun)!
Firstly, developing the processes, procedures and tools, and making the
organisational and cultural changes, needed to implement an effective and
efficient Incident and Problem Management system can be a tremendous
undertaking. The cultural change required for the Problem Management elements
alone can be daunting.
It could be that organisations do not fully understand the undertaking and
therefore shy away from implementing Problem Management in its true sense,
implementing instead something that is far easier to deal with.
Secondly, Problem Management (especially proactive Problem Management) relies
heavily, if not solely, on quality Incident Management data. If the Incident
Management process is not mature enough to provide detailed and accurate
historical data on Incidents, then proactive Problem Management will not be
able to function. Therefore, before embarking on proactive Problem Management,
organisations have to ensure that the Incident Management function is well
established.
The lack of quality data from Incident Management does not mean that no
elements of Problem Management can take place. While the Incident Management
process is maturing, reactive Problem Management can still investigate the
underlying cause of Major Incidents. However, herein lies a caution. If this is
the approach taken, do not confuse the handling of Major Incidents, while they
are still within the Incident Management process, with the Problem Management
process.
Dealing with Major Incidents until the customer is back up and running is an
Incident Management function. The objective is restoration of normal service as
quickly as possible. Once this has been done, the Incident can be closed and a
Problem record created. The Problem Management team (a separate body of people)
then undertakes investigation and root cause analysis to identify the Known
Error (Problem Control). This is followed by elimination of the Known Error from
the infrastructure (Error Control) to ensure that the Incident does not recur.
What seems to be happening in many organisations is that the handling of Major
Incidents is passed over to the Problem Management team before the Incident is
closed. The Problem Management team then becomes part of the Incident Management
process and in effect performs the role of second-, third- or n-line technical
support.
Another factor adding to this scenario is the creation of a Problem Management
team composed of technical specialists. Incidents are therefore passed to them
for more detailed “technical” investigation. In my opinion, this is the wrong
make-up for an effective Problem Management team. Problem Management staff
should be technically aware, but they do not have to be specialists.
As well as being technically aware, they should have a good knowledge of the
business impact of Incidents and Problems. They should be able to facilitate and
coordinate “virtual” Problem Management teams comprising technical specialists,
business personnel and third-party suppliers as appropriate. An ability to think
“outside the square” is required in order to look at all the possible solutions
to a Problem, not just the technical ones.
Good verbal and written communication skills are essential, in addition to
excellent analytical and diagnostic skills. Problem Management staff should be
able to use techniques such as Kepner-Tregoe Analysis, Ishikawa diagrams and
Pareto charts. They should be able to conduct trend analysis and brainstorming
sessions, and to prioritise root cause efforts effectively.
A further contribution to the confusion is the lack of management commitment to
invest in the Problem Management function. The function is established, but the
resources and investment needed to make it operate in accordance with the
ITIL-defined process are not forthcoming. As a result, the Problem Management
team is pulled into the Incident Management process in order to justify its
existence.
As soon as Problem Management is established, it is imperative that Quick Wins
are identified so that the return on investment in establishing the function
can be demonstrated. This can easily be achieved by identifying a couple of
Problems that are costing the organisation substantial sums of money and then
removing their underlying cause.
Another barrier to the establishment of true Problem Management is the failure
to set aside time to build and maintain a Knowledge Base that both Incident and
Problem Management can use in resolving Incidents and Problems. Where Problem
Management identifies a workaround to an Incident during investigation, the
workaround should be added to the Knowledge Base so that the Service Desk and
Incident Management staff can resolve the Incident without further recourse to
other levels of support. Being able to show an increase in first-line
resolution following the introduction of Problem Management is another quick
win and a justification for investment in the function.
So, to summarise, organisations have to be wary of implementing what is really
Incident Management under the name of Problem Management. Not only does this
prevent “real” Problem Management from being established, it also confuses
management and staff, as the terminology becomes intertwined and the definitions
of Incident and Problem blurred. Organisations that invest in ITIL training
and/or recruit ITIL-accredited and experienced personnel will find that this
adds to the confusion, as those people’s interpretation of Incident and Problem
Management will conflict with the organisation’s. Also, recruiting Problem
Management staff into what is in fact an Incident Management role will not
deliver a return on that recruitment. “True” Problem Management staff with the
skills mentioned earlier may not be content with a role (a) for which they did
not apply and (b) in which their skills are not being used. They may soon become
dissatisfied and leave the organisation.
The integration and close working that the Incident and Problem Management
processes require should not be underestimated, but they have to be
acknowledged as separate processes with distinct (and often conflicting)
objectives.
My final comment, as I feel it needs airing, is this: if you engage “experts”
to assist with Incident and Problem Management implementation, ensure that they
can demonstrate a practical track record and really do understand the
difference between the processes. Some of my colleagues and I agree that there
are consultants out there who are confusing their clients because they
themselves do not grasp the distinction between the processes.
Karen Ferris is an independent IT Service Management consultant and can be
contacted via www.kmfadvance.com