Serio Blog

Wednesday, 31 Jan 2007

This blog entry is a follow-up to our previous post about KPIs for Incident Management. The subject of this post is where the reports for these KPIs can be found within the Serio tool.

You’ll need access to SerioReports. Remember this is a part of the tool you need to install – if you can’t see it you should install it (provided you have sufficient licenses to do so). Log in to SerioReports, and open a Reports Explorer from the File menu.

Incident Counts

These are mainly clustered under the heading 'Logged'. These reports focus on Inputs (as defined in our Service Desk/Helpdesk Metrics white paper), and are both graphical and text based. I’ll pick some out and talk about them individually.

IL17 – Breaks down Incidents by Problem Area Category and Problem Area, with a percentage for each. Useful for understanding the spread of your Incidents.

IL7 – A useful grid that links up the Type of Incident (Fault, Job Request etc) with the Category (Printer, Spreadsheet etc).

IL14 – This report tells you when Incidents are logged during the day. Usually you’ll see two ‘bell’ curves – one in the morning, and one in the early afternoon, as this is typically when most Incidents are reported.

IL22 – This is a graph that shows Incidents by Problem Area. The most used categories are at the top, and this report is useful in weeding out unused Problem Area Categories from your system.

There are around 40 or so in this group, each offering different ways of looking at inputs.

You’ll also find some interesting graphs within SerioClient, under Tools/Performance. See ‘Days logged for Active Incidents’ and ‘Incidents logged and Resolved’. This latter report shows you a week-on-week view of both tickets logged and tickets resolved – which hopefully (kind-of) match up over the piece. If not, maybe read this about backlogs.

First Time Fixes (FTF)

Within SerioReports, see the ‘First Time Fix’ report AGT14. This is grouped under ‘Agent Performance’. This report has recently been upgraded (about 2 months ago) and is now excellent (though I say so myself as the author of the report). It shows overall FTF, and FTF broken down by individual Agents and Teams plus other good stuff. If you want to get the latest version of SerioReports you will need to be using Serio 4.6 or later.

I’ll complete this blog entry tomorrow by looking at the remaining KPIs.

Monday, 29 Jan 2007

Regular readers will know that I have been posting recently about Incident Management, the last of which was posted here (there are others also).

This post will cover the subject of KPIs for Incident Management, and offer some practical suggestions for you. I’m going to keep this post general, and probably write a Serio-specific post later that tells you where the reports are (this data is available however from SerioReports).

What follows is not a definitive list – nor is it the 'best' or 'only' list of KPIs. These are just some suggestions for your own Incident reporting repertoire, and are targeted firmly at Incident Managers who need to prepare management reports.

Incident counts

The total number of Incidents logged. You can cut this with a month-on-month trend going back over the previous quarter, or break it down by Category, or Priority, or Impact. What you will be interested in showing is the number (is it up or down, or constant?) and how severe the Incidents have been.
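As a rough illustration of the month-on-month view, here is a minimal Python sketch. The record layout (a logged date plus a priority) and the sample data are assumptions made up for the example; they are not the Serio data model.

```python
from collections import Counter
from datetime import date

# Hypothetical sample data: (date logged, priority). Not real Serio records.
incidents = [
    (date(2006, 11, 3), "High"),
    (date(2006, 11, 17), "Low"),
    (date(2006, 12, 1), "High"),
    (date(2006, 12, 8), "Medium"),
    (date(2006, 12, 20), "Low"),
    (date(2007, 1, 5), "High"),
]

def monthly_counts(records):
    """Count Incidents per (year, month), ignoring priority."""
    return Counter((logged.year, logged.month) for logged, _ in records)

def counts_by_priority(records):
    """Count Incidents per priority across the whole period."""
    return Counter(priority for _, priority in records)

trend = monthly_counts(incidents)
for year, month in sorted(trend):
    print(f"{year}-{month:02d}: {trend[(year, month)]}")
print(dict(counts_by_priority(incidents)))
```

The same grouping idea works for Category or Impact: swap the field you count on.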

First time fixes

I’ve written about this a lot before, but for the latecomers it is simply a measure of how many customers reporting Incidents get an immediate resolution to the problem – before the call ends. This statistic is telephone based – it has no meaning when Incidents are logged via a web portal, and almost no meaning with Incidents reported by email. However, as the telephone continues to be an important medium for the Helpdesk or Service Desk (the percentage still seems to be over 50%) this continues to be an important statistic.
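To make the ‘telephone only’ point concrete, here is a small sketch of how the figure might be calculated. The `Incident` class, the channel names and the `resolved_on_first_call` flag are all hypothetical, chosen just for this example.

```python
from dataclasses import dataclass

@dataclass
class Incident:
    channel: str                  # "phone", "email" or "web" (assumed values)
    resolved_on_first_call: bool  # True if fixed before the call ended

def first_time_fix_rate(incidents):
    """Percentage of telephone Incidents resolved before the call ended."""
    phone = [i for i in incidents if i.channel == "phone"]
    if not phone:
        return None  # no telephone Incidents, so the KPI is undefined
    fixed = sum(1 for i in phone if i.resolved_on_first_call)
    return 100.0 * fixed / len(phone)

sample = [
    Incident("phone", True),
    Incident("phone", False),
    Incident("phone", True),
    Incident("phone", False),
    Incident("web", False),
    Incident("email", False),
]
print(first_time_fix_rate(sample))
```

Web and email Incidents are left out of the denominator altogether, so they cannot drag the percentage down.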

Resolutions by the Helpdesk or Service Desk

Whereas first time fixes relate to immediate resolution, this KPI simply refers to resolutions made by the Helpdesk/Service Desk without assignment to specialist teams. If this is a high (or rising) figure it suggests a good degree of competence within the group.

Percentage of Incidents handled within their SLA target

Whilst this falls into the remit of the Service Level Manager, it’s still a useful KPI for Incident Management. Typically you’ll be looking at the speed of response, and of resolution. Like the Incident Counts figures, it’s sometimes useful to break this figure down into different groups – such as Impact or Category.

Spread of Resolution Time

This is where you take Incidents and examine how long resolution takes, looking at the mean resolution time and deviations from the mean. Almost always it’s better to use a graph or histogram to express this, and again it is a useful indicator of quality.
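A quick sketch of the idea, using only the Python standard library. The resolution times and the 4-hour bucket width are invented for the example.

```python
from collections import Counter
from statistics import mean, pstdev

# Hypothetical resolution times in hours, one per resolved Incident.
resolution_hours = [1, 2, 2, 3, 4, 4, 5, 8, 12, 30]
WIDTH = 4  # bucket width in hours, an arbitrary choice for the example

def bucket(hours, width=WIDTH):
    """Return the lower bound of the fixed-width bucket holding `hours`."""
    return (int(hours) // width) * width

avg = mean(resolution_hours)
spread = pstdev(resolution_hours)  # population standard deviation
histogram = Counter(bucket(h) for h in resolution_hours)

print(f"mean {avg:.1f}h, standard deviation {spread:.1f}h")
for low in sorted(histogram):
    print(f"{low:>2}-{low + WIDTH - 1:>2}h  {'#' * histogram[low]}")
```

In this sample one 30-hour Incident pulls the mean well above the typical case, which is exactly why a histogram tells you more than the mean alone.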

Friday, 26 Jan 2007

I’ve been blogging previously about Incident Management, and no discussion about Incident Management would be complete without mentioning Major Incidents.

First of all, let me offer a definition: A Major Incident is any Incident that has a significant or substantial effect on part or all of the business.

Leaving aside the issue about VIPs (a stuck keyboard belonging to your Chief Executive causing panic on the help desk), we can say that Major Incidents usually affect significant numbers of employees, and involve important enterprise level services.

So how do we manage Major Incidents? The answer is ‘we do all the stuff that we normally do’ – plus we do some other things. So we make sure that we log the Incident properly, use the CMDB as required, and make the best initial assignment that we can. If you are a Serio user, presumably you’ve set up a Broadcast Alert. Broadcast Alerts were specifically designed for major Incidents, and can be used to send emails about important Incidents to lots of people. You might also use the Serio Text Message Gateway to send text (SMS) messages.

Coming back to the ‘other things’ I mentioned above, this is where your Incident Manager (you have one, right?) takes a lead. What follows could, with a little effort, form the basis of a Major Incident Procedure.

  1. If you can, make a rough estimate of how long the Incident will last, or more accurately, how long the missing service will be unavailable. You might be reluctant to do this, as people have a tendency to hold you to rough estimates at times of stress, but you should do it anyway.
  2. Inform the key stakeholders. By this I mean do more than send them an automated email: use the CMDB to identify the affected parties, and let them know about the Incident – don’t assume they know. Give them your estimate from 1. above. This way, they will know if it is worth starting manual procedures, and it will help them deal with their customers.
  3. If you are a Serio user, post the Incident to the Service Status website, as that is what it is for. Post updates on the Incident here during the day; your customers will appreciate it.
  4. Inform your Problem Manager (you’ve got one, right?). I’ve blogged about Problem Management before here and here (and other places), and we have a Problem Management white paper on that subject for download.
  5. Once the Incident is resolved, perform a review. Analyse the Incident from different perspectives, which should include:
  • Could the Incident have been avoided in the first place?
  • What was the estimated cost to the business of the Incident?
  • How well did we perform in restoring the missing service to users?
  • Did we communicate effectively, both between ourselves and our customers?
  • How well did our internal documentation perform – for instance, our recovery documentation?
  6. Report your findings clearly, with recommendations for the future.

Wednesday, 24 Jan 2007

This post will be on the topic of reports you can run in SerioReports – I am going to pick a few and talk about them in detail.

OK, so where is SerioReports? If you don’t have an icon for it on your desktop, remember that it’s an application and needs to be installed (though please remember that it is licensed, so you may need to purchase additional licenses before doing so). Please remember that SerioReports also has its own help file distributed with the product.

We’ve already got quite a bit to say about metrics and reporting. I have blogged about it here, and there is an excellent white paper by my colleague George Ritchie entitled ‘Service Desk Metrics – Getting Started’. If you have not read this white paper yet, and you are interested in reporting, I urge you to do so now.

What I will do in this post is to select a report that I find interesting, and to talk about it and explain about the data it presents and what it displays. I’ll return to this topic next time I post.

I’ll start with SLA Analysis reports – this is where most (but not all) of the service level related reports reside. There are approximately 22 of these – some graphical, some textual. If you can’t find these reports, then log in to SerioReports, select ‘New Report Explorer’, and then in the tree presented expand the section ‘SLA Analysis’.

I’ll start with a simple graph-based report – SLA Resolutions/Company (SLA16). If you supply a start date, and end date, and the SLA on which the report should be run, you’ll get a histogram that shows you the following:

For each Company on whose behalf you have resolved Incidents, the percentage of resolutions on time. In order to help you make better sense of the percentages, the report also shows you the number of Incidents on which the percentage is based. This is important because you might have a Company that shows ‘0% on time’ – but for just one Incident resolved between your start date and end date.
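To show why the count matters alongside the percentage, here is a small sketch of the same calculation. This is not how SLA16 itself is implemented; the company names and the record layout are made up for the example.

```python
from collections import defaultdict

# Hypothetical resolved Incidents: (company, resolved within SLA target?).
resolved = [
    ("Acme", True), ("Acme", True), ("Acme", False), ("Acme", True),
    ("Globex", False),
]

def on_time_by_company(records):
    """Map company -> (percent resolved on time, number of Incidents)."""
    totals = defaultdict(lambda: [0, 0])  # [on time, all resolved]
    for company, on_time in records:
        totals[company][1] += 1
        if on_time:
            totals[company][0] += 1
    return {c: (100.0 * ok / n, n) for c, (ok, n) in totals.items()}

for company, (pct, n) in sorted(on_time_by_company(resolved).items()):
    print(f"{company}: {pct:.0f}% on time (based on {n} Incidents)")
```

Here the second company shows 0% on time, but from a single Incident, which is the situation described above.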

So who might use SLA16 and why?

Most likely to run this report is a Service Level Manager, or whoever is responsible for overall IT service levels within your organisation (as George Ritchie would say, you’ve got someone responsible for that, right?). You’d run the report as a safety check, and to get behind overall SLA statistics, to make sure that for each Company or Department you serve the service levels are within acceptable limits, or as part of an attempt to identify bottlenecks for further investigation.

Please also remember you can save these reports to PDF easily. Simply print them using the PDF printer installed with SerioReports.

You can also save reports to your favourites list, which means you don't need to search for them next time you want to run them.

Monday, 22 Jan 2007

I’ve posted here, here and here some introductory information about Incident Management. This post will be on the subject of success, and success factors, if you are introducing Incident Management into your Helpdesk or Service Desk. I’m going to assume you are moving from a 'zero' position, starting to embrace ITSM.

  1. Culture. Culture is really important, and being specific I mean the culture of work and service present in your helpdesk or service desk. I’ll state straight away that ITSM is, in my view, not about ‘making the life of IT staff easier’ particularly at the start. The focus is on improved customer and business service, and about higher standards generally in IT service delivery.

As an example, I’ve heard objections to Incident Management such as ‘I don’t have time to log all Incidents’, ‘Choosing a priority and category takes too long, I just want to capture a description’ or ‘This data is of no use anyway’. I'd take these as signs of gentle resistance.

If you are reading this, and you are the IT manager, or Service Delivery Manager, you certainly have a significant role to play in changing this culture (though it’s beyond the scope of this post to discuss strategies for doing this).

  2. A Knowledgebase. I know I said I was discussing this from the perspective of the 'zero' position, but it’s usually possible to make a start on writing down your most common problems and their solutions, and in doing so you’ll make a significant contribution to productivity. There are also a number of very good commercially available sources of knowledgebase content you can use (though choose wisely, and don't overwhelm staff with irrelevant content).

Make development part of your Incident Management process by encouraging engineers to suggest articles. If you are a Serio user, you can include ‘nomination’ to knowledgebase content as part of your resolution process. 

  3. A Configuration Management Database (CMDB). Yes, I know – if you are in the 'zero' position you won’t have one. They really do help and are important, so view the period whilst your processes mature as an interim period until you have a CMDB at your disposal. Whilst you are in this interim period, sometimes using network-based tools such as the Serio Inventory Agent/Workstation explorer can help.
  4. You need an ITSM tool, and you need to have a reasonable idea of how to use it. I have seen people trying to use spreadsheets and it never works in my opinion.
  5. If you are a manager, set yourself some real, tangible, possible (i.e., deliverable) objectives for the first few months of your Incident Management process. It might be ‘reduce Incident resolution times by 20% over 3 months’ or more simply ‘reduce Incident numbers by 10%’ or ‘improve measured customer satisfaction over the period’. Whatever it is, set yourself some goals.

Friday, 19 Jan 2007

This post is an addendum to my earlier post on The Incident Life Cycle.

That post discusses a status value called the Workflow Position, which is a ‘signpost about where we are with an Incident’. This post will talk about where that is in Serio, and how help desks and service desks can use it.

Firstly this is referred to in Serio as Agent Status. There are two types of status value: A and B. These are functionally equivalent – rather than giving you a single status value to use, we gave you two, called Agent Status A and Agent Status B.

In terms of managing Incidents, you can use Agent Status in the following ways.

You can display the Agent Status in your Incident list. Imagine that you had 10 Active Incidents. If you wanted to find out where you are in their resolution process without Agent Status, you’d have to examine the Actions on each one individually. With Agent Status, you might have something like:

  • In progress
  • Awaiting parts
  • On Hold
  • In progress
  • Awaiting Purchase Authorisation
  • With External Supplier
  • In Progress
  • Unstarted
  • Unstarted
  • On hold

You can use the Agent Status to select Incidents to work on by creating a simple Query. For example, ‘show me all On hold Incidents assigned to my Team’
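The idea behind such a Query can be sketched in a few lines of Python. This is only an illustration of the filtering logic, not Serio’s Query feature; the `Incident` class, the references and the team names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Incident:
    reference: str
    assigned_team: str
    agent_status: str

def select_by_status(incidents, team, status):
    """Return Incidents for `team` whose status matches, ignoring case."""
    wanted = status.casefold()
    return [
        i for i in incidents
        if i.assigned_team == team and i.agent_status.casefold() == wanted
    ]

active = [
    Incident("INC-1", "Desktop", "In progress"),
    Incident("INC-2", "Desktop", "On Hold"),
    Incident("INC-3", "Network", "On hold"),
    Incident("INC-4", "Desktop", "On hold"),
]
on_hold = select_by_status(active, "Desktop", "On hold")
print([i.reference for i in on_hold])
```

The comparison ignores case so that ‘On Hold’ and ‘On hold’ are treated as the same status value.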

You can access a status report quickly and easily through SerioClient. This shows a pie chart built from Agent Status data. Simply open a ‘Performance’ chapter and select ‘Agent Status A/B’ distribution.

Setting Agent Status is done through Actions, and is something your own Serio Administrator must configure. There are two basic approaches taken.

The first of these involves setting the Agent Status incidentally. For example, you might take an Action called ‘Assign to Widgets Inc’ where Widgets Inc is a maintenance supplier. Your Action, as part of this, might change the Agent Status value to ‘With External Supplier’.

The second way is more direct, by having Actions solely designed to change the Agent Status value and very little else. For example, ‘Place Incident On Hold’ might be an Action that does exactly what it says.

For more information about Agent Status and Actions, consult the HowTo guide, the main resource file distributed with Serio products.

Tuesday, 16 Jan 2007

Commentator Rob asks for some suggestions for things to discuss in a forthcoming interview for a Service Desk Manager role. It was such an interesting topic I could not resist.

I recall an interview I had back in the 1990s with a London bank, for the role of IT Service Delivery manager. It started off badly – I arrived promptly, but then the interviewer turned up 30 minutes later and did not apologise for his tardiness.

Almost immediately the interview turned into an ITIL/ITSM question and answer session, along these lines. My qualifications were on my CV, and the actual award documents were in my briefcase – he never asked to see them.

‘What is a CMDB’ he asked, starting almost as soon as I had sat down.

I gave what I thought was a good answer in my own words, explaining the role it plays for activities such as Change Management. He then started to split hairs, and something became apparent: he had learned to recite a lot of the text from the Service Support book like a parrot, and allowed no interpretation except for his own very literal view.

I had just finished (successfully) a large ITSM project, but this hardly featured in the interview. I left pitying the poor person who would get the job.

To come back to the point of this post, and to Rob’s question, avoid this scenario at all costs. Check the qualifications of your candidates carefully, and then assume they understand the relationship between the CMDB and Change Management. Surely what you are interested in is how they can use this kind of knowledge to deliver business benefits.

I recall another interview at a now-defunct manufacturing company. The person doing the interview here was someone who turned out to be a very capable boss. At the end of the interview I asked what he had thought of some of my answers, to which he replied ‘I was much more interested in how your reasoning process worked, and how well you could communicate with me’.

Some of the things he asked in the interview were topics I’ve touched on in this blog in recent months. I can remember the questions clearly because after 2 years I moved to another post, and participated in the interviews for my replacement.

‘You’ve got 30 IT staff, and I’m going to tell you some are wonderful and some are.. not wonderful. How do you find out which is which?’

‘Once you’ve sorted the ‘not wonderful’ into a group, what do you do?’ (His favourite response to this was from a guy who said ‘make them walk the plank, of course’).

‘The expectations of the business are not being met, where do you start?’

‘We’ve got 1000 open tickets. What would your action plan be?’

‘Your service teams have a culture of blame and recrimination. Tell me what you would do over 6 months to improve this situation.’ (On this one, I can tell you he was not looking for anything along the lines of nights-out in the pub, or fatuous team-building exercises).

‘You have been asked to produce an IT management summary report for our esteemed proprietor. What would you include?’

Monday, 15 Jan 2007

This is a continuation of the earlier ‘Incident Life Cycle’ post.

I’ll start by talking about Escalation. This is a term that seems to mean different things for different people. For some, it means the Helpdesk or Service Desk assigning an Incident to a more expert team (or third party supplier) when the nature of the Incident and the skills in our specialist groups dictate that we must do so.

For some, it means adjusting the priority of the Incident (usually upwards).

For others, it means changing the Incident and alerting staff as it becomes possible the resolution will be late.

Fortunately ITIL provides us with some useful definitions – Functional Escalation, and Hierarchical Escalation.

  • Functional Escalation refers to the process of assigning an Incident from one team to another based on the skills required to resolve the Incident – for example, assigning an issue with a database backup to the DBA team.
  • Hierarchical Escalation refers to a process whereby we take action to avert the resolution of an Incident being unsatisfactory or late.

These two types of Escalation are not mutually exclusive: you may, as part of your Incident Management process, do both.

A number of strategies are used in Functional Escalation. Some of those I’ve encountered being used successfully include

  • An ‘open’ system, whereby any Agent wishing to reassign an Incident assigned to them can do so. This assumes that staff work diligently on Incidents, and will not needlessly re-assign Incidents for no other reason than that they want to focus on projects of most interest to them. Generally ‘open’ approaches like this work best in smaller groups, where expertise and responsibility are very clearly defined.
  • A ‘refer and request’ system, whereby any of the specialist teams who wish to have a reassignment can reassign the Incident back to the Helpdesk or Service Desk only, with a request for assignment to another team (for instance, it turns out after diagnosis that a different team needs to become involved). This stops a ‘pass the hot potato’ mentality developing, helps the Service Desk team keep involved, and is one of my preferred approaches.

The ‘open’ and ‘refer and request’ methods of Escalation are by no means the only possible approaches, but they are two that are used very frequently in Incident Management.
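The difference between the two approaches comes down to a simple rule about who may reassign to whom. Here is a minimal sketch of that rule; the policy names, the team names and the `can_reassign` function are hypothetical and say nothing about how Serio itself implements either approach.

```python
SERVICE_DESK = "Service Desk"  # assumed name for the first-line team

def can_reassign(policy, from_team, to_team):
    """Decide whether a reassignment is allowed under the given policy."""
    if policy == "open":
        return True  # any Agent may reassign anywhere
    if policy == "refer_and_request":
        # Specialist teams may only hand back to the Service Desk;
        # the Service Desk may assign to any team.
        return from_team == SERVICE_DESK or to_team == SERVICE_DESK
    raise ValueError(f"unknown policy: {policy}")

print(can_reassign("open", "DBA", "Network"))
print(can_reassign("refer_and_request", "DBA", "Network"))
print(can_reassign("refer_and_request", "DBA", SERVICE_DESK))
```

Under ‘refer and request’ the direct DBA-to-Network hop is refused, so the Incident has to pass back through the Service Desk first.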

Please note that both of the approaches I’ve discussed above are supported by Serio, as are other variations.

I’ll continue this thread in future posts this week, and also discuss Hierarchical Escalation.

Wednesday, 10 Jan 2007

This is a follow-on post from Introducing Incident Management.

This blog post is going to talk about what ITIL calls ‘the Incident Life Cycle’. If it sounds complicated it isn’t – and it’s probably something you’ll recognise from your own Incident handling work.

Incident Life Cycle

Incident logging. This is simply where we have reported to us (or detect through automated tools) that an Incident has occurred.

Classification. We classify the Incident, and offer initial support to the customer. At this stage we are looking to get a handle on the business impact of the Incident, assigning a priority and seriousness, and classifying the type of Incident.

Investigation, analysis and diagnosis. This is the ‘meat’ of the process, where we try to understand how we can restore service to users. It’s important to note here that resolution of the Incident should not be the only thing that we are considering as outputs – workarounds are useful to users in some cases (for instance, where we see the resolution of the Incident as possibly taking some time).

Resolution. Taking our analysis we resolve the Incident and restore services to users. Note that this may have entailed raising a Change and/or Problem.

Incident Closure. We close the Incident, allocating Cause data and contract codes as appropriate.

What I want to talk about now is something referred to by ITIL as the ‘Workflow Position’ – this is something that confuses some people, but again is really quite straightforward.

Refer to the life cycle above, and consider this. We log an Incident, and at some time in the future we resolve an Incident. In the middle we ‘do stuff’. For example, you might

  • Assign the Incident to a specialist team for assessment
  • Put the Incident on hold, because the Customer has gone on a 3 month holiday
  • Escalate the Incident to a Team Leader

The Workflow Position is simply a signpost about where we are with an Incident. In the ‘Service Support’ book (published by TSO) the following examples are listed:

New, Accepted, Scheduled, Assigned to Specialist, Work in Progress, On Hold, Resolved, Closed

Please note that ITIL is not prescriptive, so these are listed as examples to communicate meaning and foster understanding. It doesn’t mean that is what you must have in order to be ‘ITIL Compliant’ – you are expected, as someone who works in ITSM, to fit this to your particular organisation (and that means extending these with status values that make sense to you).

Workflow Position is useful in Incident Management, and here’s why. It allows you to easily create status reports that tell you where you are with the unresolved Incidents you are currently handling. For example, you might run a status report that says:

Waiting for customer response 12

On hold 6

In progress 30

With external suppliers 10

Awaiting assessment 12

This is more useful than simply saying ‘70 Incidents open’ as a summary of our current status.
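If each open Incident carries a Workflow Position value, a status report like the one above is just a count per value. A minimal sketch, with sample data invented to match the figures in the example report:

```python
from collections import Counter

# Hypothetical open Incidents, one Workflow Position value per Incident.
positions = (
    ["Waiting for customer response"] * 12
    + ["On hold"] * 6
    + ["In progress"] * 30
    + ["With external suppliers"] * 10
    + ["Awaiting assessment"] * 12
)

report = Counter(positions)
for position, count in report.most_common():
    print(f"{position:<32}{count:>3}")
print(f"{'Total open':<32}{sum(report.values()):>3}")
```

The total is still there at the bottom, but the breakdown above it is what tells you where the work is stuck.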

I’ll continue this thread about Incident Management in coming posts.


Monday, 08 Jan 2007

A Happy New Year to all our blog readers!

This blog may be the first in a series on the subject of Incident Management – a fundamental part of ITSM. I’m going to start with the basics, and then take it from there.

The first thing I need is a definition of what an Incident is. ‘Best practice for Service Support’ (pub TSO) very kindly gives us a definition as follows:

any event which is not part of the standard operation of a service and which causes, or may cause, an interruption to, or reduction in, the quality of a service.

Incidents, therefore, cover a multitude of faults, errors and unexpected events users might experience. An Incident can also be taken as a simple user query like ‘how do I do a mail merge?’ or a password reset request.

Having defined Incidents, it’s worth stating the goal of Incident Management: to restore services as quickly as possible to users, thereby reducing the impact of Incidents upon the organisation. This is not to say that is all we do: there are quite a lot more tasks involved, but this is the ‘output’ or primary goal of the Incident Management process (remember that a general goal of IT Service Management is to stop Incidents happening in the first place).

The role of the Helpdesk/Service Desk in Incident Management is key – even if specialist teams (such as network operations or database administrators) happen to be assigned to the Incident. This is because the Helpdesk/Service Desk should ‘own’ the total pool of active Incidents, taking action where required to ensure a timely resolution or workaround for the customer. This is why Serio allows you to have both an Assigned Team and Agent, and an Owning Team and Agent for an Incident – so that the actual ownership of this ‘pool’ of unresolved tickets can be split between members of the Helpdesk or Service Desk.

To illustrate this with an example, you might have a Service Desk agent called John who logs an Incident for a database error. John might have to assign this to the specialist Database team for resolution, but still maintain some ownership (distinct from assignment) for the Incident in question. What John actually does with the Incidents he owns will depend upon his company’s own procedures, but he might:

  • Intervene and review the Incident at some point before a resolution time (SLA) breach
  • Manage communication with the customer
  • Be involved with the further escalation of the Incident
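The split between ownership and assignment can be pictured as two separate fields on the same record. A minimal sketch follows; the class and field names are hypothetical, not the Serio schema.

```python
from dataclasses import dataclass

@dataclass
class Incident:
    reference: str
    owning_agent: str   # stays with the Service Desk
    assigned_team: str  # whoever is currently working on the Incident

    def assign(self, team):
        """Reassign the work without changing ownership."""
        self.assigned_team = team

ticket = Incident("INC-42", owning_agent="John", assigned_team="Service Desk")
ticket.assign("Database")
print(ticket.owning_agent, ticket.assigned_team)
```

After the reassignment the Database team is doing the work, but John is still recorded as the owner and can chase the Incident before an SLA breach.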

I’ll continue to address this subject in later posts.