Strategies for Incident Backlogs

This post was suggested by someone I’ve known for a long time as a specialist engineer in IT Service Management (let’s call her Linda), who has recently become a Service Delivery Manager at a new company (in other words, she has both a new role and a new organisation to cope with). The first thing she noticed on day 1 was that although her company (I’ll call them WidgetCo) has just 600 users and 8 IT staff, there were over 600 Incidents open, some of which had been open since April.

WidgetCo don’t use ITIL or have any concept of IT service management, though no doubt that will change. My friend’s question to me was: ‘what kind of things can I do to get the outstanding Incident count down to a more manageable level?’

Here are some of the things I’d do, in no particular order.

  1. Warn senior managers, directors and project sponsors about what you’ve found. Give them a quick summary of the problem, indicate what you are going to do and how long it will take, and promise to report back at the end of this period. If you are running reports that show Service Level performance, warn them that this and possibly other statistical measures may take a beating while you clear the backlog.
  2. Consider letting your customers or users know what you are doing, particularly if in the short term there will be an adverse effect on quality of service.
  3. Give your new staff some quality guidelines (Linda has told me not every Incident is logged). These should include:
  • Log every Incident
  • If you resolve an Incident, enter it on the system as resolved within 3 hours.
  • Initiate some basic quality measures, such as specifying a decent Incident description and resolution comment
  4. Find out if some of the Incidents you have are actually Requests for Change. You need to make a distinction between Incidents (where some fault to a user service has occurred) and situations where customers are requesting amendments to services (important though those requests might be).
  5. Assess the older Incidents (maybe from April through to August) one by one. You may need to have someone help you with this. Most ITSM tools such as Serio allow custom ‘status’ values to be attached to Incidents (in Serio, Agent Status A and B are examples of this). Create some status values to attach to your Incidents so that you start to get a grip on what the data means. It will also help you to remember which have been reviewed. The following status values may be useful:
  • Outstanding (means reviewed and is still required)
  • Change raised (means this Incident was actually a Change)
  • No longer required (in my experience, you may find some of the Incidents have either been fixed and the system not updated, or something has changed elsewhere and action is no longer required)
  • On hold (means the status is unclear until further review; naturally you will want to keep this down to a small number)
  6. Check your statistics to see who is actually resolving Incidents, in order to see if some staff members are under-utilised. Use this data with caution: those with fewer resolutions shown might be dealing with more complex Incidents. Alternatively, they might be the ones who spend 3 hours per day on eBay.
  7. Stress to your staff the importance of resolving Incidents, and consider posting a ‘daily outstanding’ count on the notice board or in some other prominent position.
  8. Consider forming a special team to address the ‘Outstanding’ Incidents you’ve identified above. In Linda’s case, this would be unlikely to be a team of more than 2. I stress the word ‘consider’ here, as you may choose to leave Incidents with the engineers who are currently handling them.
  9. Incidents will be assigned to Teams and Agents. Check the assignment levels (a bar chart may be useful for this). If you see bottlenecks (for example, 300 Incidents assigned to one individual), take action by re-distributing as required.
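
If your tool can export open Incidents, the review of older Incidents and the assignment check can be roughed out in a few lines of Python. This is only a sketch under assumptions: the record fields (ref, opened, assignee, review_status), the sample data, the year, and the "more than twice the average load" rule for spotting a bottleneck are all invented for illustration and are not taken from Serio or any other ITSM product.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Incident:
    """Hypothetical export record; field names are assumptions."""
    ref: str
    opened: date
    assignee: Optional[str]   # None means nobody owns it yet
    review_status: str = ""   # e.g. "Outstanding", "Change raised", "On hold"


def needs_review(incidents: List[Incident], cutoff: date) -> List[Incident]:
    """Incidents opened before `cutoff` with no review status yet, oldest first."""
    pending = [i for i in incidents if i.opened < cutoff and not i.review_status]
    return sorted(pending, key=lambda i: i.opened)


def assignment_counts(incidents: List[Incident]) -> Counter:
    """Number of open Incidents held by each assignee."""
    return Counter(i.assignee or "(unassigned)" for i in incidents)


def bottlenecks(counts: Counter, factor: float = 2.0) -> List[str]:
    """Assignees holding more than `factor` times the average load (arbitrary rule)."""
    if not counts:
        return []
    mean = sum(counts.values()) / len(counts)
    return sorted(name for name, n in counts.items() if n > factor * mean)


def text_bar_chart(counts: Counter, width: int = 40) -> List[str]:
    """A crude text bar chart, busiest assignee first."""
    top = max(counts.values(), default=0)
    lines = []
    for name, n in counts.most_common():
        bar = "#" * max(1, round(n / top * width))
        lines.append(f"{name:<14} {bar} {n}")
    return lines


# Made-up sample data: six old Incidents with one person, two elsewhere.
SAMPLE = (
    [Incident(f"INC-{n:03d}", date(2024, 4, n), "alice") for n in range(1, 7)]
    + [
        Incident("INC-007", date(2024, 6, 3), "bob", "Outstanding"),
        Incident("INC-008", date(2024, 9, 20), "carol"),
    ]
)

counts = assignment_counts(SAMPLE)
print("\n".join(text_bar_chart(counts)))
print("Possible bottlenecks:", bottlenecks(counts))
print("Still to review:", [i.ref for i in needs_review(SAMPLE, date(2024, 9, 1))])
```

Treat the output the same way as the resolution statistics: as a prompt for a conversation, not a verdict. A high count against one name may simply mean Incidents were parked there by default.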

Some of the suggestions here may lead to other problems. I’ll write about these in a follow-on post.