Serio Blog

Friday, 16 Mar 2007

This is a follow up to my last post on the subject of KPIs in Problem Management – read that previous post for background. In what I write I’m assuming that the KPIs are going to be used in a report whose audience will be the Problem Management team (therefore, quite detailed).

I’ve concluded that we are looking for signs that the Problem Management process is working. Here are some suggestions for you to try.

1. Number of Problems raised.

This is a tricky one – I’d never show this figure without comment. A figure which is low may indicate a situation where Problems are not being adequately identified, as I’ve discussed in this post. You can compare this to the number of Incidents raised over the period, to see if the rising/falling trends are the same.
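If your tool can export monthly counts, one rough way to see whether the Problem and Incident trends match is to compare the direction of change period by period. The sketch below is a minimal illustration, assuming you already have both series as lists of counts. The figures are made up, and nothing here is a Serio feature.

```python
from typing import List


def trend(counts: List[int]) -> List[str]:
    """Direction of change between each pair of consecutive periods."""
    directions = []
    for previous, current in zip(counts, counts[1:]):
        if current > previous:
            directions.append("up")
        elif current < previous:
            directions.append("down")
        else:
            directions.append("flat")
    return directions


def diverging_changes(problems: List[int], incidents: List[int]) -> List[int]:
    """Index i means the change from period i to period i + 1 went in different
    directions for Problems and Incidents, so it deserves a comment in the report."""
    return [
        i
        for i, (p, inc) in enumerate(zip(trend(problems), trend(incidents)))
        if p != inc
    ]


# Hypothetical monthly figures, for illustration only.
problems_raised = [12, 10, 4, 5]
incidents_raised = [400, 380, 410, 420]
print(trend(problems_raised))   # ['down', 'down', 'up']
print(trend(incidents_raised))  # ['down', 'up', 'up']
print(diverging_changes(problems_raised, incidents_raised))  # [1]
```

Here Problems fell sharply between the second and third months while Incidents rose, which is exactly the kind of figure that needs a comment beside it.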

2. Number of Problems Resolved.

There are a few ways I’d present this. I’d show the number of Problems satisfactorily resolved, and the total number of Incidents linked to these Problems. This is useful because it tells us how many Incidents we’ve finally got to the root cause of.

Then I’d produce a summary listing, so that the detail can be seen. I’d show the Problem Reference number, Priority, a one-line description, and the number of linked Incidents. Doing this allows readers to see how the ‘overall’ figure breaks down.
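As a sketch of how the headline figure and the summary listing relate, the following assumes a hypothetical export of Problem records (reference, priority, one-line description, linked Incident references, resolved flag). The field names are illustrative only and are not Serio’s own schema.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Problem:
    # Hypothetical record layout, not Serio's own schema.
    reference: str
    priority: int
    description: str
    linked_incidents: List[str] = field(default_factory=list)
    resolved: bool = False


def resolved_summary(problems: List[Problem]) -> Tuple[int, int, List[str]]:
    """Return (Problems resolved, Incidents linked to them, one summary line per Problem)."""
    resolved = [p for p in problems if p.resolved]
    linked_total = sum(len(p.linked_incidents) for p in resolved)
    lines = [
        f"{p.reference}  P{p.priority}  {p.description}  ({len(p.linked_incidents)} Incidents)"
        for p in resolved
    ]
    return len(resolved), linked_total, lines


problems = [
    Problem("PRB-101", 2, "Mail queue stalls overnight", ["INC-1", "INC-2", "INC-3"], True),
    Problem("PRB-102", 3, "Printer driver crash", ["INC-4"]),
    Problem("PRB-103", 1, "VPN connections drop", ["INC-5", "INC-6"], True),
]
count, incidents, listing = resolved_summary(problems)
print(f"{count} Problems resolved, covering {incidents} Incidents")
for line in listing:
    print(line)
```

The overall figure (2 Problems, 5 Incidents) and the listing come from the same records, so readers can always see how the total breaks down.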

3. Days open for unresolved Problems.

One of the things that you wish to see in your Problem Management process is that tickets that are logged do (eventually) move to resolution, and so you want to see how long Problems have been on file for. An easy way to express this graphically is to have a ‘days open’ figure along the vertical axis, and then to list each open Problem along the horizontal axis, with its reference number, showing how long it has been open.
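The ‘days open’ figure is simple date arithmetic. This minimal sketch assumes you can export each open Problem’s reference and logged date (the data below is hypothetical), and prints a rough text chart with the axes rotated: Problems run down the side and each ‘#’ stands for a week open.

```python
from datetime import date
from typing import Dict, List, Tuple


def days_open(opened: Dict[str, date], today: date) -> List[Tuple[str, int]]:
    """Days each unresolved Problem has been on file, longest-open first."""
    ages = [(reference, (today - logged).days) for reference, logged in opened.items()]
    return sorted(ages, key=lambda item: item[1], reverse=True)


# Hypothetical open Problems and the dates they were logged.
open_problems = {
    "PRB-110": date(2007, 2, 20),
    "PRB-104": date(2007, 1, 2),
    "PRB-112": date(2007, 3, 12),
}
for reference, days in days_open(open_problems, today=date(2007, 3, 16)):
    # One bar per Problem; each '#' represents a full week open.
    print(f"{reference}  {'#' * (days // 7):<11} {days} days")
```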

I have met customers who have used Service Level Agreements on Problem Management. Generally I don’t approve, partly because SLAs should be for things that a customer sees (like an Incident), but mostly because I don’t think they improve efficiency or promote best practice. Some Problems can (by their unexplained nature) take a long time to reach a satisfactory resolution, and the outcome we want is a quality (complete, accurate) resolution. What is important, though, is that your own procedures make sure that Problems don’t just stall, but continue to be moved forward.

4. Activity since last reporting period, per Problem

I’ve written above how we need to make sure that Problems are being looked at, prioritised, managed and assessed by the team – rather than simply gathering dust in your service management tool.

Although this is more of a status report than a KPI, list each Problem and give a brief (one or two lines at most) summary of what you’ve done since the last reporting period.

So, for instance, if you are reporting every fortnight you might say ‘Identified application X as possible cause, currently discussing with vendor support’ for a given Problem ticket. Serio users would simply create an Action called ‘Progress update’ for this purpose, to be completed by those assigned Problems within the service management teams.

Needless to say, you are looking for signs of life across all or most of your Problem records because without that we won’t move to resolution.
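Checking for signs of life can be reduced to one comparison per Problem. The sketch below assumes a hypothetical export giving, for each open Problem, the date of its most recent ‘Progress update’ Action (or nothing, if it has never had one), and lists the Problems with no update since the last report.

```python
from datetime import date
from typing import Dict, List, Optional


def stalled_problems(last_progress_update: Dict[str, Optional[date]], last_report: date) -> List[str]:
    """Problems with no progress update on or after the last report date."""
    return sorted(
        reference
        for reference, updated in last_progress_update.items()
        if updated is None or updated < last_report
    )


# Hypothetical data: date of the most recent 'Progress update' Action per open Problem.
updates = {
    "PRB-104": date(2007, 3, 8),
    "PRB-110": date(2007, 2, 26),
    "PRB-112": None,  # never updated
}
print(stalled_problems(updates, last_report=date(2007, 3, 2)))  # ['PRB-110', 'PRB-112']
```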

[Edit: I issued an amendment (actually, a little more detail) to this post. Click here to see the post]

[Edit by Blog admin: See also the related Incident Management KPI post]

Wednesday, 14 Mar 2007

Customer Nick asks for ideas on data he can use as a KPI for Problem Management.

That’s a really good question, and before answering it I’ll refer not to the definition of a Problem but to the goal of Problem Management, which is slightly different. Here is that goal, taken from the ITIL Service Support book:

“The goal of Problem Management is to minimise the adverse impact of Incidents and Problems on the business that are caused by errors within the IT Infrastructure…”

In other words, if you ask ‘why do we bother with Problem Management’ the answer is ‘to try to stop Incidents from happening (thereby avoiding their cost and inconvenience)’. Therefore, any KPI should be targeted toward measuring the effectiveness of the Problem Management process – just as Incident Management KPIs have an element looking at timeliness of resolution.

But hold on a minute, it looks like we’ve just described something very difficult. Our ideal KPI would show how many Incidents had not occurred due to our Problem Management process, but if you can get a report from your Helpdesk or Service Desk software telling you how many Incidents have not been logged then all I can say is wow.

I have seen instances where Problem Managers have shown a declining number of Incidents over, say, a 3-month period, and concluded this is a direct result of Problem Management. All I can say is maybe. Incidents can decline for all sorts of reasons, such as:

  • Investment in the infrastructure, or technological changes unconnected with Problem Management
  • Less business activity (like after a Christmas rush period)
  • Less business or technological change
  • Fewer new employees joining

…and so on. The converse of the examples above can lead to an increase in Incidents, even if your Problem process is bang on the money. Personally, I think disentangling all these causes is so hard and unscientific that it isn’t a good avenue for anyone to follow.

Instead, I guess we’ve got to use some common sense. Common sense tells us that if we have an effective process in place then Incidents and downtime costs will reduce. So to me it makes sense to focus on ‘is the Problem Management process working?’.

Having described the challenge at length, Friday’s post will describe some actual Problem Management KPIs you can use.

Monday, 12 Mar 2007

This post is about an often overlooked feature of the Serio CMDB. Risk Assessment is a formal process of identifying and measuring risks posed to organisational Items - for example the risk of a hard disk crash on a key server, the risk of theft of laptops, the risk of hackers stealing customer data, the risk of someone spilling coffee on a computer and so on.

Risk Assessment helps you to decide on Changes that you need to make to your systems to reduce risks (of course, Change Management naturally complements the Risk Assessment process by providing a controlled way to implement such Changes). Risk Assessment is, by its nature, a proactive activity – trying to stop the costs associated with Incidents before they occur.

A formal Risk Assessment exercise is a key step in gaining BS7799/ISO17799 certification. BS7799 requires that you develop a Risk Assessment methodology which takes into account the value of Items to your organisation and the seriousness of threats.

So far so good. Serio adds value to this by providing a framework for storing and reporting on the results of your Risk Assessment exercise which is geared towards the requirements of BS7799.

The Risk Assessment Process

The Risk Assessment process in Serio consists of gathering information about Items and the risks they are exposed to. This information is stored in the Serio Configuration Database for reporting and analysis (if you’ve ever wondered what the Threats and Vulnerability icons were for, this is it).

The following description introduces the main concepts and stages of the Risk Assessment process. Once you have familiarised yourself with this introduction, please see the Risk Assessment Roadmap in the HowTo guide (distributed with Serio products) for a step-by-step guide to implementing the Risk Assessment process using Serio.

1. Identify organisational Items that must be protected from risk. Such Items may include:

  • Key servers or network equipment
  • Key applications, such as web or email servers
  • Information (Virtual) Items, including databases or files containing sensitive information, such as customer records, product specifications, sales data, eMails, etc.
  • Services, such as heating, lighting, power and telecoms.

2. Use your knowledge and experience of your organisation to assign an Organisational Value to these Items. This is a matter of answering the question, "On a scale of, say, 1 to 5, how important is this Item to our organisation, in terms of a loss of availability, confidentiality, or integrity?" A score of 5 indicates the highest Organisational Value.

3. List the Vulnerabilities of these Items. Any aspect of an Item's location, function, use, or characteristics which puts the Item at risk should be counted as a Vulnerability. Assign a Vulnerability Level, on a scale of 1 to 5, to each Vulnerability. (A score of 5 indicates the highest Vulnerability Level.) For example:

  • Exposure of web server to the Internet (4)
  • Mis-configuration of the firewall (3)
  • Positioned underneath the air conditioning water storage tank (3)

4. List the Threats posed to the Items. Theft, breakages, hacking, website defacement, viruses, worms, media failure, and security violations are all examples of Threats that Items may face.

Note: Relationship between Threats and Vulnerabilities

There is a link between Vulnerabilities and Threats.

For example, the fact that a laptop is portable (Vulnerability) exposes it to the possibility of theft and breakage (Threats). If a computer's operating system is misconfigured (Vulnerability), it may be threatened by viruses, worms, or unauthorised access (Threats).

Identification of Threats and Vulnerabilities is an iterative process: discovering Threats may lead you to identify Vulnerabilities, and these in turn may reveal further Threats, etc.

5. For each Item, assign a Threat Level, on a scale of 1 to 5, to each of the Threats that you have identified. (A score of 5 indicates the highest level of Threat.)

Two Items may face a Threat at an equal Threat Level. For example, two servers may be equally threatened by hard disk failure. However, where one of the Items is more valuable to your organisation than the other, the Threat to the more valuable Item is clearly more important. To reflect this, Serio calculates the Risk Level associated with a Threat, according to the following formula: 

Risk Level of Threat = Organisational Value of Item x Threat Level

For example,

  • Web Site (Organisational Value = 4)
  • Defacement Threat (Threat Level = 4)
  • Risk Level = 4 x 4 = 16.
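The formula is easy to reproduce if you want to sanity-check figures outside Serio. In this minimal sketch the multiplication is the formula above; the range check is my own assumption, based on the 1-to-5 scales suggested earlier, and I am not claiming Serio validates scores this way.

```python
def risk_level(organisational_value: int, threat_level: int) -> int:
    """Risk Level of Threat = Organisational Value of Item x Threat Level."""
    # Assumption: both scores use the 1-to-5 scales suggested above.
    for name, score in (("Organisational Value", organisational_value), ("Threat Level", threat_level)):
        if not 1 <= score <= 5:
            raise ValueError(f"{name} must be between 1 and 5, got {score}")
    return organisational_value * threat_level


# The Web Site example: Organisational Value 4, Defacement Threat Level 4.
print(risk_level(4, 4))  # 16

# Two servers facing the same Threat Level: the more valuable one carries more risk.
print(risk_level(5, 3), risk_level(2, 3))  # 15 6
```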

6. As you gather information about Items, Threats, and Vulnerabilities, you can store it in Serio. You can then use this data to produce reports or to search for Items which are exposed to unacceptable levels of Risk or Vulnerability. This analysis should help you identify Changes that you need to make to your systems to reduce the risks arising from particular Threats or Vulnerabilities.
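To illustrate the ‘search for unacceptable risk’ step, here is a sketch that works on a hypothetical flat export of Items and Threats rather than on the Serio CMDB itself. The threshold of 12 is an arbitrary example; what counts as acceptable is your organisation’s decision.

```python
from typing import List, NamedTuple, Tuple


class ThreatRecord(NamedTuple):
    # Hypothetical flat export of Items and their Threats.
    item: str
    organisational_value: int
    threat: str
    threat_level: int


def unacceptable_risks(records: List[ThreatRecord], threshold: int) -> List[Tuple[str, str, int]]:
    """(Item, Threat, Risk Level) for every Threat at or above the threshold, highest first."""
    flagged = [
        (r.item, r.threat, r.organisational_value * r.threat_level)
        for r in records
        if r.organisational_value * r.threat_level >= threshold
    ]
    return sorted(flagged, key=lambda entry: entry[2], reverse=True)


records = [
    ThreatRecord("Web Site", 4, "Defacement", 4),
    ThreatRecord("Web Site", 4, "Denial of service", 2),
    ThreatRecord("Sales laptop", 3, "Theft", 5),
    ThreatRecord("File server", 5, "Media failure", 2),
]
# A threshold of 12 is an arbitrary example; choose your own acceptable level.
for item, threat, risk in unacceptable_risks(records, threshold=12):
    print(f"{item}: {threat} (Risk Level {risk})")
```

Each flagged line is a candidate for a Change that reduces the Threat or the Vulnerability behind it.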

Friday, 09 Mar 2007

I’m going back to the point raised by Peter, as mentioned in this post from last week (specifically Peter has to improve service with a reducing budget, demoralised staff and reducing headcount).

I’ll recap on some of the things I’ve covered:

This blog post will be about shared vision and understanding in IT Service Management, and how important it is that, as a manager, Peter works to achieve this (excuse the term ‘shared vision’; I know it sounds like something David Brent might bang on about).

Let me start by saying what I would not do. I would not want to be a Moses figure coming down from the managerial mountain with ‘the process’ or ‘an ITIL vision’, handing out papers, processes and directions. This can engender a feeling of antipathy before you even get started, in the way that any change imposed in a work setting can (my experience of working life over the last 25 years is that people are generally resistant to change – even in IT, where technological change is viewed very differently from organisational change). Remember that your staff may think they currently do a great job, even if customers think they do not.

What I would do, and have done in the past, is to persuade people that we need to do things better and differently, and to encourage people at all levels to suggest how.

As always though, there is a right way and a wrong way. Don’t ask your staff to write down suggestions and send them to you – the chances are you’ll get almost no response. Many people are intimidated by blank sheets of paper (even me, from time to time).

A better approach is to organise (i.e., chair) workshops where people can suggest improvements. However, organise these after you’ve applied some thought yourself to the problem(s) as you see them, so that if the workshop is slow to get started you can prod it into life. Make careful notes during the meeting, and report back promptly afterwards.

At all costs avoid the workshop turning into a festival of moans. The first of these I ever organised did exactly this, with the 1st-line team immediately saying “the network team cherry-pick Incidents leaving behind stuff they don’t like the look of”. As it happened this was true, but with network team members in the room the tone became confrontational.

At the time I handled this badly. What I’d do now would be to re-state the negative in a more neutral way, as in ‘You mean some Incidents you assign to the network team are not being handled as quickly as customers would like’ and try to move onto a more consensual tone.

During these workshops, some people will contribute a lot, others a little or nothing. If you are reorganising and creating new roles (such as an Incident Manager or Service Level Manager), try to reward the most interested and positive people with new roles, rather than simply choosing people based on seniority (though I know that can be controversial).

I’ll continue this theme in future posts, looking at some of the specific ITIL disciplines Peter might consider. 

Thursday, 08 Mar 2007

In this blog article I want to create an ‘overview’ topic about email in Serio, because a lot of users seem unaware of some of the things you can do with email.

I will start with when and where emails are produced. All of the emails are optional – so they can be switched off if you’d rather not send any.

NB: For brevity’s sake I’m primarily going to talk about Incidents, but what follows can also apply to Problems and Changes.

  • After an Incident is logged, a confirmation can be sent automatically to the customer.
  • When resolving an Incident you can send a ‘we’ve completed this’ email to the customer. The email can contain what you did to resolve the Incident, and invite a satisfaction rating.
  • You can send an email to customers whenever you take an Action. You can do this by just having a single Action called ‘Send email’ whenever you want to send one, or you can make each Action send an email to the customer. This is like a courtesy that says ‘we’ve updated your Incident, here is what we did’.
  • You can have Serio notify Agents (people delivering service to customers) when you assign an Incident to them.
  • You can have Serio ‘copy’ lots of Agents when an Incident fitting a particular set of criteria is logged. We call these Broadcast Alerts.
  • If you are a Serio Command Center user, you can have Serio email you when different network and server conditions are detected.
  • If a customer uses SerioWeb to log an Incident or place a note on an existing Incident, you can have Serio send you an email to let you know about this.
  • At a given escalation point, the Escalation Engine can send an email to the customer to let them know you've escalated the ticket.
  • You can let a supplier know that you’ve assigned an Incident to them, as part of a supplier management process.
  • We can also send emails about emails – for example, if a customer replies to an email you sent earlier, we can send you a notification about that.

That’s a lot of emails. For the important ones (generally those that go outside of the support desk) you have total control of the content through the use of eDocs (electronic documents). For example, you can choose HTML or text-based formatting, and can include lots of data from the Incident (such as SLA targets, who is handling the Incident, which team it is assigned to, the Incident description and so on).

There is also another whole raft of notifications and warnings that are used in Change Management (a topic for later).

There’s some pretty nice stuff which might not be obvious.

Variable ‘sender’ address. If you are a ‘virtual’ helpdesk or service desk (you provide services to different companies as if you were part of each company), Serio can vary the sender’s email address on outgoing emails so that one email is sent from ‘’ and the next from ‘’.

Language-based emails. (This is a great feature; I helped design it.) If you offer services in more than one language then Serio can switch the language of emails. Suppose you offer support in English, French and Dutch. When an English speaker logs a ticket, they will get an English-language confirmation. If a Belgian French speaker logs a ticket, they’ll get a confirmation in French. What is cool about it is that if a Belgian Dutch speaker logs a ticket, they’ll get a confirmation in Dutch (in other words, the language is not determined by country).
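The key design point is that the text is chosen from the customer’s language, not their country. This sketch is not how Serio implements the feature; it assumes a hypothetical customer record with `country` and `language` fields and plain format strings standing in for eDocs, falling back to English when a language isn’t offered.

```python
from typing import Dict

# Hypothetical confirmation texts, one per supported language (in Serio these would be eDocs).
CONFIRMATIONS: Dict[str, str] = {
    "en": "We have logged your Incident {ref}.",
    "fr": "Nous avons enregistré votre incident {ref}.",
    "nl": "Wij hebben uw incident {ref} geregistreerd.",
}


def confirmation_for(customer: Dict[str, str], ref: str, default: str = "en") -> str:
    """Pick the text by the customer's own language, never by their country."""
    language = customer.get("language", default)
    template = CONFIRMATIONS.get(language, CONFIRMATIONS[default])
    return template.format(ref=ref)


# Two customers in the same country, with different languages.
french_speaker = {"name": "Claire", "country": "BE", "language": "fr"}
dutch_speaker = {"name": "Pieter", "country": "BE", "language": "nl"}
print(confirmation_for(french_speaker, "INC-2041"))  # Nous avons enregistré votre incident INC-2041.
print(confirmation_for(dutch_speaker, "INC-2042"))   # Wij hebben uw incident INC-2042 geregistreerd.
```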

Monday, 05 Mar 2007

I’ve blogged previously about improving your success in an ITSM project here and here – mentioning setting yourself clear objectives, and resisting the urge to complicate things.

In this post I’m going to look at another ingredient – roles and responsibilities.

A colleague here at Serio tells the apocryphal story of a TT racer in the Isle of Man whose engine fell out at Windy Corner. When they got the bike back to their workshop, the two mechanics looked at each other and said ‘I thought it was your job to tighten the engine studs!’.

The moral of the story is, I guess, if you want something to be done don’t leave it to ‘the team’.

What you should do is define some clear roles and responsibilities. For instance, you might define the Incident Manager role, give it to a person and start to define the responsibilities thus:

Responsibility for documenting the Incident Management process (keep it simple though), ownership of and responsibility for the quality of Incident data, management of the Incident management team, production of monthly management reports listing issues and proposals for improvement…

… and so on. If you are just starting out, you’ll want to consider mapping both roles and teams out carefully, and try to define

  • A Service Desk Team
  • An Incident Manager
  • A 2nd-line Support Team (with a Team Leader)

and you can then expand this with other roles such as Problem Manager as required and as demanded by your own situation.

Remember it’s not enough to create a job title – staff being reorganised for IT Service Management need as much help as possible to understand what is required of them. Therefore each role should have a clear written description of what is required and what the responsibilities are – in detail. Take the objectives and goals you started with and use these to shape your roles and responsibilities to deliver what you want to achieve.

Of course creating roles spreads the workload and gets others involved, but it does require a degree of shared understanding and commitment – something I’ll address in later posts.

Friday, 02 Mar 2007

This post will be all about Escalation in Serio, the role that the Escalation Engine plays, and a personal perspective on ‘good practice’.

First of all, let’s define Escalation, as it means different things to different people. Back in January I posted about Escalation in Incident Management, and that post defines both Functional and Hierarchical Escalation. You’ll be pleased to hear that the Escalation Engine can handle both.

Serio Service Level Agreements have a ‘grid’ structure, and for each Response and Resolve target you define you can have up to 12 Escalation Timepoints. That means that if, say, you have 4 different Response and Resolve Targets, you can have 4 * 12 = 48 Escalation Timepoints, all occurring at slightly different times. That’s a lot of Timepoints! Of course, each Incident will have no more than 12, but the next Incident in the queue could have a different 12 – you get the idea.

Personally, I sometimes wish that the tool allowed fewer than 12, as managers are sometimes tempted to use all 12 – it seems to be a case of ‘we’ve paid for 12 so let’s use 12’ – but in ITSM, as in life, less is often more. I’ll explain why.

Every Escalation Timepoint should have a consistent meaning in your Incident Management process, and for each Timepoint you need a managerial or service-based response. Without this, how will your staff know how to respond to the notifications and alerts they receive?

Let me give you an example of good use.

  • Escalation Timepoint 1 = SLA Breach in 2 hours
  • Escalation Timepoint 2 = SLA Breach in 1 hour
  • Escalation Timepoint 3 = SLA Breach
  • Escalation Timepoint 4 = SLA Breach by 25%
  • Escalation Timepoint 5 = SLA Breach by 50%
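To see when these five Timepoints actually fall, it helps to work them out for one Incident. This is a hypothetical model, not the Escalation Engine’s configuration format: it assumes an 8-hour Resolve target measured in plain elapsed time (no working-hours calendar), and it doesn’t handle targets shorter than 2 hours, where Timepoint 1 would fall before the Incident was logged.

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def escalation_timepoints(logged: datetime, target: timedelta) -> List[Tuple[int, datetime]]:
    """When each of the five example Timepoints falls for one SLA target."""
    breach = logged + target
    return [
        (1, breach - timedelta(hours=2)),  # SLA breach in 2 hours
        (2, breach - timedelta(hours=1)),  # SLA breach in 1 hour
        (3, breach),                       # SLA breach
        (4, breach + target * 0.25),       # SLA breached by 25%
        (5, breach + target * 0.5),        # SLA breached by 50%
    ]


def timepoints_reached(logged: datetime, target: timedelta, now: datetime) -> List[int]:
    """Numbers of the Timepoints that have already passed at `now`."""
    return [number for number, when in escalation_timepoints(logged, target) if when <= now]


# An Incident logged at 09:00 against a hypothetical 8-hour Resolve target.
logged = datetime(2007, 3, 2, 9, 0)
target = timedelta(hours=8)
for number, when in escalation_timepoints(logged, target):
    print(number, when.strftime("%H:%M"))
print(timepoints_reached(logged, target, now=datetime(2007, 3, 2, 17, 30)))  # [1, 2, 3]
```

Five well-spaced Timepoints (15:00, 16:00, 17:00, 19:00 and 21:00 here) are already plenty to attach a clear response to each.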

I mentioned a management response. Ideally you’ll have, in your operations manual (we have a template, request it from support), something along these lines:

At Escalation 1, the assigned Agent should notify their team leader (or Service Level Manager or whoever else you feel is appropriate) of the Escalation – and ask for guidance.

Escalation 2 is for information – breach imminent.

At Escalation point 3, contact the customer and let them know you are aware of the SLA breach.

… and so on.

The real trick is to avoid a situation where lots of alerts and messages are produced that everyone ignores. If I’ve just described your Helpdesk or Service Desk, revisit both how you’ve set-up the system and the information and training you’ve given to your staff.

The Serio Escalation Engine helps out in the following ways:

  • It can send you alerts as the Escalation Timepoints occur
  • It can warn you about response breaches
  • It can place Actions on Incidents, Problems and Changes automatically to record the Escalation, giving you total control (through eDocs) over the Action comment
  • It can automatically send emails to the customer
  • It can automatically re-assign Incidents if that’s what you want it to do (though I’ll leave it to you to decide if that is the right thing to do)

Now which of those is Functional, and which Hierarchical?

Have a great weekend in the winter sun!

Wednesday, 28 Feb 2007

This is a follow-up post to improving your chances of success with an ITSM project.

One thing I want to clarify is the scope of the topic I’m expanding upon: improving the chances of success. That doesn’t mean I’m going to try to address how you should go about an ITSM project (which will depend on your current position, skills, budget, and organisation).

I’m interested in improving the chances of success, like you might say ‘improving the chances of cycling home safely in February’ might be ‘have good lights, don’t drink alcohol beforehand, wear reflective clothing, avoid busy and narrow roads, or roads with vehicles travelling at high speeds’. None of these things guarantee the outcome I want cycling home, but they do improve my margin of success.

On Monday I blogged about objectives, and how important they are. It occurs to me that another use for a good set of objectives comes at the end of the project, as a way to judge whether you’ve achieved the things you needed to.

Today I’m going to blog about what I consider to be akin to a mortal sin – complexity.

Complexity seems to be a very common personal trait amongst managers. However, without question the best and brightest people I’ve worked with have been the ones with the insight to make things simple.

An example is when someone says ‘I have come up with an Incident Management process’ (I’ve blogged about Incident Management before).

Often, with a flourish, a flowchart is produced. I’ve nothing against flowcharts (Serio has some pretty nice ones, for example) but Helpdesk and Service Desk managers do seem to be intoxicated by them at times. These flowcharts twist and turn, doubling back on themselves, looping and branching, applying different (and obscure) status values before coming to an end. In my experience these are almost always unnecessary and destined to be ignored by Incident Management staff, who are getting on with the business of resolving Incidents and dealing with customers.

Instead, focus on customers, restoring service, how we organise teams, how I develop a sense of quality and ownership, how we assign tickets between people, how I can develop my team leaders, who owns the tickets, and what responsibilities people have. Be outward looking.

More than anything, keep things simple. If you have a team at the smaller end of the scale, don’t regard this as a problem – instead view it as an advantage by making your procedure and processes simpler. Adopt an attitude that your staff are trained correctly and know how to do their jobs. If problems become evident early on respond to them, but don’t try to anticipate problems in advance.

Monday, 26 Feb 2007

An emailer I’ll refer to as Peter asks what he can do to ‘improve the chances of success’ with his ITSM project. He describes his help desk as being reactive ('being honest, we just fix things when they break') and describes his users as unimpressed with the quality of IT Service. He adds that ‘I have no additional budget at all, and face losing one member of staff through cost cuts over the next 6 months. I have a simple helpdesk tool, and cannot buy anything else. But I need to start doing more and providing a more professional support service or I won’t be here next year’. Peter has 8 staff, and with staff costs his IT spend last year was £280,000 – this year it will be less, by an unspecified amount.

First of all Peter, there are a lot of resources like this blog that are free that you can use – a spot of googling will locate them. I’d start by saying have a look at the IT Service Management category here, and at some of the white papers you’ll find on our home page.

Peter specifically asks for things that will improve his chances of success, and appears to be familiar with ITIL – indeed he mentions it twice in his email.

Firstly, I’d say you should be clear about what you want to achieve, and be wary of seeing ITIL (or any other service management framework) as an end in its own right – you should not be aiming to be ‘ITIL compliant’, because that in itself may not deliver what your company needs. Instead, look at things such as ITIL as a way to help you deliver improvements to service.

My editorial guidelines for blog articles require me to be practical, and so I’ll try to expand the paragraph above and be both practical and specific. When I say ‘be clear what you want to achieve’ I mean set real, tangible, specific objectives for yourself, and avoid generalities like ‘improve Incident handling’. Specifics might include things like:

  • Reduce the number of Incidents measured month-on-month by 10%
  • Improve our first-time fix rate
  • Reduce the downtime experienced on our key systems
  • Reduce the amount of time taken to resolve Incidents
  • Find out why our users/customers feel the way they do about us
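A specific objective is one you can check with arithmetic. Taking the first example, and assuming it means each month should show at least 10% fewer Incidents than the month before (my interpretation), this sketch checks hypothetical monthly counts exported from a helpdesk tool.

```python
from typing import List


def month_on_month_change(counts: List[int]) -> List[float]:
    """Percentage change between consecutive months; negative means fewer Incidents.
    Assumes every month's count is above zero."""
    return [(current - previous) / previous * 100 for previous, current in zip(counts, counts[1:])]


def met_reduction_target(counts: List[int], target_pct: float = 10.0) -> List[bool]:
    """True for each month that fell by at least target_pct on the month before."""
    return [change <= -target_pct for change in month_on_month_change(counts)]


# Hypothetical monthly Incident counts.
incidents = [500, 440, 420, 370]
print([round(change, 1) for change in month_on_month_change(incidents)])  # [-12.0, -4.5, -11.9]
print(met_reduction_target(incidents))  # [True, False, True]
```

The comparison uses the unrounded figures, so a month that only rounds to a 10% fall isn’t counted as a success.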

..and so on. The objectives you set yourself will probably be linked to the weaknesses of your current IT support operation as you see it, but be specific and be ‘outward looking’. One of the traits which successful managers involved in ITSM seem to have is that they are outward looking – towards the organisation, business, and customers.

Properly set out, your objectives will help to keep you focussed and on-track.

I’ll continue this interesting topic in later posts.

Friday, 23 Feb 2007

Whilst the above title might sound like an advert for expensive chocolate, I’d say that Description Templates really are good things – and an often overlooked feature.

First of all, this is what Description Templates do: they are like a ‘speed click’ for the description of an Incident, Problem or Change (and more, as we’ll see below), saving you from having to type it in manually. They are also useful as prompts or mini-scripts for those logging Incidents from customers.

For instance, you might want to manually type this Incident description:

The customer has forgotten their password on the domain server, and ended-up getting themselves locked out. I validated the caller’s ID and re-set their password. They logged-in whilst still on the phone

or with a few clicks you could use this ‘standard’ text whilst logging the Incident.

Another use is as a prompt to help you capture the right information at the time of logging a ticket, as in this example:

Application Version: XXXX

Error Message: XXXX

Operating System: XXXX

Screen reference or ID (If known):

and so on – so that if you escalate the Incident to second-line support, they have everything they need in front of them.
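Serio’s Description Templates simply insert the text; they don’t check it afterwards. If you wanted to check, before escalating, that none of the XXXX prompts were left unfilled, a small script along these lines would do it. This is a hypothetical extra working on the description text, not a Serio feature; optional prompts left blank (like the screen reference) are not flagged.

```python
from typing import List

TEMPLATE = """Application Version: XXXX
Error Message: XXXX
Operating System: XXXX
Screen reference or ID (If known):"""


def unfilled_prompts(description: str, placeholder: str = "XXXX") -> List[str]:
    """Labels of template lines whose value is still the placeholder."""
    missing = []
    for line in description.splitlines():
        label, separator, value = line.partition(":")
        if separator and value.strip() == placeholder:
            missing.append(label.strip())
    return missing


# The analyst has filled in only the first prompt so far.
description = TEMPLATE.replace("XXXX", "4.2", 1)
print(unfilled_prompts(description))  # ['Error Message', 'Operating System']
```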

Using a Description Template is easy. When logging the ticket, look at the field where you enter the Incident Description and you’ll see the standard Serio look-up button marked with an ellipsis (…). Simply click that and choose the 'canned' description you want.

If that lookup window is empty when you click it, then your Serio Administrator needs to set one up for you – so ask them nicely. However, don’t just say ‘Can we have some of them Description Template things please?’ write what you want (the whole text) in an email, and send it to them.

Description Templates can also be used in Actions. In just the same way that they can speed-up Incident logging, you can also speed-up taking Actions as well. It’s done this way: your Serio Administrator can link a Description Template to an Action, so that when you take the Action the comments field is pre-populated for you.

Again, if you think this is a good idea send a full and complete request to your Administrator – it only takes a few minutes to set-up.