Serio Blog

Tuesday, 20 Feb 2007

This is a follow-up to my earlier Interviewing post – I said at the end of that post that I would suggest some topics an interviewer could use. If you are going to use any of these, please keep in mind all the dos and don’ts from the earlier post.

Imagine that you are interviewing for an Incident Manager to take over a team of (say) six, in an organisation where IT Service Management is developing but not yet fully mature. I hope these questions and topics will help.

Also note that how you describe the job is important. For instance, your advertisement might have asked for ‘experience of IT Service Management techniques, and their practical application’ for an organisation of your size. What you must do is signpost the fact that ITSM is part of what you are looking for; it’s not nice to surprise people at interview.

Topic: As I’m your supervisor, what might you include on a management report to me?

I might start by asking this, and discuss some of the Key Performance Indicators (KPIs) the candidate might include (if they ask ‘what do you want?’, reply that you are looking for suggestions). All I would be looking for here are some ideas that show the candidate has thought about this, and has prepared (or read) such a report previously. If the only suggestion offered was ‘SLA resolutions on time’ I might be a little disappointed – there are many other measures that tell you a lot more.

Topic: What would you expect to have responsibility for as Incident Manager?

This is potentially a little controversial, so explain that you are simply looking for sensible suggestions and have nothing fixed in your mind. What you want is for the candidate to show a good grasp of the role, and to suggest things such as the Incident Management process itself, production of reports, day-to-day supervision of the Incident team, taking a lead in Major Incident handling and so on.

Topic: What strategies might you adopt for specialist Incident teams?

There are lots of ways to phrase this, such as ‘we have teams organised into technical specialities, and some of them are hopeless’ or ‘our second line support team currently has a record of poor customer service’. Regardless of the phrasing, what you are getting at is how the Incident Manager will work to ensure an acceptable level of quality and service from teams that he or she may not control directly (of course, your candidate might already have suggested a supervisory role over such teams in an earlier response).

Topic: You find out that Incidents are marked as resolved when they are not. What do you do?

Again, you should be looking for practical suggestions (see my earlier post on Incident Resolution and Quality) that show a methodical and pragmatic candidate.

There are many, many more you can use. Note that all of the above are open-ended with no absolutely right or wrong answer – they simply allow candidates to express themselves, and to show a little bit of creativity and interest in the subject.

One of the things I’ve found somewhat off-putting in the past is candidates quoting the written ITIL sources directly, as if they were holy text – it’s usually the less confident candidate who does this. What I have found works best is to find someone with ideas and an interest in what they do.

Monday, 19 Feb 2007

I wrote previously about Service Level Management reports. I’m going to pick up this topic again for this post, looking at some of the more detailed or complex reports on this subject.

Once more, these reports will be located within SerioReports – see the earlier post for my comments on this. As with the previous reports, the reports below are grouped under ‘SLA Analysis’.

I will start with SLA10. This report focuses on the timeliness of resolution of Incidents, on a calendar month by calendar month basis. It is a histogram-style report, and shows the percentage of resolutions on time (by Priority) for each month. It allows you to add your target (for instance, 90%) to the report, so that the actual outcomes can be compared to the target. As the report shows month-on-month percentages it is useful in showing trends (an improving or worsening situation over time).

People are sometimes puzzled as to why more than one target line can be added to the report. This is because some customers define variable targets for SLA resolutions, as in this example:

Priority Critical – Target 99% on time

Priority High – Target 95% on time

Priority Medium – Target 90% on time

and so on.

Other Serio customers simply have a target that is the same across all Priorities. Either way it’s fine – the report works for both groups.

As a final flourish, you can add, for each month, an average across all Priorities.
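If you ever want to reproduce this kind of month-by-month calculation outside the tool (say, from an export), here is a minimal Python sketch. It is not how SerioReports works internally: the Incident fields, the 90% default target and the choice to compute the monthly average as a pooled percentage across all Incidents are all assumptions for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Incident:
    priority: str       # hypothetical field, e.g. "Critical", "High", "Medium"
    resolved: datetime  # when the Incident was resolved
    on_time: bool       # resolved within its SLA target?


def _month(inc):
    return (inc.resolved.year, inc.resolved.month)


def monthly_on_time(incidents):
    """Return {(year, month): {priority: percent resolved on time}}."""
    counts = defaultdict(lambda: [0, 0])  # (month, priority) -> [on_time, total]
    for inc in incidents:
        cell = counts[(_month(inc), inc.priority)]
        cell[0] += int(inc.on_time)
        cell[1] += 1
    result = defaultdict(dict)
    for (month, priority), (ok, total) in counts.items():
        result[month][priority] = 100.0 * ok / total
    return dict(result)


def monthly_overall(incidents):
    """Return {(year, month): percent on time across all Priorities combined}."""
    counts = defaultdict(lambda: [0, 0])
    for inc in incidents:
        cell = counts[_month(inc)]
        cell[0] += int(inc.on_time)
        cell[1] += 1
    return {month: 100.0 * ok / total for month, (ok, total) in counts.items()}


def compare_to_targets(percentages, targets, default=90.0):
    """Mark each (month, priority) cell True if it meets its Priority's target."""
    return {
        month: {p: pct >= targets.get(p, default) for p, pct in row.items()}
        for month, row in percentages.items()
    }
```

Passing a dictionary of targets per Priority (for instance Critical 99, High 95) covers the variable-target customers, while a single shared target is just the `default`.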

Another more complex report is SLA7. This report also covers timeliness of resolution, but includes information, banded by Priority, about each Escalation level reached. This is useful because it is not good practice to focus only on what was on time and what was not. Service level managers should also be interested in questions such as:

For the Incidents that were not on time, just how late were they?

For instance, it is not as bad to have Incidents resolved 30 minutes after their target as it is to have them resolved 30 days after it. One way to see this, assuming you’ve set up your Escalation points sensibly, is to use SLA7. Alternatively, you can explore this ‘lateness’ concept by running report SLA12.
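To make the ‘how late?’ idea concrete, here is a small Python sketch that measures how far past target each Incident was resolved and counts the late ones by Priority and escalation level reached. The escalation offsets (at target, +4 hours, +1 day, +5 days) are invented for the example; yours will come from your own SLA set-up, and this is not SLA7’s actual logic.

```python
from bisect import bisect_right
from collections import Counter
from datetime import datetime, timedelta

# Hypothetical escalation points, measured as time past the SLA target.
# Level 1 is reached as soon as the target is missed.
ESCALATION_POINTS = [timedelta(0), timedelta(hours=4), timedelta(days=1), timedelta(days=5)]


def lateness(target, resolved):
    """How far past its target an Incident was resolved (zero if on time)."""
    return max(resolved - target, timedelta(0))


def escalation_level(late_by, points=ESCALATION_POINTS):
    """Number of escalation points passed; 0 means resolved on time."""
    if late_by <= timedelta(0):
        return 0
    return bisect_right(points, late_by)


def late_profile(incidents):
    """Count late Incidents by (priority, escalation level reached).

    `incidents` is an iterable of (priority, target, resolved) tuples.
    """
    profile = Counter()
    for priority, target, resolved in incidents:
        level = escalation_level(lateness(target, resolved))
        if level:
            profile[(priority, level)] += 1
    return profile
```

With these example points, an Incident resolved 30 minutes late lands at level 1, while one resolved 30 days late lands at level 4, which is exactly the distinction a plain on-time/late split hides.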

Friday, 16 Feb 2007

In an earlier post I touched on the subject of interviewing, in response to a commentator's question. This turned out to be a really popular article, with a significant number of unique visitors and email correspondence generated. Thank you all for your comments.

I'm going to revisit this subject today. Our internet logs tell us that quite a lot of people are searching for information such as 'interview questions' or 'how to conduct a service desk interview', and I want to write an article that might help. However, before that I want to talk about spambots and recruitment firms.

One thing the earlier post did was generate, after a period of about a week, a very large number of calls from recruitment firms – all asking for either 'Pete' or 'George', all adamant we were looking for a new Service Desk Manager, and all very, very persistent. At first we didn't understand this, but then someone explained it to us. There are a significant number of what I will call spambots that are looking for vacancies on websites, and clearly they've been crawling ours (bots are programs that read all of the pages in a website). These bots presumably read the text, and then sell the 'opportunity' to recruitment firms. However, in this case the bots seem to have misread the post.

OK, for the benefit of bots: please do not call us; this post does not mean we have a vacancy :-)

Now on to the subject of interviewing. It's always surprised me that most people who are interviewing have had no training, and very little guidance. Presumably you are meant to learn the technique from the times you've served as the interviewee. What I'd say is that if it's your first interview, don't be afraid to role-play with a colleague (it's best done with a peer). Get your colleague to sit with you playing the role of a candidate, and afterwards discuss with them how the interview went. This will help you get into the swing of things, and act as a nice rehearsal.

For the rest of this post, I'll adopt a do/don't format.

Do: Try to bring the interviewee into the interview as early as possible. In other words, don't make them sit there and listen to the history of the company starting from when your founder was born. State a little bit about the job and try to get them talking. Assume they've done some research.

Do: Use the CV as a talking point. Most people will have projects listed on their CV. Try to get them to talk about their projects, focusing on what they did and their role in the project. Ask them about what worked, and what did not. Ask them what they learned.

Do: Give people a little time to compose themselves. Once shown to the interview room, I ask if they'd like tea or coffee. Regardless of the answer, I leave for just a few minutes (note: not 5 or 10 minutes, just 2).

Do: Keep your questions open-ended and allow the interviewee to come in as often as possible.

Don't: Start with areas of discussion or topics that are likely to be controversial. Leave these towards the end when hopefully you've built up a rapport. Personally I avoid arguments – if someone states a fact which I am certain is incorrect, I just say 'are you sure about that?'.

Don't: Bombard them with fact-based questions like they are sitting an exam. Just drop these into the interview. Don't turn the interview into an ITIL Q & A.

Don't: Paint an overly rosy picture. True story: a colleague of mine here at Serio (years ago) got a job as an IT manager. He had the foresight to ask during the interview 'why is the current IT manager leaving?' and was told 'he's started his own business', which seemed fair enough. After starting, my colleague found out the truth: the company was 2 years old and had already burned through 3 IT managers. The first suffered a nervous breakdown in the office after 9 months (crying, shaking), the second was fired after 4 months, and the third had just got up from his desk one day and walked out, never to return. When he found out, my colleague was, of course, deeply unimpressed.

Don't: Assume the qualifications and references are all genuine. I've been surprised by the extent to which false information is placed on CVs – check things out carefully.

Do: Be careful about 'Why?' and 'Why did you do that?' as it will unsettle some people – making them feel like 'they've put their foot in it'. If you can, preface such questions with 'that's interesting' to make people feel more at ease.

Do: Finish with 'Are there things you'd like to talk about we have not mentioned?' and/or 'Do you have any questions for me?'.

Don't: Be secretive. Someone has taken their time to see you, so tell them about the recruitment process. Don't assume the agency (if they have come through an agency) has told them anything.

Do: Ask relevant questions for the job (naturally), following the advice given above (although question is the wrong word, it's more like 'topics for discussion').

In my posts next week I'll suggest some question topics for you to use for a few typical Helpdesk or Service Desk roles.

Wednesday, 14 Feb 2007

This post will be about reports available in SerioReports. My last topic focused on Incident KPIs; this post will focus on reports that might be of interest to a service level manager.

At this point, I’ll try to define what we aim to achieve with service level management:

Maintain and improve the quality of IT Service Delivery by agreeing what our service levels should be, and then constantly monitoring and reporting on the service levels we are achieving. Where appropriate make recommendations for improvements in service delivery.

There is more to this than simply response and resolve times, important as they are, but I expect we will pick up service level management in greater detail in future posts.

Locating the Reports

First off, you will need SerioReports – this is where most of the package’s reports are. SerioReports is a licensed product (so you need a licence!) that you have to install on a computer somewhere. If you can’t find it, ask your Serio Administrator to install it for you.

A large number of the service level management (or SLA) reports are grouped under SLA Analysis, where you’ll find about 25. What I will do now is to pick some of the more interesting ones and talk about them.

Firstly I’ll start with overview reports.

SLA5 – This is a very simple report. Like most of these reports, it takes a start date, end date and an SLA, and then shows you (by priority) your Incident resolution within agreed SLA times. This report is sometimes printed to PDF and circulated to ‘casually interested’ parties, or pinned to notice boards.

SLA6 – This is another simple report. It reports on callbacks, or responsiveness, which is a fairly common industry measure of performance: the amount of time taken for the Service Desk or Helpdesk to respond to an Incident being reported (an automated response does not count). In Serio, when logging a ticket there is a checkbox that says ‘Customer needs a callback’ – this report uses the Incidents you’ve flagged in this way.
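As a rough illustration of the callback measure, the Python sketch below looks only at tickets flagged as needing a callback and reports what percentage were called back within a target. The field names and the 30-minute target are hypothetical, and I’ve chosen to count flagged tickets that were never called back as misses; this is not SLA6’s implementation.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Ticket:
    logged: datetime                 # when the Incident was reported
    needs_callback: bool             # mirrors the 'Customer needs a callback' checkbox
    called_back: Optional[datetime]  # when a person actually responded, if ever


def callback_performance(tickets, target=timedelta(minutes=30)):
    """Return (number of flagged tickets, percent called back within target)."""
    flagged = [t for t in tickets if t.needs_callback]
    if not flagged:
        return 0, 0.0
    within = sum(
        1 for t in flagged
        if t.called_back is not None and t.called_back - t.logged <= target
    )
    return len(flagged), 100.0 * within / len(flagged)
```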

Moving on to more detailed reports, there is SLA4. This report uses a matrix to show Incident resolution performance by Company. Service level managers would typically use it after checking overall performance, to examine the service levels achieved for each major Company or department they support and to identify groups receiving poor service.
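The matrix idea behind a report like SLA4 can be sketched in a few lines of Python: tally on-time resolutions per Company and Priority, then list the weakest cells first. The input tuples and the company names in the usage note are made up; treat this as an illustration of the approach rather than the report’s actual logic.

```python
from collections import defaultdict


def company_matrix(incidents):
    """Build {company: {priority: percent resolved on time}} from
    (company, priority, on_time) tuples."""
    cells = defaultdict(lambda: [0, 0])  # (company, priority) -> [on_time, total]
    for company, priority, on_time in incidents:
        cell = cells[(company, priority)]
        cell[0] += int(on_time)
        cell[1] += 1
    matrix = defaultdict(dict)
    for (company, priority), (ok, total) in cells.items():
        matrix[company][priority] = round(100.0 * ok / total, 1)
    return dict(matrix)


def weakest_cells(matrix, threshold):
    """List (company, priority, percent) cells below threshold, worst first."""
    low = [
        (company, priority, pct)
        for company, row in matrix.items()
        for priority, pct in row.items()
        if pct < threshold
    ]
    return sorted(low, key=lambda cell: cell[2])
```

For example, with hypothetical companies ‘Acme’ and ‘Globex’, `weakest_cells(matrix, 90)` would surface Globex’s High-priority Incidents at the top if none were resolved on time.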

Monday, 12 Feb 2007

This post is to tie together all of my previous posts about Incident Management that have appeared over the past month or so.

One of the things I really like about the idea of a blog is its somewhat informal nature and the expectation that its content may be a little eclectic at times. One of its disadvantages is that it can be difficult to follow a ‘thread’ of articles over time - hence wrap-up posts like this are useful in some cases.

I started off in January with this topic: Introducing Incident Management, which offered a definition and talked about the concepts of Ownership and Assignment.

Next I posted about The Incident Life Cycle – which pretty much did what it said in the title. It defined and explained the major steps in the Life Cycle, and talked about the ‘workflow position’ with a few examples of how to find and use that in the Serio tool.

I then went on to blog about Escalation in Incident Management. This post talked about Escalation – a term that means different things to different people, and offered some examples.

I followed this by talking about Success Factors in Incident Management and listed some things to help you develop a better Incident Management process – things like culture, and giving yourself some attainable objectives in the opening month.

Next up: Major Incidents. Really this is worth quite a few articles, but I included it here because it seems ‘mysterious’ to some people. I defined a Major Incident, and suggested a (by no means comprehensive) list of things that might be part of a Major Incident process – to give those who had asked about such a thing an idea of where to start.

This was followed by an article on Key Performance Indicators for Incident Management. The idea here was to help with the reporting and metrics side of things – again for those getting started and having to do this for the first time.

My colleague, and blog Robin to my Batman :-), posted in this post and also here about some reports in Serio that could be used in Incident reporting. These articles were detailed and told you where to locate the reports, so as to remove any confusion. All you need to do is draw conclusions and make recommendations for improvement based on the data you see.

Finally, we all need to develop good habits – so I posted some.

A PS blog post was added after a commentator’s question about quality. Phew.

Friday, 09 Feb 2007

Emailer Steve responds to this Problem Management white paper and poses this question: what can you do if no Problems are actually raised by the Incident resolution teams? Steve specifically mentions a period of over 4 months with no Problems at all being raised. Steve’s organisation handles 2,500 or so Incidents per week, and has 30 or so staff on the Service Desk and a further 70 working in specialist teams like those I referred to in this post on escalation.

Firstly, for new readers you can find a definition of a Problem here. It’s one of the outputs from Incident Management into Problem Management.

My editorial brief for this blog is to provide practical suggestions and to avoid unnecessary jargon – so that’s what I’ll try to do. What follows is in no particular order, it’s just some ideas to check off.

  1. Shared vision and understanding. If Steve has a vision of how things should work, and the use of Problem Management is a part of that, that vision needs to be shared between the managers and the IT service delivery staff. This can be quite hard, as sometimes the focus of more technically minded staff is not on service management. One of the things I see time and again is a situation where management take one view and the IT staff another. Managers do sometimes have a tendency to assume that staff know what is expected, when the staff either don’t, take a contrary view, or simply don't care.

Having stated the problem, I’ll try to suggest some solutions. One approach is to have regular IT service meetings where issues like this can be raised, managers can be persuasive, and staff can air their views. In my experience a lot of organisations simply don’t have meetings like this. It’s beyond the scope of this post to write about how such meetings should be conducted, but avoid having groups that are too large (some staffers will feel too intimidated to contribute), make sure you have a chairman or woman who knows what they are doing, have an agenda, and produce action points afterwards.

  2. Make sure your procedures are written down in a concise, usable form. This comes back to my point about managers sometimes misleading themselves about the understanding that others have. In doing this you want a concise document that is useful and does not read like a legal document. At Serio we refer to this as an Operation Manual and have a template/example for customers to use.

  3. Ensure that your staff understand what a Problem is, as the actual meaning can be quite elusive to some people. The best way to illustrate this is to pick one or two Incidents, and explain in your meetings why these are Problems (be careful not to be seen to be critical in the early stages).

  4. You should have a Problem Manager, and everyone should know who that person is. This person owns all the Problems, and the Problem Management process. The Problem Manager should have a team. In all probability, this will not be a full time team, but will be drawn from the existing service teams so that staff have dual roles (we work on both Incidents and Problems). Pay particular attention to the composition of the Problem Management Team – choose from a good cross section of disciplines and Incident Management groups. These people can act as advocates of Problem Management for you and as ‘spotters’ in the Incident teams.

  5. Make sure it’s really, really easy for staff to flag Problems (and I mean REALLY easy). For instance, you could define a Cause Category of ‘Unknown’ which is applied at resolution time, and this could be taken by the Problem Manager as someone saying ‘this is a potential Problem record’. This would be my advice for Serio users. Make sure that whatever Cause code you use, its meaning ('hey, this is a potential Problem!') is widely understood.

  6. Make sure your ITSM tool is up to scratch. For instance, raising a Problem ticket from an Incident should be easy. Linking multiple Incidents to a Problem should also be easy.

  7. Enhance the role of the Service Desk in Problem Management by getting them involved in reviewing Incidents. Make scanning for Problems one of their specific responsibilities.

  8. Ensure that the Incident Manager and Problem Manager have a good working relationship on all levels. If they sit next to or within earshot of one another, you might find this helps the Problem Manager stay aware of issues coming through the Incident Management process that he has an interest in.

  9. Whatever your procedures are, make sure the Problem Management process does not in any way slow the ability of Agents to close Incident tickets – they might want to do this quickly for all sorts of reasons (see yesterday’s post). They should be able to flag Incidents quickly for consideration by the Problem team without it slowing their ability to close the ticket.
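To show how light-touch the ‘Unknown’ Cause Category idea can be for the Problem Manager, here is a Python sketch of a review query: it picks out resolved Incidents flagged ‘Unknown’ that are not yet linked to a Problem, and groups them by Problem Area to spot repeats. The record structure, the `problem_ref` field and the threshold of 2 are hypothetical; in practice you would run the equivalent report or query in your own tool.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class ResolvedIncident:
    ref: str
    problem_area: str
    cause: str
    problem_ref: Optional[str] = None  # hypothetical: set once linked to a Problem record


def potential_problems(incidents, flag="Unknown"):
    """Incidents flagged as potential Problems that are not yet linked to one."""
    return [i for i in incidents if i.cause == flag and i.problem_ref is None]


def recurring_areas(incidents, flag="Unknown", minimum=2):
    """Problem Areas with at least `minimum` unlinked flagged Incidents, most first."""
    counts = Counter(i.problem_area for i in potential_problems(incidents, flag))
    return [(area, n) for area, n in counts.most_common() if n >= minimum]
```

Because the Agent only picks a Cause Category at resolution time, nothing here slows down closing the ticket; the filtering happens afterwards on the Problem Manager’s side.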

Wednesday, 07 Feb 2007

Emailer and sometime commentator Jim asks about quality and his service teams. Specifically, he has second and third line support teams that both have what Jim calls an alarming tendency to mark Incidents as ‘resolved’ when they are not resolved at all. Jim asks if I have any practical suggestions, other than just shouting at people.

Firstly, remember I’ve blogged quite a bit recently about Incident Management.

Coming to Jim’s question, I do have some things that can be considered in circumstances like these. The points for consideration are in no particular order.

  1. Ask if you have created or added to the problem yourself by an inappropriate use of metrics. By this I mean you’ve told your team members that you are looking very closely at statistics for who is resolving Incidents, or that you’ve told the team you are looking at timeliness of resolution on an Agent-by-Agent basis (or worse, both). Now when I say ‘you’ve told’ I don’t necessarily mean you stood on a chair and said that's what you were going to do - remember that people gossip and talk informally amongst themselves. So, staff can get this impression by you simply mentioning the statistics to individuals in a negative way such as ‘why have you only resolved 10 tickets this week?’.

Metrics such as Agent performance need to be used with a little bit of caution, as they often don’t tell the whole story. A colleague of mine here at Serio tells a tale from when he worked as a programmer on a large team fixing bugs in an insurance firm. Bugs were logged, assigned to programmers, and then fixed. A new development team manager was recruited, and after two weeks the new manager issued a memo to all development and testing staff complaining about ‘poor numbers’ and proceeded to name an engineer. The manager’s mistake was this: the ‘poor numbers’ guy was the brightest and best in the group, and handled some of the toughest jobs that came into the group – therefore his ‘fix rate’ was much lower.

Does this all mean that such statistics should be avoided? Absolutely not. It simply means that they should be used with caution, and you need to be aware of how your staff might regard the use of such statistics.

One positive step is to make sure that you have statistics for the number of Incidents re-opened, focusing on who the original resolving Agent was, and to use these in conjunction with other Agent performance stats.

  2. Tell your teams that you perceive a problem. Try to bring them onside, and help them appreciate the need for quality rather than premature fixes. Try to understand how your team members see their role and what pressures they feel.

  3. Consider the roles of Team Leaders. Ask them to review some or all of the Incidents being resolved by their team.

  4. Introduce a 2-stage completion process if you don’t have one. By this I mean that when service teams resolve Incidents, they set the Incident to ‘Pending Complete’ and re-assign it back to the Helpdesk or Service Desk, which then checks proactively with the customer to make sure that the fault is resolved.

  5. Consider the possibility of skills gaps within your teams, particularly in second and third-line support. Examine Incidents that have been re-opened for clues as to why this problem is happening.

  6. Make sure that the Helpdesk or Service Desk is actually re-opening Incidents, rather than logging new ones. I know this is hard to get 100% right. I’ve blogged before about having a call handling script – amend your script to ask whether the Incident has been reported previously, and give staff guidance on when it is right to re-open.
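The re-open statistic suggested above can be sketched in Python as follows. It assumes (hypothetically) that you can export one row per resolution with the resolving Agent and whether the Incident was later re-opened. It deliberately returns the number resolved alongside the re-open rate, so that, as in the ‘poor numbers’ story, a low count is never read on its own.

```python
from collections import defaultdict


def reopen_rates(resolutions):
    """Given (resolving_agent, was_reopened) pairs, return
    {agent: (resolved, reopened, percent_reopened)}."""
    totals = defaultdict(lambda: [0, 0])  # agent -> [resolved, reopened]
    for agent, reopened in resolutions:
        totals[agent][0] += 1
        totals[agent][1] += int(reopened)
    return {agent: (n, r, 100.0 * r / n) for agent, (n, r) in totals.items()}
```

A high re-open percentage is a prompt to look at that Agent’s Incidents (perhaps a skills gap or time pressure), not a verdict in itself.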

By the way, if you’ve read this post and are thinking ‘this does not affect me’, I have to ask: how do you know, and are you sure?

Monday, 05 Feb 2007

One of the tasks that many Serio Administrators can find a little daunting is preparing categorisations for Incidents, Problems and Changes. What I’m referring to is populating the data fields that help us to manage tickets.

I’m going to list some tips to help those setting about this for the first time, or those reviewing their categories.

  • Remember that someone will have to use this information during Issue logging, so keep the number of Problem Area Categories available during Issue logging to fewer than 12 or so if you can. Then try to keep the number of Problem Areas within each Category below 12 as well. Doing this gives a reasonable list for helpdesk and service desk staff to work with.

  • It’s OK (and indeed desirable) to have a small number of Issue Types. Remember these just record the type of ticket – for Incidents, you might have Fault, Query, Work Request and so on.

  • Try to create categories and data that are unambiguous. In an ideal situation, there will be an obvious best choice for most Issues being logged. Overlap in Problem Areas can cause the same type of fault to be classified in different ways.

  • The data you are creating will be used in Incident, Problem and Change reporting.

  • If you are starting from scratch, remember to examine any legacy data you have.

  • Also if you are starting from scratch, don’t be afraid to prepare your data away from the tool itself. Sometimes it’s useful simply to write your data down in a familiar tool such as a spreadsheet where your focus will be on the data rather than the software ITSM tool. However, if you do this make sure you are familiar with the structure of the data the tool requires.

  • It’s a process. Therefore, after setting up your data, see how it’s working by examining tickets you’ve logged. Don’t be afraid to remove data that is not being used.

  • Remember that Serio allows you to have different classification data for Incidents, Problems and Changes.

  • Finally, have an ‘other/unclassified’ category. This will help for odd types of infrequently occurring Incidents – but make sure that this is not overused.
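If you prepare your categories away from the tool, as suggested above, a couple of quick checks can be scripted. This Python sketch assumes a hypothetical layout (a dictionary of Category to list of Problem Areas, plus a list of (Category, Problem Area) pairs from logged tickets); it flags lists longer than the rough ‘12 or so’ guideline and finds Problem Areas nobody has used, which are candidates for removal.

```python
from collections import Counter

MAX_ITEMS = 12  # the rough "12 or so" guideline from the tips above


def check_sizes(categories, limit=MAX_ITEMS):
    """Return warnings where a list offered at logging time is longer than limit."""
    warnings = []
    if len(categories) > limit:
        warnings.append(f"{len(categories)} Problem Area Categories (limit {limit})")
    for category, areas in categories.items():
        if len(areas) > limit:
            warnings.append(f"{category}: {len(areas)} Problem Areas (limit {limit})")
    return warnings


def unused_areas(categories, logged):
    """(category, area) pairs never chosen on any logged ticket."""
    used = Counter(logged)
    return [(c, a) for c, areas in categories.items() for a in areas if used[(c, a)] == 0]
```

The same usage counts could also show whether the ‘other/unclassified’ category is being overused.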

Friday, 02 Feb 2007

Over the past two weeks or so I have been blogging about Incident Management, and in particular how it is described in the ITIL Service Support book. At a later date I’ll draw these posts together, but for now I’ll explain why I started this thread.

My motive was to try to illustrate that it is quite straightforward, and to de-mystify the subject for readers. It was also to point out that, valuable as it is, it’s not a silver bullet, nor is it that far removed from what properly managed IT Helpdesks and Service Desks do anyway. Hopefully from my posts it is clear that ITIL Incident Management is non-prescriptive – it doesn’t tell you what to do. Instead, it offers a framework you can use for your own organisation and circumstances (therein lies both a strength [flexible enough for organisations of different types and sizes] and a weakness [insufficient guidance], depending on your point of view).

What I want to do is to write about some of the habits I see being adopted by the successful 'top 20%' of Helpdesks and Service Desks I have encountered. They are in no particular order, and the list is not definitive (in fact, I may return to this later). However, you’ll be able to see how you compare, and if you have others to add, use the comments field. You might also want to have a look at my Success Factors in Incident Management post.

Habit: There is a good team structure, for instance Service Desk, Second Line and so on. Each Team has a Team Leader.

Habit: The overall Incident Management process is written down, with a clear focus on explaining what the responsibilities of different team members are. For example, we’d try to write down some of the responsibilities of our Team Leaders.

Habit: There is an Incident Manager whom everyone can identify. Everyone understands his or her responsibilities because these are written down.

Habit: The Incident Manager produces reports and metrics for the Incident Management team.

Habit: There is a constant drive to develop and maintain a Knowledgebase. There is a nominated Knowledgebase editor to whom suggestions about new articles can be made, and who acts on them quickly – checking the articles for relevance and accuracy.

Habit: The quality of Incident records is seen as important. Staff make an effort to ensure proper classification and recording.

Habit: Strong focus on the central role of the Helpdesk or Service Desk in communication, with an emphasis on practicalities. For example, ensuring that ownership is maintained with the Service Desk even when Incidents are assigned to support teams, and then maintaining an involvement (particularly in terms of customer communication) with those Incidents by the Service Desk staff.

Habit: Careful use of status values to deliver a Workflow Position.

Habit: They make time for regular weekly team meetings for those most directly involved in Incident Management. These meetings follow a standard agenda, but allow flexibility for different issues to be raised. The chairman or woman varies from meeting to meeting, but the chair has clear guidelines on how to conduct the meeting.

Thursday, 01 Feb 2007

This is just to round off yesterday’s post about KPIs in Incident Management, and where you can find them in Serio.

Resolutions by the Service Desk or Helpdesk

This refers to the resolutions achieved by the Helpdesk/Service Desk. You’ll find this data available as a column in the First Time Fix report AGT14.
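The underlying percentage is simple, and a Python sketch may help anyone checking the figure by hand. It assumes a hypothetical export listing the team that resolved each Incident, and it treats ‘Service Desk’ and ‘Helpdesk’ as the desk team names; it is not how AGT14 calculates its column.

```python
def service_desk_resolution_rate(resolved_by, desk_teams=frozenset({"Service Desk", "Helpdesk"})):
    """Percent of resolved Incidents closed by the Service Desk/Helpdesk itself.

    `resolved_by` is a list with the resolving team name for each Incident.
    """
    if not resolved_by:
        return 0.0
    by_desk = sum(1 for team in resolved_by if team in desk_teams)
    return 100.0 * by_desk / len(resolved_by)
```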

Percentages of Incidents Handled within SLA Target

Most of the SLA-based reports are clustered under ‘SLA Analysis’ – there are around 25 in all. Picking a few at random:

SLA5 – A nice, simple report which shows the percentage of resolutions on time, broken down by Priority.

SLA4 – A more detailed account, broken down by Company, of SLA resolutions on time.

SLA9 – Shows your response performance, again organised by Priority.

SLA10 – This is an interesting but complex report in graph form. It shows your SLA resolutions on time on a month-by-month basis. It also allows you to add your targets (for instance, 90% on time) onto the same graph for direct comparison and for trend analysis.

Spread of Resolution Time

For this, see the ‘SLA Resolution Time Profile’ reports, SLA12 and SLA12a. These show you in a convenient graph form how long Incidents are taking to resolve. The data is presented in a useful ‘banded’ form that shows the spread of resolution times, and you have a lot of control over how wide or narrow the bands actually are.
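Here is a minimal Python sketch of the banding idea, with the band width as a parameter so you can make the bands thicker or thinner. The input (a list of resolution durations) and the choice to lump everything beyond the last band into an open-ended ‘N h+’ band are my assumptions for the example, not necessarily what SLA12 does.

```python
from datetime import timedelta


def resolution_profile(durations, band=timedelta(hours=4), bands=6):
    """Count resolution times into equal-width bands; the last band is open-ended.

    Returns a list of (label, count) pairs, e.g. ("0-4h", 12).
    """
    counts = [0] * bands
    for d in durations:
        index = int(d / band)               # which band this duration falls in
        counts[min(index, bands - 1)] += 1  # anything beyond goes in the last band
    width = band.total_seconds() / 3600     # band width in hours, for the labels
    profile = []
    for i, n in enumerate(counts):
        low = i * width
        label = f"{low:g}-{low + width:g}h" if i < bands - 1 else f"{low:g}h+"
        profile.append((label, n))
    return profile
```

For instance, resolution times of 1, 3, 5 and 30 hours with 8-hour bands and three bands give `[("0-8h", 3), ("8-16h", 0), ("16h+", 1)]`; switching to 4-hour bands spreads the same data more finely.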
