Serio Blog

Wednesday, 21 Mar 2007

This post is at the request of a customer trying to make sense of Asset Management and Configuration Management. I’ve covered these topics before (search in this blog and you’ll find them) but this post will assume no prior knowledge or experience.

Let’s get started with some definitions – in my own words.

Asset Management: This is where you maintain a list of assets for housekeeping and accounting purposes. Typically the assets are computers, monitors, printers and so on. When I say maintain a list I mean just that – a list, one after the other, usually classified in ways that are helpful to you.

Configuration Management: This is where you try to create a map (or diagram) that shows how building blocks (Configuration Items, or CIs, or just Items for short) combine to deliver services to users. If you take a system that our customer might be familiar with (such as Serio) it might show, at the risk of showing my age, something like this:

SerioServer running on a computer called Zephod, with a dependency on a SQLServer instance running on a computer called Gargleblaster. A web server instance also running on computer Zephod, with a dependency on the SerioServer instance.

… and so on. I can count 5 Items in the above description, and these would all be linked together (possibly as a System, if you are so minded). The term used for such a data store is Configuration Management Database (or CMDB for short).
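To make the 'map' idea concrete, here is a rough sketch in Python of the five Items above and the links between them. This is not how Serio stores a CMDB internally; the `Item` class, its field names and the `all_dependencies` helper are made up for illustration, and I have treated 'running on' as just another kind of dependency.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Item:
    """A Configuration Item (CI). Field names are illustrative only."""
    name: str
    kind: str
    depends_on: list[Item] = field(default_factory=list)


def all_dependencies(item: Item) -> set[str]:
    """Return the names of every Item reachable through 'depends_on' links."""
    seen: set[str] = set()
    stack = list(item.depends_on)
    while stack:
        current = stack.pop()
        if current.name not in seen:
            seen.add(current.name)
            stack.extend(current.depends_on)
    return seen


# The five Items from the description above.
zephod = Item("Zephod", "computer")
gargleblaster = Item("Gargleblaster", "computer")
sql_server = Item("SQLServer instance", "database", [gargleblaster])
serio_server = Item("SerioServer", "application", [zephod, sql_server])
web_server = Item("Web server instance", "application", [zephod, serio_server])

# Everything the web server ultimately relies on.
print(sorted(all_dependencies(web_server)))
```

The point of the links is that you can ask questions a flat list cannot answer, such as 'what is affected if Gargleblaster goes down?'.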

Hopefully the definitions are clear. I’ll leave the issue of why we might do either of these until later.

Now you might be thinking ‘Configuration Management sounds more useful and advanced, I’d better go for that’. My response to that is: not necessarily. It depends entirely on your situation, your maturity in terms of IT Service Management, and what your goals are. We’ve got a ‘golden rule’ at Serio about data, and it’s this:

Don’t store data you can’t verify or keep up to date.

So there is your first challenge. Regardless of what you do, you need to address the issue of keeping your Asset register or CMDB up to date. The golden rule sounds simple and obvious (and it is) but it’s surprising the number of people who create data stores with no regard to their future integrity.

Generally speaking, keeping the Asset register up to date is easier. You make sure all new equipment has an approved number on it (usually by setting up a protocol with your purchasing or goods-in department), and allocating the number allows you to book it in. When Assets are scrapped you follow a similar protocol, and delete them from the register.

In the real world however, you will find that people still find ways to introduce Assets you are not aware of – like going down to the local computer store and using an expense account credit card.

Creating an Asset register is tiresome, but usually straightforward. Simply walk around your office putting tags on things, recording their details as you go (easy if you’ve got 250 computers, not so easy if you’ve got 25,000).
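The booking-in and scrapping protocol described above can be sketched in a few lines of Python. Treat this as an illustration only: the `AssetRegister` class, its method names and the tag format are my own assumptions, not part of any Serio product.

```python
class AssetRegister:
    """A flat list of Assets keyed by tag number (hypothetical structure)."""

    def __init__(self) -> None:
        self._assets: dict[str, dict[str, str]] = {}

    def book_in(self, tag: str, description: str, location: str) -> None:
        """Record new equipment; refuse a tag number that is already in use."""
        if tag in self._assets:
            raise ValueError(f"Tag {tag} is already on the register")
        self._assets[tag] = {"description": description, "location": location}

    def scrap(self, tag: str) -> None:
        """Delete a scrapped Asset from the register."""
        if tag not in self._assets:
            raise KeyError(f"Tag {tag} is not on the register")
        del self._assets[tag]

    def tags(self) -> list[str]:
        """Return every tag number currently on the register, sorted."""
        return sorted(self._assets)


register = AssetRegister()
register.book_in("A0001", "Laptop", "Sales office")
register.book_in("A0002", "Laser printer", "Reception")
register.scrap("A0001")
print(register.tags())
```

Notice there are no links between entries here. That is the difference from a CMDB: an Asset register is just a list.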

I’ll continue this topic in future posts. 

Monday, 19 Mar 2007

This is a post about some gadgets and features you may not know about on Serio 4.6 and later versions.

Chat

There is a Helpdesk/Service Desk Agent-to-Agent chat facility. It works between users of SerioClient, and lets you hold a text chat session with them, just like you might have seen with other messenger-type applications.

To use Chat, click on the ‘Users’ icon in the bottom-left hand side of SerioClient, and then select the person you want to chat to from the left-hand side. Serio will then launch a chat panel; simply type your message and go.

Ticker-tape

Very few people seem to know this, but there’s a ticker tape message. For example, you might want to broadcast

  • Our billing and accounts system is down or
  • There are free cakes in the meeting room

and have all SerioClient users see it as a ticker-tape message running across the top of SerioClient. To do this, simply click the Serio logo and select ‘Create a Ticker-Tape Message’.

Export all or part of an Incident/Problem or Change to HTML

Like the title says, exporting Incidents in a convenient HTML form is easy. You can even control the format that’s used by editing a template (for instance, you add your company’s standard font or logo). To do this, simply right-click in Incident, Problem or Change Management lists, and select ‘Export to Public Knowledgebase’.

All this does is tell Serio you want the content of the Incident in HTML format – it’s up to you where you store the document, and there is no need to put it into a KB directory. Serio will only export Actions you’ve flagged for export to the KB (these display in SerioClient with a KB icon beside them). You can read more about this by searching for the topic ‘Exporting Issues to the Public Knowledgebase’ in the HowTo guide.

Linking together Incidents, Problems and Changes

The quickest way is by using Copy/Paste Link – it takes just a few clicks.

Linking Items Together

Again, the quickest way is by using Copy/Paste Link.

This is a follow-up to my earlier Problem KPI post. Commentator Mark asked ‘what about Known Errors’, something my suggestions for KPIs did not cover. Thanks Mark.

Remember that the whole point of Key Performance Indicators is to tell us how effectively some process or activity (in this case Problem Management) is working. Therefore the question is what Known Errors tell us from a KPI perspective – should they be included or not as a KPI?

My feeling is we should. The Known Error count is another indicator of ‘health’ of the Problem Management process, so I’ll amend point 2 of the earlier post as follows:

2. Problems Resolved or with acceptable outcomes

Show the number of Problems resolved as follows:

  • Count of Problems resolved by raising one or more Changes
  • Count of Problems that led to a Known Error state
  • Count of Problems flagged with a Workaround

Interpreting the figures you get is quite tricky, as of course there is no ‘right’ figure. However, a constant run of zeros (or very low numbers as a percentage of the whole) would indicate to me that there may be some issues in identifying these outcomes as part of the process.
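If you can export one outcome per Problem from your service management tool, the three counts (and the 'percentage of the whole' I mention) are easy to produce. The sketch below assumes such an export exists; the outcome labels and the `outcome_summary` function are hypothetical names of my own.

```python
from collections import Counter

# Hypothetical export: one outcome label per Problem in the reporting period.
outcomes = [
    "change_raised", "known_error", "open", "workaround",
    "open", "change_raised", "open", "open",
]

TRACKED = ("change_raised", "known_error", "workaround")


def outcome_summary(outcomes: list[str]) -> dict[str, tuple[int, float]]:
    """Return {outcome: (count, percentage of all Problems)} for the tracked outcomes."""
    counts = Counter(outcomes)
    total = len(outcomes)
    summary = {}
    for name in TRACKED:
        percentage = round(100 * counts[name] / total, 1) if total else 0.0
        summary[name] = (counts[name], percentage)
    return summary


for name, (count, percentage) in outcome_summary(outcomes).items():
    print(f"{name}: {count} ({percentage}% of all Problems)")
```

A run of zeros in this output, period after period, is the warning sign described above.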

Friday, 16 Mar 2007

This is a follow up to my last post on the subject of KPIs in Problem Management – read that previous post for background. In what I write I’m assuming that the KPIs are going to be used in a report whose audience will be the Problem Management team (therefore, quite detailed).

I’ve concluded that we are looking for signs that the Problem Management process is working. Here are some suggestions for you to try.

1. Number of Problems raised.

This is a tricky one – I’d never show this figure without comment. A figure which is low may indicate a situation where Problems are not being adequately identified, as I’ve discussed in this post. You can compare this to the number of Incidents raised over the period, to see if the rising/falling trends are the same.

2. Number of Problems Resolved.

There are a few ways I’d present this. I’d show the number of Problems satisfactorily resolved, and the total number of Incidents linked to these Problems. This is useful because it tells us how many Incidents we’ve finally got to the root cause of.

Then I’d produce a summary listing, so that the detail can be seen. I’d show the Problem Reference number, Priority, a one-line description, and the number of linked Incidents. Doing this allows readers to see how the ‘overall’ figure breaks down.
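As a sketch of what that report section might look like, the Python below produces both the headline figures and the summary listing. The record layout, reference numbers and descriptions are invented for the example.

```python
from dataclasses import dataclass


@dataclass
class ResolvedProblem:
    """One resolved Problem as it might appear in an export (hypothetical fields)."""
    reference: str
    priority: int
    description: str
    linked_incidents: int


resolved = [
    ResolvedProblem("P-107", 3, "Label printer driver crashes", 3),
    ResolvedProblem("P-101", 1, "Mail server runs out of disk space", 14),
]


def headline(problems: list[ResolvedProblem]) -> tuple[int, int]:
    """Return (Problems resolved, total Incidents linked to them)."""
    return len(problems), sum(p.linked_incidents for p in problems)


def listing(problems: list[ResolvedProblem]) -> list[str]:
    """One line per Problem, highest priority (lowest number) first."""
    ordered = sorted(problems, key=lambda p: p.priority)
    return [
        f"{p.reference}  P{p.priority}  {p.description}  ({p.linked_incidents} Incidents)"
        for p in ordered
    ]


count, incidents = headline(resolved)
print(f"{count} Problems resolved, {incidents} linked Incidents")
print("\n".join(listing(resolved)))
```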

3. Days open for unresolved Problems.

One of the things that you wish to see in your Problem Management process is that tickets that are logged do (eventually) move to resolution, and so you want to see how long Problems have been on file for. An easy way to express this graphically is to have a ‘days open’ figure along the vertical axis, and then to list each open Problem along the horizontal axis, with its reference number, showing how long it has been open.
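If you don't have a charting tool to hand, even a text version of the 'days open' chart gets the message across. A minimal sketch, assuming you can export the date each open Problem was logged (the reference numbers and dates below are invented, and one `#` stands for one week):

```python
from datetime import date


def days_open(logged: date, today: date) -> int:
    """Calendar days between the date a Problem was logged and the report date."""
    return (today - logged).days


# Hypothetical open Problems and the dates they were logged.
open_problems = {
    "P-090": date(2007, 1, 8),
    "P-112": date(2007, 3, 1),
}
report_date = date(2007, 3, 16)

# Oldest Problem first, one '#' per full week open.
for reference, logged in sorted(open_problems.items(), key=lambda kv: kv[1]):
    n = days_open(logged, report_date)
    print(f"{reference} {'#' * (n // 7)} {n} days")
```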

I have met customers who have used Service Level Agreements on Problem Management. Generally I don’t approve, partly because SLAs should be for things that a customer sees (like an Incident), but mostly because I don’t think it improves efficiency or promotes best practice. Sometimes Problems (by their unexplained nature) can take a long time to come to a satisfactory resolution, and the outcome we want is a quality (complete, accurate) resolution. What is important though is that your own procedures make sure that Problems don’t just stall, but continue to be moved forward.

4. Activity since last reporting period, per Problem

I’ve written above how we need to make sure that Problems are being looked at, prioritised, managed and assessed by the team – rather than simply gathering dust in your service management tool.

Whilst more of a status report, list each Problem and give a brief (no more than one or two line) summary of what you’ve done since the last reporting period.

So for instance, if you are reporting every fortnight you might say ‘Identified application X as possible cause, currently discussing with vendor support’ for a given Problem ticket. Serio users would simply create an Action called ‘Progress update’ for this purpose, to be completed by those assigned Problems within the service management teams.

Needless to say, you are looking for signs of life across all or most of your Problem records because without that we won’t move to resolution.

[Edit: I issued an amendment (actually, a little more detail) to this post. Click here to see the post]

[Edit by Blog admin: See also the related Incident Management KPI post]

Wednesday, 14 Mar 2007

Customer Nick asks for ideas on data he can use as a KPI for Problem Management.

That’s a really good question, and before answering it I’ll refer to the definition of what a Problem is. However, the definition of a Problem is slightly different to what the goals of Problem Management are, namely (taken from the ITIL Service Support book):

“The goal of Problem Management is to minimise the adverse impact of Incidents and Problems on the business that are caused by errors within the IT Infrastructure…”

In other words, if you ask ‘why do we bother with Problem Management’ the answer is ‘to try to stop Incidents from happening (thereby avoiding their cost and inconvenience)’. Therefore, any KPI should be targeted toward measuring the effectiveness of the Problem Management process – just as Incident Management KPIs have an element looking at timeliness of resolution.

But hold on a minute, it looks like we’ve just described something very difficult. Our ideal KPI would show how many Incidents had not occurred due to our Problem Management process, but if you can get a report from your Helpdesk or Service Desk software telling you how many Incidents have not been logged then all I can say is wow.

I have seen instances where Problem Managers have shown a declining number of Incidents, say over a 3 month period, and concluded this is a direct result of Problem Management. All I can say is maybe. Incidents can decline for all sorts of reasons, such as

  • Investment in the infrastructure, or technological changes unconnected with Problem Management
  • Less business activity (like after a Christmas rush period)
  • Less business or technological change
  • Fewer new employees joining

…and so on. The converse of the examples above can lead to an increase in Incidents, even if your Problem process is bang on the money. Personally, I find disentangling all these reasons so hard and unscientific that I don’t think it’s a good avenue for anyone to follow.

Instead, I guess we’ve got to use some common sense. Common sense tells us that if we have an effective process in place then Incidents and downtime costs will reduce. So to me it makes sense to focus on ‘is the Problem Management process working?’.

Having described the challenge at length, Friday’s post will describe some actual Problem Management KPIs you can use.

Monday, 12 Mar 2007

This post is about an often overlooked feature of the Serio CMDB. Risk Assessment is a formal process of identifying and measuring risks posed to organisational Items - for example the risk of a hard disk crash on a key server, the risk of theft of laptops, the risk of hackers stealing customer data, the risk of someone spilling coffee on a computer and so on.

Risk Assessment helps you to decide on Changes that you need to make to your systems to reduce risks (of course, Change Management naturally complements the Risk Assessment process by providing a controlled way to implement such Changes). Risk Assessment is a proactive activity – trying to stop the costs associated with Incidents before they occur.

A formal Risk Assessment exercise is a key step in gaining BS7799/ISO17799 certification. BS7799 requires that you should develop a Risk Assessment methodology which takes into account the value of Items to your organisation and the seriousness of threats.

So far so good. Serio adds value to this by providing a framework for storing and reporting on the results of your Risk Assessment exercise which is geared towards the requirements of BS7799.

The Risk Assessment Process

The Risk Assessment process in Serio consists of gathering information about Items and the risks they are exposed to. This information is stored in the Serio Configuration Database for reporting and analysis (if you’ve ever wondered what the Threats and Vulnerability icons were for, this is it).

The following description introduces the main concepts and stages of the Risk Assessment process. Once you have familiarised yourself with this introduction, please see the Risk Assessment Roadmap in the HowTo guide (distributed with Serio products) for a step-by-step guide to implementing the Risk Assessment process using Serio.

1. Identify organisational Items that must be protected from risk. Such Items may include:

  • Key servers or network equipment
  • Key applications, such as web or email servers
  • Information (Virtual) Items, including databases or files containing sensitive information, such as customer records, product specifications, sales data, eMails, etc.
  • Services, such as heating, lighting, power and telecoms.

2. Use your knowledge and experience of your organisation to assign an Organisational Value to these Items. This is a matter of answering the question, "On a scale of, say, 1 to 5, how important is this Item to our organisation, in terms of a loss of availability, confidentiality, or integrity?" A score of 5 indicates the highest Organisational Value.

3. List the Vulnerabilities of these Items. Any aspect of an Item's location, function, use, or characteristics which puts the Item at risk should be counted as a Vulnerability. Assign a Vulnerability Level, on a scale of 1 to 5, to each Vulnerability. (A score of 5 indicates the highest Vulnerability Level.) For example:

  • Exposure of web server to the Internet (4)
  • Mis-configuration of the firewall (3)
  • Positioned underneath the air conditioning water storage tank (3)

4. List the Threats posed to the Items. Theft, breakages, hacking, website defacement, viruses, worms, media failure, and security violations are all examples of Threats that Items may face.

Note: Relationship between Threats and Vulnerabilities

There is a link between Vulnerabilities and Threats.

For example, the fact that a laptop is portable (Vulnerability) exposes it to the possibility of theft and breakage (Threats). If a computer's operating system is misconfigured (Vulnerability), it may be threatened by viruses, worms, or unauthorised access (Threats).

Identification of Threats and Vulnerabilities is an iterative process: discovering Threats may lead you to identify Vulnerabilities, and these in turn may reveal further Threats, etc.

5. For each Item, assign a Threat Level, on a scale of 1 to 5, to each of the Threats that you have identified. (A score of 5 indicates the highest level of Threat.)

Two Items may face a Threat at an equal Threat Level. For example, two servers may be equally threatened by hard disk failure. However, where one of the Items is more valuable to your organisation than the other, the Threat to the more valuable Item is clearly more important. To reflect this, Serio calculates the Risk Level associated with a Threat, according to the following formula: 

Risk Level of Threat = Organisational Value of Item x Threat Level

For example,

  • Web Site (Organisational Value = 4)
  • Defacement Threat (Threat Level = 4)
  • Risk Level = 4 x 4 = 16.
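The formula is simple enough to check by hand, but here it is as a small Python function, reproducing the Web Site example and the two-servers case. The function name and the range check are my own additions for illustration; this is not Serio's code.

```python
def risk_level(organisational_value: int, threat_level: int) -> int:
    """Risk Level of Threat = Organisational Value of Item x Threat Level.

    Both inputs are expected on the 1 to 5 scale described above.
    """
    for name, score in (
        ("organisational_value", organisational_value),
        ("threat_level", threat_level),
    ):
        if not 1 <= score <= 5:
            raise ValueError(f"{name} must be between 1 and 5, got {score}")
    return organisational_value * threat_level


# The Web Site example: Organisational Value 4, Defacement Threat Level 4.
print(risk_level(4, 4))  # 16

# Two servers facing the same Threat Level (3) but valued differently (5 and 2).
print(risk_level(5, 3), risk_level(2, 3))  # 15 6
```

On a 1 to 5 scale the Risk Level therefore runs from 1 to 25.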

6. As you gather information about Items, Threats, and Vulnerabilities, you can store it in Serio. You can then use this data to produce reports or to search for Items which are exposed to unacceptable levels of Risk or Vulnerability. This analysis should help you identify Changes that you need to make to your systems to reduce the risks arising from particular Threats or Vulnerabilities.
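To give a feel for the kind of search this makes possible, here is a sketch that picks out the Threats whose Risk Level is at or above a cut-off. Everything in it is assumed for illustration: the records, the threshold of 15 and the function name. What counts as 'unacceptable' is for you to decide.

```python
# Hypothetical Risk Assessment records:
# (Item, Threat, Organisational Value, Threat Level).
ASSESSMENTS = [
    ("Web Site", "Defacement", 4, 4),
    ("Finance server", "Hard disk failure", 5, 3),
    ("Test server", "Hard disk failure", 2, 3),
    ("Sales laptop", "Theft", 3, 4),
]

# Assumed cut-off for an 'unacceptable' Risk Level; choose your own.
RISK_THRESHOLD = 15


def unacceptable_risks(assessments, threshold=RISK_THRESHOLD):
    """Return (Risk Level, Item, Threat) rows at or above the threshold, highest first."""
    rows = [
        (value * threat_level, item, threat)
        for item, threat, value, threat_level in assessments
        if value * threat_level >= threshold
    ]
    return sorted(rows, reverse=True)


for level, item, threat in unacceptable_risks(ASSESSMENTS):
    print(f"{item}: {threat} (Risk Level {level})")
```

Each row that comes out of a search like this is a candidate for a Change.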

Friday, 09 Mar 2007

I’m going back to the point raised by Peter, as mentioned in this post from last week (specifically Peter has to improve service with a reducing budget, demoralised staff and reducing headcount).

I’ll recap on some of the things I’ve covered:

This blog post will be about shared vision and understanding in IT Service Management, and how important it is that, as a manager, Peter works to achieve this (excuse the term ‘shared vision..’ even though it sounds like something David Brent might bang on about).

Let me start by saying what I would not do. I would not want to be a Moses figure coming down from the managerial mountain with ‘the process’ or with ‘an ITIL vision’, handing out papers, processes and directions. This can engender a feeling of antipathy before you even get started, the way that any change imposed in a work setting can (my experience of working life over the last 25 years is that people are generally resistant to change, even in IT, where technological change is viewed very differently to organisational change). Remember that your staff may think they currently do a great job, even if customers think they do not.

What I would do, and have done in the past, is to persuade people that we need to do things better and differently, and to encourage people at all levels to suggest how.

As always though, there is a right way and a wrong way. Don’t ask your staff to write down suggestions and send them to you – the chances are you’ll get almost no response. Many people are intimidated by blank sheets of paper (even me, from time to time).

A better approach is to organise (i.e., chair) workshops where people can suggest improvements. However, organise these after you’ve applied some thought yourself about the problem(s) as you see it, so if the workshop is slow to get started you can prod it into life. Make careful notes during the meeting, and report back promptly afterwards.

At all costs avoid the workshop turning into a festival of moans. The first of these I ever organised did exactly this, with the 1st-line team immediately saying “the network team cherry-pick Incidents leaving behind stuff they don’t like the look of”. As it happened this was true, but with network team members in the room the tone became confrontational.

At the time I handled this badly. What I’d do now would be to re-state the negative in a more neutral way, as in ‘You mean some Incidents you assign to the network team are not being handled as quickly as customers would like’ and try to move onto a more consensual tone.

During these workshops, some people will contribute a lot, others a little or nothing. If you are reorganising and creating new roles (such as an Incident Manager or Service Level Manager) try to reward the most interested and positive with new roles, rather than simply choosing people based on seniority (though I know that can be controversial).

I’ll continue this theme in future posts, looking at some of the specific ITIL disciplines Peter might consider. 

Thursday, 08 Mar 2007

In this blog article I want to create an ‘overview’ topic about email in Serio. The reason is a lot of users seem unaware of some of the things you can do with email.

I will start with when and where emails are produced. All of the emails are optional – so they can be switched off if you’d rather not send any.

NB: I’m primarily going to talk about Incidents for brevity’s sake, but what follows can also apply to Problems and Changes as well.

  • After logging an Incident, a confirmation can be automatically sent to the customer.
  • When resolving an Incident you can send a ‘we’ve completed this’ email to the customer. The email can contain what you did to resolve the Incident, and invite a satisfaction rating.
  • You can send an email to customers whenever you take an Action. You can do this by just having a single Action called ‘Send email’ whenever you want to send one, or you can make each Action send an email to the customer. This is like a courtesy that says ‘we’ve updated your Incident, here is what we did’.
  • You can have Serio notify Agents (people delivering service to customers) when you assign an Incident to them.
  • You can have Serio ‘copy’ lots of Agents when an Incident fitting a particular set of criteria is logged. We call these Broadcast Alerts.
  • If you are a Serio Command Center user, you can have Serio email you when different network and server conditions are detected.
  • If a customer uses SerioWeb to log an Incident or place a note on an existing Incident, you can have Serio send you an email to let you know about this.
  • At a given escalation point, the Escalation Engine can send an email to the customer to let them know you've escalated the ticket.
  • To let a supplier know that you’ve assigned an Incident to them, as part of a supplier management process.
  • We can also send emails about emails – for example, a customer replies to an email you sent earlier, we can send you a notification about that.

That’s a lot of emails. For the important ones (generally those that go outside of the support desk) you have total control of the content through the use of eDocs (electronic documents). For example, you can choose HTML or text based formatting, and can include lots of data from the Incident (such as SLA targets, who is handling the Incident, which team it is assigned to, the Incident description and so on).

There is also another whole raft of notifications and warnings that are used in Change Management (a topic for later).

There’s some pretty nice stuff which might not be obvious.

Variable ‘sender’ address. If you are a ‘virtual’ helpdesk or service desk (you provide services to different companies as if you were part of that company), Serio can vary the sender’s email address on outgoing emails so that one email is sent from ‘support@bigtruckhelpdesk.com’ and the next from ‘helpdesk@modenesesupportgroup.it’.

Language-based emails. (This is a great feature, I helped design it). If you offer services in more than one language then Serio can switch the language of emails. Suppose you offer support in English, French and Dutch. When an English speaker logs a ticket, they will get an English language confirmation. If a Belgian French speaker logs a ticket, they’ll get a confirmation in French. What is cool about it is that if a Belgian Dutch speaker logs a ticket, they’ll get a confirmation in Dutch (in other words, it’s not determined by Country).
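The idea is easier to see in code than in words: the template is chosen from the customer's language, and the country never enters into it. The sketch below is purely illustrative and says nothing about how Serio or eDocs implement this; the template text, the language codes and the `confirmation` function are all my own assumptions.

```python
# Hypothetical confirmation templates, keyed by language code.
CONFIRMATIONS = {
    "en": "We have logged your Incident {ref}.",
    "fr": "Nous avons enregistré votre Incident {ref}.",
    "nl": "Wij hebben uw Incident {ref} geregistreerd.",
}


def confirmation(customer_language: str, ref: str, default: str = "en") -> str:
    """Pick the template by the customer's language, falling back to a default."""
    template = CONFIRMATIONS.get(customer_language, CONFIRMATIONS[default])
    return template.format(ref=ref)


# Two customers in the same country (Belgium) but with different languages.
customers = [
    {"name": "Customer A", "country": "BE", "language": "fr"},
    {"name": "Customer B", "country": "BE", "language": "nl"},
]
for customer in customers:
    print(customer["name"], "->", confirmation(customer["language"], "INC-1234"))
```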

Monday, 05 Mar 2007

I’ve blogged previously about improving your success in an ITSM project here and here – mentioning setting yourself clear objectives, and resisting the urge to complicate things.

In this post I’m going to look at another ingredient – roles and responsibilities.

A colleague here at Serio tells the apocryphal story of a TT racer in the Isle of Man whose engine fell out at Windy Corner. When they got the bike back to their workshop, the two mechanics looked at each other and said ‘I thought it was your job to tighten the engine studs!’.

The moral of the story is, I guess, if you want something to be done don’t leave it to ‘the team’.

What you should do is define some clear roles and responsibilities. For instance, you might define the Incident Manager role, give it to a person and start to define the responsibilities thus:

Responsibility for documenting the Incident Management process (keep it simple though), ownership of and responsibility for the quality of Incident data, management of the Incident management team, production of monthly management reports listing issues and proposals for improvement…

… and so on. If you are just starting out, you’ll want to consider mapping both roles and teams out carefully, and try to define

  • A Service Desk Team
  • An Incident Manager
  • A 2nd-line Support Team (with a Team Leader)

and you can then expand this with other roles such as Problem Manager as required and as demanded by your own situation.

Remember it’s not enough to create a job title – staff being reorganised for IT Service Management need as much help as possible for them to understand what is required. Therefore each role should very clearly have something which specifies what is required and what the responsibilities are – in detail. Take the objectives and goals you started with and use these to shape your roles and responsibilities to deliver what you want to achieve.

Of course creating roles spreads the workload and gets others involved, but it does require a degree of shared understanding and commitment – something I’ll address in later posts.

Friday, 02 Mar 2007

This post will be all about Escalation in Serio, the role that the Escalation Engine plays, and a personal perspective on ‘good practice’.

First of all, let’s define Escalation as it means different things to different people. Back in January I posted about Escalation in Incident Management, and that post defines both Functional and Hierarchical Escalation. You’ll be pleased to hear that the Escalation Engine can handle both.

Serio Service Level Agreements have a ‘grid’ structure, and for each Response and Resolve target you define you can have up to 12 Escalation Timepoints. That means, say you have 4 different Response and Resolve Targets, you can have 4 * 12 = 48 Escalation Timepoints all occurring at slightly different times. That’s a lot of Timepoints! Of course, each Incident will have no more than 12, but the Incident next in the queue could have a different 12 – you get the idea.

Personally, I sometimes wish that the tool had fewer than 12, as managers are sometimes tempted to use all 12 – it seems to be a case of ‘we’ve paid for 12 so let’s use 12’ but in ITSM, as in life, less is often more. I’ll explain why.

Every Escalation Timepoint should have a consistent meaning in your Incident Management process, and for each Timepoint you need a managerial or service-based response. Without this, how will your staff know how to respond to the notifications and alerts they receive?

Let me give you an example of good use.

  • Escalation Timepoint 1 = SLA Breach in 2 hours
  • Escalation Timepoint 2 = SLA Breach in 1 hour
  • Escalation Timepoint 3 = SLA Breach
  • Escalation Timepoint 4 = SLA Breach by 25%
  • Escalation Timepoint 5 = SLA Breach by 50%
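To show how those five Timepoints fall on the clock, here is a rough Python sketch for a single Incident. It makes some big assumptions: it works in plain elapsed time (no working hours or holidays), it reads 'Breach by 25%' as 25% of the Resolve target beyond the breach time, and the function name is my own. It is not how the Escalation Engine calculates anything.

```python
from datetime import datetime, timedelta


def escalation_timepoints(logged: datetime, resolve_target: timedelta) -> dict[int, datetime]:
    """Return the clock time of each of the five example Escalation Timepoints."""
    breach = logged + resolve_target
    return {
        1: breach - timedelta(hours=2),      # SLA Breach in 2 hours
        2: breach - timedelta(hours=1),      # SLA Breach in 1 hour
        3: breach,                           # SLA Breach
        4: breach + resolve_target * 0.25,   # SLA Breach by 25%
        5: breach + resolve_target * 0.5,    # SLA Breach by 50%
    }


# An Incident logged at 09:00 with an 8-hour Resolve target.
points = escalation_timepoints(datetime(2007, 3, 2, 9, 0), timedelta(hours=8))
for number, when in points.items():
    print(f"Escalation Timepoint {number}: {when:%H:%M}")
```

With these figures the Timepoints land at 15:00, 16:00, 17:00, 19:00 and 21:00, so each one has a clear and distinct meaning.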

I mentioned a management response. Ideally you’ll have, in your operations manual (we have a template, request it from support), something along these lines:

At Escalation 1, the assigned Agent should notify their team leader (or Service Level Manager or whoever else you feel is appropriate) of the Escalation – and ask for guidance.

Escalation 2 is for information – breach imminent.

At Escalation 3, contact the customer and let them know you are aware of the SLA breach.

… and so on.

The real trick is to avoid a situation where lots of alerts and messages are produced that everyone ignores. If I’ve just described your Helpdesk or Service Desk, revisit both how you’ve set-up the system and the information and training you’ve given to your staff.

The Serio Escalation Engine helps out in the following ways:

  • It can send you alerts as the Escalation Timepoints occur
  • It can warn you about response breaches
  • It can place Actions on Incidents, Problems and Changes automatically to record the Escalation, giving you total control (through eDocs) over the Action comment
  • It can automatically send emails to the customer
  • It can automatically re-assign Incidents if that’s what you want it to do (though I’ll leave it to you to decide if that is the right thing to do)

Now which of those is Functional, and which Hierarchical?

Have a great weekend in the winter sun!
