Serio Blog

Tuesday, 07 Nov 2006

Monday, 06 Nov 2006

This post continues the topic of obtaining Availability statistics through Serio. The previous posts were ‘Using Serio to obtain availability statistics’ and ‘More on Availability statistics’.

To recap, I’ve said that:

  • you need to consider how you want your Availability data presented (and I gave 2 examples)
  • you need to define your Key Services
  • create a Service Level Agreement (SLA) for each of your Key Services
  • decide how you wish to represent your Key Services in the Configuration Management Database (CMDB).

Having done all of that, you now need to associate your SLAs directly with the Items (if you are using Items) or Systems (if you are using Systems). This enables Serio to understand what the target Availability for the Key Service in question is (9:00 to 17:00 Mon-Fri, or 24 hours, for example). If you didn’t know that you can associate SLAs directly with Items and Systems, you can: simply edit the Item or System in question and make the association directly.

We are almost ready to gather some statistics at this point, except for one thing.

Recall that I wrote that you need to be clear on what Unavailability means, or how it is defined. Sometimes it is obvious (the Key Service is not functioning at all), but in other cases the service might be partly available. For instance, you might have an email system that can send emails within your organisation but cannot send or receive them externally. Does this constitute Unavailability? If you send a lot of emails to customers, or use email to receive customer orders, the answer is likely to be ‘yes’. Whatever the case in your organisation, have a clear definition of Unavailability.

This is important because you need to tell Serio which Incidents record Unavailability, and this is done through Impact. If you don’t have one already, create an Impact called ‘System down’. Use this Impact when recording Incidents that result in Unavailability; this is how Serio distinguishes routine Incidents from those that indicate a loss of service.

The other ingredient you need to add to the mix is the Key Service itself, represented in the Incident by either an Item or a System.

If we look at what is going into the Incident mix when you log an Incident, you can start to see how your Availability data is produced:

  • The Key Service on which we want Availability data, represented either by an Item or a System
  • The SLA for the Key Service (which we attach directly)
  • Something to tell us this is an Unavailability Incident (the ‘System down’ Impact)
  • A start date and time (when we logged the Incident) and an end date and time (when we resolve it)
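To make the mix concrete, here is a minimal Python sketch of how those ingredients could combine into an Availability figure. It is not Serio’s own calculation: the `Incident` record, the function names and the 9:00 to 17:00 Mon-Fri service window are assumptions for illustration. Only Incidents logged against the Key Service with the ‘System down’ Impact count, and only the part of each outage that falls inside the SLA hours is treated as downtime.

```python
from dataclasses import dataclass
from datetime import datetime, time, timedelta

@dataclass
class Incident:
    service: str        # the Item or System representing the Key Service
    impact: str         # 'System down' marks an Unavailability Incident
    logged: datetime    # start of the outage (when the Incident was logged)
    resolved: datetime  # end of the outage (when it was resolved)

# Assumed Availability SLA: 09:00-17:00, Monday (0) to Friday (4)
SLA_START, SLA_END = time(9, 0), time(17, 0)
SLA_DAYS = {0, 1, 2, 3, 4}

def within_service_hours(start: datetime, end: datetime) -> timedelta:
    """Return how much of the interval [start, end) falls inside SLA hours."""
    total = timedelta()
    day = start.date()
    while day <= end.date():
        if day.weekday() in SLA_DAYS:
            lo = max(start, datetime.combine(day, SLA_START))
            hi = min(end, datetime.combine(day, SLA_END))
            if hi > lo:
                total += hi - lo
        day += timedelta(days=1)
    return total

def availability_pct(incidents, service, period_start, period_end):
    """Percentage of scheduled SLA time the service was available.

    Overlapping 'System down' Incidents are not merged, so they would be
    double counted; a real report would need to handle that.
    """
    scheduled = within_service_hours(period_start, period_end)
    downtime = timedelta()
    for inc in incidents:
        if inc.service == service and inc.impact == "System down":
            # Clip the outage to the reporting period, then to SLA hours
            lo, hi = max(inc.logged, period_start), min(inc.resolved, period_end)
            if hi > lo:
                downtime += within_service_hours(lo, hi)
    return 100.0 * (1 - downtime / scheduled)

incidents = [
    Incident("Email", "System down", datetime(2006, 11, 6, 16), datetime(2006, 11, 7, 10)),
    Incident("Email", "Routine", datetime(2006, 11, 8, 10), datetime(2006, 11, 8, 12)),
    Incident("Email", "System down", datetime(2006, 11, 10, 18), datetime(2006, 11, 10, 20)),
]
week = (datetime(2006, 11, 6), datetime(2006, 11, 11))
print(round(availability_pct(incidents, "Email", *week), 2))  # 95.0
```

In this example the first outage costs 2 of the week’s 40 service hours, the ‘Routine’ Incident is ignored, and the Friday evening outage falls outside SLA hours, so it does not count.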

I’ll look in my next post at how you use all of the ingredients above to produce meaningful statistics.

Friday, 03 Nov 2006

Over on the Verso blog there are a couple of very useful posts about an ITSM topic that is often either neglected or misunderstood: the Service Catalog. The posts are ‘What is a Service Catalog’ and ‘Getting your Service Catalog started’.

Serio has an excellent repository for Service Catalog information that allows you to log Incidents, Problems and Changes against services from the Catalog, and to use that as the basis for reporting (for example, Availability reporting). This repository is referred to as Systems.

You can find out how to create a System by using the Administrator HowTo guide (look under Configuration Management).

Thursday, 02 Nov 2006

Don't be shy - we welcome guest bloggers! If you want to write guest posts that will appear here, we'd be delighted. It's a chance to share experiences and ideas with other Serio users, and to include information about your own company and its services.

An ideal article:

  • Follows the general theme of this blog, covering ITSM or using different aspects of the Serio tool
  • Is 300 to 400 words long (longer articles are OK, but they would need to be split into multiple linked posts, as we often do)
  • Can be understood by people outside your organisation
  • Is a nice self-contained topic

You can sign off with a link to your company and information about the services it offers.

Interested? email georger __at__ seriosoft.com

Wednesday, 01 Nov 2006

Recall that I started a thread about obtaining Availability statistics from Serio in my last post. I talked about identifying your Key Services and thinking broadly about the Service Level Agreement (SLA) you have with customers.

Recall also that if you want to access more of the theory there is a companion availability white paper available on our web site.

Having identified your Key Services, you need to think about what your Availability for the service should be. For example you might decide that your Billing System might need to be available Mon-Fri 9:00 to 17:00, excluding all public holidays. This is your 100% availability measure – if there is no downtime falling within these times, the Billing System is available 100%. You then need to create an SLA in Serio to represent this (I’ll call it an Availability SLA), and make a note of its name (it’s OK to re-use an existing SLA if one is to hand).
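If you want to know how many hours that 100% measure represents in a given month, a short Python sketch like the one below can count them. The function name is hypothetical, and the holiday dates are ones you supply yourself (here the UK Christmas Day and Boxing Day bank holidays for 2006, used purely as an example).

```python
from datetime import date, timedelta

def scheduled_hours(first: date, last: date, holidays: set,
                    start_hour: int = 9, end_hour: int = 17) -> int:
    """Count Mon-Fri service hours from first to last (inclusive), skipping holidays."""
    hours = 0
    day = first
    while day <= last:
        if day.weekday() < 5 and day not in holidays:  # 0-4 = Monday-Friday
            hours += end_hour - start_hour
        day += timedelta(days=1)
    return hours

holidays_2006 = {date(2006, 12, 25), date(2006, 12, 26)}
print(scheduled_hours(date(2006, 12, 1), date(2006, 12, 31), holidays_2006))  # 152
```

Any downtime inside those 152 hours counts against the Billing System; downtime at weekends or on the two holidays does not.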

Tip: Make sure you are clear on what unavailability is. Sometimes it is obvious (flames coming from the back of the server) but there are other conditions where a given service is available but not performing correctly – maybe it is running very slowly. Think about what constitutes unavailability.

Having defined what your service times are, and with a strong understanding of what unavailability means, you are ready to proceed.

You now need to think about how you are going to represent your Key Services in the Configuration Management Data Base (CMDB) – this is something you must have in order to obtain Availability statistics from Serio. There are two basic approaches you can take.

Represent Key Services as Items. In other words, create an Item that represents the service in the CMDB, or use a particular server already in the CMDB for unavailability Incidents (for instance, for your email service you might decide to use the email server itself).

Represent Key Services as Systems. This is my preferred approach – create a System that represents your Key Service (the System in Serio is a collection of Items that combine together in a structured way to deliver something of value to users).

Choose a method from those listed above, and apply it consistently. Then associate your Availability SLAs with the Items or Systems you are using for Availability purposes.

I’ll continue this thread with another post later in the week.

Monday, 30 Oct 2006

Those of Serio’s customers who have employed me as a consultant, or whose accounts I handle, probably know I have a few (OK, many) hobby horses. One of them is Service Level Agreements (SLAs), and in particular thinking about the SLA both broadly and in terms of business need. Put another way, there is rather more to it than setting target response and resolution times for each Incident that the Helpdesk/Service Desk handles.

In particular, it is important to consider Availability. What I mean is this:

  • setting targets for the availability of key business systems
  • trying to ensure that those targets are met
  • reporting on the availability actually achieved

At this point I should point out we have an Availability Management white paper that looks at a lot of the theory and background for you. If you are interested in the theory and more info generally on Availability Management check the white paper.

What I’m going to do in this and coming posts is look at how Serio handles this and delivers Availability statistics.

So, you want accurate Availability statistics? The first thing you need to know is which systems or services you need statistics for (let’s call them Key Services). It is often not easy to determine this, but I’ll assume you already have this information to hand. You will then need to know what the business requirement for Availability is for these Key Services, and there are a number of ways you can express that. I've listed two examples below.

  • Downtime in hours. You might say that the maximum acceptable downtime for a Key Service is ‘4 hours per month’.
  • Uptime as a percentage. You might say that the target availability for a Key Service is 99% per month, meaning that during your hours of operation the service is available 99% of the time. By the way, if that sounds good, it still means around 2 hours of actual loss of service per month on a 9:00 to 17:00 Mon-Fri SLA.
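The two ways of expressing the requirement are easy to convert between. This Python sketch assumes an average month of 52 × 5 / 12 working days and ignores public holidays; the function names are illustrative only. It shows that a 99% target on a 9:00 to 17:00 Mon-Fri SLA allows roughly 1.7 hours of downtime a month, and that a ‘4 hours per month’ limit corresponds to about 97.7%.

```python
HOURS_PER_DAY = 8                                       # 9:00 to 17:00
WORKING_DAYS_PER_MONTH = 52 * 5 / 12                    # about 21.7; public holidays ignored
MONTHLY_HOURS = HOURS_PER_DAY * WORKING_DAYS_PER_MONTH  # about 173.3 hours

def allowed_downtime(target_pct: float) -> float:
    """Monthly downtime (hours) permitted by an uptime percentage target."""
    return MONTHLY_HOURS * (1 - target_pct / 100)

def equivalent_target(max_downtime_hours: float) -> float:
    """Uptime percentage equivalent to a monthly downtime limit."""
    return 100 * (1 - max_downtime_hours / MONTHLY_HOURS)

print(f"{allowed_downtime(99):.2f} hours")  # 1.73 hours
print(f"{equivalent_target(4):.1f}%")       # 97.7%
```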

I’m going to continue this as a topic in coming posts.

Friday, 27 Oct 2006

Here’s a really nice but often unnoticed feature of SerioClient: speed logging (logging an Incident in a few clicks). There are a number of ways you can use this (for instance, for repeat Incidents), but what I will write about is common Incidents. These are things such as password resets that you might have to perform 10 times a day.

Naturally you wish to log Incidents to record these, otherwise your reporting will be inaccurate. If you are logging Incidents such as these the usual way, by categorising them and writing a description, you are going about it the long-winded way.

Here’s how you would go about doing things the fast way.

  1. The next time someone calls for a password reset, log the Incident correctly as you do at the moment. Categorise it, and put in a decent description. Save the Incident.
  2. Look at the bottom of the screen. See the link marked ‘Create Serio Alert’? Click it. Give the Alert a name such as ‘Quick Password Reset’, then save it. (You only have to do steps 1 and 2 once.) For the second (and every subsequent) password reset, here is what you do.
  3. Select the customer details as normal.
  4. You’ll be taken to the Incident Details screen. Look at the top of the screen: see the tab marked ‘Serio Alerts’? Click this (or use the Ctrl-2 shortcut).
  5. You will be able to see an Alert called ‘Quick Password Reset’. Select it and click Save.
  6. Congratulations! You’ve just speed logged an Incident with a few clicks. It looks just like the Incident you created in step 1, but has a different customer.
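Conceptually, a Serio Alert used this way behaves like a saved Incident template: everything except the customer is copied from the first Incident. The Python below is only a model of that idea to show what the shortcut saves you; it is not Serio’s API, and the `Incident` fields, category text and function names are hypothetical.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Incident:
    customer: str
    category: str
    description: str

alerts: dict[str, Incident] = {}

def create_alert(name: str, incident: Incident) -> None:
    """Steps 1-2: save a correctly categorised Incident under an Alert name."""
    alerts[name] = incident

def speed_log(alert_name: str, customer: str) -> Incident:
    """Steps 3-5: copy the saved Incident, changing only the customer."""
    return replace(alerts[alert_name], customer=customer)

first = Incident("J. Smith", "Security > Password reset", "User locked out; password reset")
create_alert("Quick Password Reset", first)
second = speed_log("Quick Password Reset", "A. Jones")
print(second)
```

The copy keeps the category and description, so reporting on password resets stays accurate while the logging effort drops to a few clicks.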

Try it, it’s an easy way to log Incidents in a few clicks. Don’t forget to tell your colleagues what you’ve done, and how to use the Alert.

Wednesday, 25 Oct 2006

I’m going to return to the subject of the Configuration Management Data Base (CMDB) for this post, prompted by a customer who invited me to look at their CMDB.

ITIL makes a distinction between Asset Management and Configuration Management (the CMDB is what you get from an effective Configuration Management process). I’ll avoid quoting directly but it defines the two thus:

Asset Management: Accounting for IT assets for accounting or managerial purposes. This might include maintaining a list of items so that you know when warranties expire, or so you can perform some routine upgrade planning.

Configuration Management: Provide a logical model of all the Configuration Items within the organisation, showing how they combine and depend on each other to provide services to users.

In practice, this means that the two are quite different.

Asset Management is typified by lists of computers. These generally are not linked together to show much more than the machine and some attached peripherals (such as a monitor or printer). In an Asset Management system it is sometimes hard to see server equipment amongst all the desktops, and you don’t see ‘virtual’ items such as web server instances or database server instances.

Configuration Management is typified by diagrams rather than lists (though lists do exist, of course). This is why Serio expresses the CMDB in graphical terms. The diagram shows a variety of items and how they link together and depend upon each other so that, for example, you can see the effects of shutting down a database server. The CMDB will typically have tangible and virtual items on it like those I’ve mentioned above, and will show how items combine to deliver customer services.
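The ‘effects of shutting down a database server’ question is essentially a walk over a dependency graph. Here is a small Python sketch with an invented example CMDB (all item names are hypothetical, and Serio’s own CMDB model is not shown). Each entry lists what an item depends on, and a breadth-first search over the reversed links finds everything that would be affected.

```python
from collections import deque

# Hypothetical CMDB: each item maps to the items it depends on
depends_on = {
    "Billing System": ["Billing web server instance", "Billing database instance"],
    "Billing web server instance": ["Server WEB01"],
    "Billing database instance": ["Server DB01"],
    "Email service": ["Server MAIL01"],
    "Server WEB01": ["Core switch"],
    "Server DB01": ["Core switch"],
    "Server MAIL01": ["Core switch"],
}

def affected_by(item: str, graph: dict) -> set:
    """Return every item that directly or indirectly depends on `item`."""
    # Reverse the links: item -> items that depend on it
    dependents: dict = {}
    for parent, children in graph.items():
        for child in children:
            dependents.setdefault(child, []).append(parent)
    seen: set = set()
    queue = deque([item])
    while queue:  # breadth-first search
        current = queue.popleft()
        for parent in dependents.get(current, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen

print(sorted(affected_by("Server DB01", depends_on)))
# ['Billing System', 'Billing database instance']
```

An Asset Management list of computers cannot answer this question, because it has no record of the links between items.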

I’ll probably post more about this on Friday, but I’ll pose the question: how can you support an IT infrastructure that is not documented?

Friday, 20 Oct 2006

If you are thinking of introducing Change Management to your Service Desk/Helpdesk, it is worth asking the question ‘are we ready?’

It is a good question to ask because Change Management requires a degree of process maturity, and presents those taking up Change Management with all sorts of technical, procedural and cultural issues to overcome.

Assess yourself honestly against the following statements, recording whether each is mainly true or mainly false. As in all things, there are no hard and fast rules: you may implement a successful Change Management process with none of these things being true. These are just based on my own experiences and observations.

  • We have a mature and effective Incident Management process. It is written down in a concise and useable form.
  • We have a mature and effective Problem Management process.
  • We have an Incident Manager and Problem Manager.
  • We appreciate the importance of a Configuration Management Data Base (CMDB) in supporting effective Change Management. We are preparing to introduce one shortly after our Change Management process ‘goes live’.
  • We have a shared understanding of the problems we are trying to solve by introducing Change Management. For instance, there are not whole teams resisting the process, or who think that our organisation does not need a Change Management process.
  • There is widespread support for the introduction of Change Management.
  • The IT function is quite ‘well organised’ within my company. For instance, there is a clear team structure and a clear understanding of responsibilities.
  • Our managers and team leaders are actively supporting the introduction of Change Management.
  • We have identified a person who we wish to act as our first Change Manager.
  • We have identified at least one person who will be our Configuration Manager and who will ‘own’ the CMDB.
  • We don’t have lots of different groups holding lots of important data in different places, such as Access databases and spreadsheets. Our supporting data is easy to access and is within the ITSM tool.
  • We’ve picked one aspect of our IT services to be a pilot for Change Management. We want to learn from this experience before extending Change Management to other parts of the IT function.
  • We understand that ITIL Change Management is not prescriptive in its approach (it does not tell us exactly what to do), though it gives us sound and sensible advice.

So how did you do?

More than 10 mainly true: You are probably OK to start thinking about your Change Management process – you are likely to succeed.

More than 6, but less than or equal to 10: You may not have sufficient process maturity or ‘buy-in’ from the rest of the organisation to succeed.

Fewer than 6: Hmmm. If you are still planning to go ahead, then I think you are brave. It might be that you don’t have much process maturity and have insufficient buy-in. You may be working in a very informal environment that is not focused on customer service.
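If you want to score a team’s answers quickly, the bands above translate into a few lines of Python. Note that, as written, the bands do not say where a score of exactly 6 belongs; this sketch assumes it falls into the lowest band. The function name and band labels are illustrative.

```python
def change_readiness(answers: list[bool]) -> str:
    """Map 'mainly true' answers to the readiness bands described above."""
    mainly_true = sum(answers)  # True counts as 1
    if mainly_true > 10:
        return "probably OK to start"
    if mainly_true > 6:  # more than 6, up to and including 10
        return "may lack process maturity or buy-in"
    return "brave"  # assumption: a score of exactly 6 is placed in this band

print(change_readiness([True] * 8 + [False] * 4))  # may lack process maturity or buy-in
```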

Edit: Verso also have a related post on the subject of 'the people factor'.

Thursday, 19 Oct 2006

In my last post (see below), I began looking at the question of what and how many Service Level Agreements (SLAs) you need with a recap of what an SLA is.

Today, I’ll introduce a real world example, which I will use in the next post to illustrate the sort of decisions involved in designing your own SLAs.

Example: XYZABCD PLC provide services to the public seven days a week between the hours of 8am and 8pm. This operation is supported by an internal IT Service Desk.

There is no formal SLA between the Service Desk and the business, and the Service Delivery Manager is being pressed to provide one. Internally, however, the Service Desk do operate an informal SLA, and hope to put this on a formal basis.

Staff at the company can log support Incidents with the Service Desk by telephone during working hours (8am-8pm Monday to Sunday) and by email or web outside these hours.

In fact, however, the engineers capable of resolving any faults are only at work between 8am and 5pm Monday to Friday. Outside those hours, the Service Desk staff manning the phones are able to resolve certain straightforward types of Incident, such as password resets, and can offer simple workarounds, but must wait until an engineer is in the office to deal with more complex problems.
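The mismatch between the hours the phones are staffed and the hours engineers are present is the crux of this example, so it may help to model it. The Python sketch below is a simplified, hypothetical model: it ignores any out-of-hours call-out arrangements, and the function names and wording are invented.

```python
from datetime import datetime, time

def desk_staffed(t: datetime) -> bool:
    """Phones are staffed 8am-8pm, Monday to Sunday."""
    return time(8) <= t.time() < time(20)

def engineer_on_site(t: datetime) -> bool:
    """Engineers are at work 8am-5pm, Monday (0) to Friday (4)."""
    return t.weekday() < 5 and time(8) <= t.time() < time(17)

def handling(t: datetime, straightforward: bool) -> str:
    """Describe what happens to an Incident raised at time t."""
    if not desk_staffed(t):
        return "logged by email or web; picked up when the desk opens"
    if straightforward:  # e.g. a password reset
        return "resolved by Service Desk staff"
    if engineer_on_site(t):
        return "passed to an engineer"
    return "workaround offered; waits for an engineer"

# Saturday 21 Oct 2006, 10:00: phones staffed, no engineers on site
print(handling(datetime(2006, 10, 21, 10), straightforward=False))
# workaround offered; waits for an engineer
```

Any formal SLA for this company has to account for the complex Incidents that fall into that last case on weekdays after 5pm and at weekends.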

Service Desk staff prioritise Incidents based on Impact. Non-urgent requests or queries are given a low priority and may be dealt with within a couple of days. Faults that are causing inconvenience to users, but not preventing them from working, are given a medium priority and must be dealt with the same day if possible. If the user cannot work, this is treated as a high priority and must be dealt with immediately.
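The informal prioritisation rules can be written down as a tiny decision function, a useful first step towards a formal SLA. This Python sketch is illustrative: the inputs (whether it is a fault, whether the user can still work) and the names are assumptions, and the target wording is taken from the description above.

```python
from enum import Enum

class Priority(Enum):
    LOW = "may be dealt with within a couple of days"
    MEDIUM = "dealt with the same day if possible"
    HIGH = "dealt with immediately"

def prioritise(is_fault: bool, user_can_work: bool) -> Priority:
    """Assign a priority from the Impact of an Incident on the user."""
    if not user_can_work:
        return Priority.HIGH    # the user cannot work
    if is_fault:
        return Priority.MEDIUM  # a fault causing inconvenience
    return Priority.LOW         # a non-urgent request or query

p = prioritise(is_fault=True, user_can_work=True)
print(p.name, "-", p.value)  # MEDIUM - dealt with the same day if possible
```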

Additionally, there are always duty engineers who may be called in after 5pm when there is a critical fault affecting the company’s ability to provide service to the public (for example, the failure of a key network component). The Serio Command Center detects and raises alerts about such faults and sends text messages to the duty engineers. Should such an event occur, the duty engineers must attend as quickly as possible, day or night. The Service Desk wants to ensure that key services are always available during opening hours.
