Accessing Availability Statistics

This post continues the topic of obtaining Availability statistics through Serio. The previous posts were ‘Using Serio to obtain availability statistics’ and ‘More on Availability statistics’.

To recap, I’ve said that:

  • you need to consider how you want your Availability data presented (and I gave 2 examples)
  • you need to define your Key Services
  • you need to create a Service Level Agreement (SLA) for each of your Key Services
  • you need to decide how you wish to represent your Key Services in the Configuration Management Database (CMDB).

Having done all of that, you now need to associate your SLAs directly with the Items (if you are using Items) or Systems (if you are using Systems). This tells Serio what the target Availability for the Key Service in question is (for example, 9:00 to 17:00 Monday to Friday, or 24 hours). If you didn’t know that SLAs can be associated directly with Items and Systems, they can: simply edit the Item or System in question and make the association there.

We are almost ready to gather some statistics at this point, except for one thing.

Recall that I wrote about the need to be clear on what Unavailability means, or how it is defined? Sometimes it is obvious (the Key Service is not functioning at all), but in other cases the service might be partly available. For instance, you might have an email system that can send emails within your organisation but cannot send or receive them externally. Does this constitute Unavailability? If you send a lot of emails to customers, or use email to receive customer orders, the answer is likely to be ‘yes’. Whatever the case in your organisation, have a clear definition of Unavailability.

This is important because you need to tell Serio which Incidents record Unavailability, and this is done through Impact. If you don’t have one already, create an Impact called ‘System down’. Use this Impact when recording Incidents that result in Unavailability; this is how Serio separates routine Incidents from those that indicate a loss of service.

The other ingredient you need to add to the mix is the Key Service itself, represented in the Incident by either an Item or a System.

If we look at what goes into the mix when you log an Incident, you can start to see how your Availability data is produced:

  • The Key Service on which we want Availability data, represented either by an Item or a System
  • The SLA for the Key Service (which we associate directly with the Item or System)
  • Something to tell us this is an Unavailability Incident (the ‘System down’ Impact)
  • A start date and time (when we logged the Incident) and an end date and time (when we resolve it)
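To make the ingredients above a little more concrete, here is a minimal Python sketch of how they could combine into an Unavailability figure. It is not how Serio itself calculates anything: the `Incident` and `ServiceHours` classes and the `unavailability` function are hypothetical, and the sketch assumes that an outage runs from the time the Incident is logged to the time it is resolved, and that only the part of the outage falling inside the SLA’s service hours counts.

```python
from dataclasses import dataclass
from datetime import datetime, time, timedelta


@dataclass
class Incident:
    # Hypothetical model of an Incident, not Serio's data structure.
    key_service: str    # the Item or System the Incident is logged against
    impact: str         # e.g. "System down"
    logged: datetime    # assumed start of the outage
    resolved: datetime  # assumed end of the outage


@dataclass
class ServiceHours:
    # Hypothetical model of the SLA's target hours.
    start: time          # e.g. 09:00
    end: time            # e.g. 17:00; end <= start means "until midnight"
    weekdays: frozenset  # 0 = Monday ... 6 = Sunday


def downtime_within_hours(incident: Incident, hours: ServiceHours) -> timedelta:
    """Sum the part of one outage that falls inside the SLA service hours."""
    total = timedelta()
    day = incident.logged.date()
    while day <= incident.resolved.date():
        if day.weekday() in hours.weekdays:
            window_start = datetime.combine(day, hours.start)
            if hours.end > hours.start:
                window_end = datetime.combine(day, hours.end)
            else:
                # A 24-hour SLA: the window runs to midnight at the end of the day.
                window_end = datetime.combine(day + timedelta(days=1), hours.end)
            overlap_start = max(window_start, incident.logged)
            overlap_end = min(window_end, incident.resolved)
            if overlap_end > overlap_start:
                total += overlap_end - overlap_start
        day += timedelta(days=1)
    return total


def unavailability(incidents, key_service, hours, impact="System down"):
    """Total Unavailability for one Key Service; only the given Impact counts."""
    return sum(
        (downtime_within_hours(i, hours)
         for i in incidents
         if i.key_service == key_service and i.impact == impact),
        timedelta(),
    )


if __name__ == "__main__":
    office_hours = ServiceHours(time(9), time(17), frozenset(range(5)))  # Mon-Fri
    incidents = [
        # Monday 16:00 to Tuesday 10:00: one hour on each day falls in office hours.
        Incident("Email", "System down",
                 datetime(2024, 3, 4, 16, 0), datetime(2024, 3, 5, 10, 0)),
        # A routine Incident with a different Impact is ignored.
        Incident("Email", "Slow response",
                 datetime(2024, 3, 6, 9, 0), datetime(2024, 3, 6, 12, 0)),
    ]
    print(unavailability(incidents, "Email", office_hours))  # prints 2:00:00
```

The Impact filter is what keeps routine Incidents out of the figure, and the SLA’s hours decide how much of each outage actually counts: the same Monday-to-Tuesday outage would count as 18 hours against a 24-hour, seven-day SLA.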

In my next post I’ll look at how to use all of the ingredients above to produce meaningful statistics.