Serio Blog

Thursday, 07 Dec 2006

A customer recently asked for ‘advice to give to a brand-new Change Manager, and what they are supposed to do’, and that is the subject of this post (possibly inspired by this post?). The organisation in question is quite small, with fewer than 10 IT staff, but does have a significant IT infrastructure to support. They are part-way through an ITSM programme at their company.

First of all, it is important to ask the question ‘why are we implementing Change Management?’ and, if you can, prioritise the answers you give. Don’t recite the ITIL books as holy text: if your company is experiencing problems or issues around IT change, use those problems as guides and checks for yourselves. For example, you might answer:

‘We implement changes and sometimes they work OK, other times they cause more problems than they solve’

‘We don’t estimate costs and time very well’

‘We seem to choose the worst possible time from the business’s point of view’

…and so on – I’m sure you get the idea. Keep these problems/issues and use them to validate any Change process you implement. After all, it should solve them.

Before giving any advice to our new Change Manager, let’s have a look at some aspects we would expect Change processes to include. I think the bare minimum would be:

  • Impact assessment
  • Reason for implementing the Change
  • Cost of Change, and its expected duration
  • Back out plan

This list is not exhaustive. You could also have a test plan, acceptance criteria, a risk assessment and others – but as a minimum, for a smaller company, this would be my list.

Impact Assessment

This is where the Change Manager writes down the potential impact of the Change. The sort of thing you are interested in is which services to users could be affected, and how long they might be affected for. You might include some assessment of risk into that impact assessment.

If you look at this more closely, you’ll find I’m saying ‘take a change that might be proposed on a particular computer system and assess the impact on the services we offer to users’ – and you need to complete this task with a degree of certainty.

The Change Manager needs to think: how do I do this? The best answer is to use the Configuration Management Database (CMDB) – an essential underpinning technology for Change Management. This post is not about the CMDB, so if you want to find out more, use the blog search facility on the right: there are many articles on it in this blog.

If you don’t have a CMDB then it’s going to be tough work – maybe you’ll be forced to ask some very clever person ‘who knows’ – but I’d say that you cannot expect reliable Change Management if this is how you work, because people make mistakes. In my experience before I joined Serio, this is a very common state of affairs.

Advice for Change Manager: Don’t take on the role without a CMDB unless a CMDB is just around the corner, or you have some other reliable means of assessing impact.

Back-Out Plan

This is your escape route if things go wrong – ‘how do we restore the system if the unexpected happens?’. Sometimes your technical staff may say ‘no back-out plan is needed’ or ‘no back-out plan is possible’.

Advice for Change Manager: Don’t be fobbed-off. Always ask for a realistic and costed back-out plan.

I’ll continue this post later and look at some of the tasks Change Managers typically do.

Suggested white paper: Introduction to ITIL

 

Monday, 04 Dec 2006

This post has been suggested by someone who I’ve known for a long time as a specialist engineer in IT Service Management (let’s call her Linda), who has recently taken a new role as a Service Delivery Manager in a new company (in other words, she has a new role and a new organisation to cope with). The first thing she noticed on day 1 was that although her company (I’ll call them WidgetCo) has just 600 users and 8 IT staff, there were over 600 outstanding Incidents currently open, some of which had been open since April.

WidgetCo don’t use ITIL or have any concept of IT Service Management, though no doubt that will change. My friend’s comment to me was: ‘what kind of things can I do to get the outstanding Incident count down to a more manageable level?’

Here are some of the things I’d do, in no particular order.

  1. Warn senior managers, directors and project sponsors about what you’ve found. Give them a quick summary of the problem, indicate what you are going to do and how long it will take, and promise to report back at the end of that period. If you are running reports that show Service Level performance, warn them that this and possibly other statistical measures may take a beating while you clear the backlog.
  2. Consider letting your customers or users know what you are doing, particularly if in the short term there will be an adverse effect on quality of service.
  3. Give your new staff some quality guidelines (Linda has told me not every Incident is logged). These should include:
  • Log every Incident
  • If you resolve an Incident, enter it on the system as resolved within 3 hours.
  • Initiate some basic quality measures, such as specifying a decent Incident description and resolution comment
  4. Find out if some of the Incidents you have are actually Requests for Change. You need to make a distinction between Incidents (where some fault in a user service has occurred) and situations where customers are requesting amendments to services (important though those requests might be).
  5. Assess the older Incidents (maybe from April through to August) one by one – you may need someone to help you with this. Most ITSM tools such as Serio allow custom ‘status’ values to be attached to Incidents (in Serio, Agent Status A and B are examples of this). Create some status values to attach to your Incidents so that you start to get a grip on what the data means. It will also help you to remember which have been reviewed. The following status values may be useful:
  • Outstanding (means reviewed and is still required)
  • Change raised (means this Incident was actually a Change)
  • No longer required (in my experience, you may find some of the Incidents have either been fixed and the system not updated, or something has changed elsewhere and action is no longer required)
  • On hold (means the status is unclear until further review, naturally you wish to keep this down to a small number)
  6. Check your statistics to see who is actually resolving Incidents, in order to see if some staff members are under-utilised. Use this data with caution – those with fewer resolutions shown might be dealing with more complex Incidents. Alternatively, they might be the ones who spend 3 hours per day on eBay.
  7. Stress to your staff the importance of resolving Incidents, and consider posting a ‘daily outstanding’ count on the notice board or some other prominent position.
  8. Consider forming a special team to address the ‘Outstanding’ Incidents you’ve identified above. In Linda’s case, this would be unlikely to be a team of more than 2. I stress the word ‘consider’ here – you may choose to leave Incidents with the engineers that are currently handling them.
  9. Incidents will be assigned to Teams and Agents. Check the assignment levels – a bar chart may be useful for this. If you see bottlenecks (for example, 300 Incidents assigned to one individual) take action by re-distributing as required.

Some of the suggestions here may lead to other problems. I’ll write about these in a follow-on post.

Friday, 01 Dec 2006

This is another post on the subject of creating your own reports; you can see the last in the series by clicking Writing Custom Reports Part 2.

The previous posts have looked at SQL, introducing some key clauses in the SELECT statement such as ORDER BY and GROUP BY.

Now let us look at a problem that comes up very quickly when you are writing a report. I said in this post that databases such as Microsoft SQL Server and Oracle are made up of tables and views – in other words, the database is really a number of tables whose rows and columns contain the data we see on screen and on the page.

The important phrase there is ‘number of tables’, meaning there is more than one. Suppose you want data from one table and another table on the same report – not one set of data after the other, but mixed together as if you only had one table. Here’s what you do.

This is where the Join comes in. I’ll capitalise the word Join even though it is not an actual SQL keyword – it is really a concept. It also happens to be one of the most important things you have to grasp in SQL, and you will not accomplish much without understanding it.

The Task

Produce a status report that lists all Active Incidents, shows the reference number, date of logging, and the Team and Agent to whom they are assigned.

Looking at the documentation that comes with the Serio SDK, I can see that Active Incidents, and their date of logging, can be obtained from sv_issue_basics using this query.

select issue_no, issue_logging_date_time
from sv_issue_basics
where issue_status = 'a'
order by issue_no

However, there is no assignment data! Again referring to the documentation, I see that assignment information is stored in a helpful View called sv_issue_assignment. By Joining it as follows, we get what we need:

select sv_issue_basics.issue_no,
sv_issue_basics.issue_logging_date_time,
sv_issue_assignment.assigned_agent_id,
sv_issue_assignment.assigned_team
from sv_issue_basics, sv_issue_assignment
where sv_issue_basics.issue_status = 'a'
and sv_issue_basics.issue_id = sv_issue_assignment.issue_id
order by sv_issue_basics.issue_no

Notice that the statement has changed as follows:

  • In the FROM clause, where there was one View named, there are now two. This is because we are taking data from two different Views.
  • The naming of things has changed: we now have to say which View a given column is in – for instance, sv_issue_basics.issue_no means the issue_no column in the sv_issue_basics View. It is good practice to do this if you are using two or more Views.
  • A new clause has been added: and sv_issue_basics.issue_id = sv_issue_assignment.issue_id. This is where we tell the database what data is common between the two tables, so that the database can make a ‘join’ – a new row. In this case both tables contain an issue_id, and we can use that for the join.

This statement satisfies the task we set ourselves.
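As an aside, most modern databases also accept an explicit INNER JOIN … ON spelling of the same Join, which some readers find easier to read. The sketch below is illustration only: it builds throwaway stand-ins for the two Views in an in-memory SQLite database (the rows and names such as ‘jsmith’ are invented; your real Views live in SQL Server or Oracle) and confirms that both spellings return the same result.

```python
import sqlite3

# Throwaway stand-ins for the two Serio Views, built in an in-memory
# SQLite database purely for illustration. The rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sv_issue_basics (
        issue_id INTEGER, issue_no TEXT,
        issue_logging_date_time TEXT, issue_status TEXT);
    CREATE TABLE sv_issue_assignment (
        issue_id INTEGER, assigned_agent_id TEXT, assigned_team TEXT);
    INSERT INTO sv_issue_basics VALUES
        (1, 'INC-001', '2006-10-02', 'a'),
        (2, 'INC-002', '2006-10-05', 'c');
    INSERT INTO sv_issue_assignment VALUES
        (1, 'jsmith', 'Network'),
        (2, 'kjones', 'Desktop');
""")

# The implicit-join form used in the post above.
implicit = conn.execute("""
    select sv_issue_basics.issue_no, sv_issue_basics.issue_logging_date_time,
           sv_issue_assignment.assigned_agent_id, sv_issue_assignment.assigned_team
    from sv_issue_basics, sv_issue_assignment
    where sv_issue_basics.issue_status = 'a'
    and sv_issue_basics.issue_id = sv_issue_assignment.issue_id
    order by sv_issue_basics.issue_no
""").fetchall()

# The equivalent explicit spelling: the join condition moves to ON.
explicit = conn.execute("""
    select sv_issue_basics.issue_no, sv_issue_basics.issue_logging_date_time,
           sv_issue_assignment.assigned_agent_id, sv_issue_assignment.assigned_team
    from sv_issue_basics
    inner join sv_issue_assignment
        on sv_issue_basics.issue_id = sv_issue_assignment.issue_id
    where sv_issue_basics.issue_status = 'a'
    order by sv_issue_basics.issue_no
""").fetchall()

assert implicit == explicit
print(implicit)  # only INC-001 is Active: [('INC-001', '2006-10-02', 'jsmith', 'Network')]
```

Either spelling produces the same rows; the ON clause simply makes the join condition easier to spot in larger queries.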

I will continue this series of posts next week, hopefully with an open-source report writer, and we’ll see how we can use what we have learned.

Thursday, 30 Nov 2006

Over at the Verso blog there's a post called Service Desk or Help Desk - What's in a name? It may help to answer the question we are sometimes asked about this blog and the fact that we sometimes use both terms, and at other times just one.

Wednesday, 29 Nov 2006

This topic was suggested by a colleague of mine here at Serio, who asked me to write about how Helpdesk or Service Desk managers can help to give their staff further challenges at work, and/or a sense of career progression. He put it more prosaically : ‘what can you do to stop staff feeling like they are stuck in a rut and leaving?’. His comment was triggered by a conversation with a long-standing customer who had lost one of their brightest and best.

As always, I’ll try to stay with the practical. Like a lot of good things, what I’m about to say may seem somewhat obvious, but you’d be surprised at how often it is neglected by what are otherwise very capable managers.

Lots of things can lower employee morale, such as poor pay or permanently angry customers, but I’m going to stick to how you give staff new challenges in what might seem to be an environment with limited promotional opportunities.

It really doesn’t matter if you have no ‘promotion’ positions available, because what you can do is give your staff new functions to perform, and this is where ITIL becomes really useful. Let us take the case of a Helpdesk or Service Desk with 6 staff and a manager, where the manager is responsible for producing monthly reports, analysing service levels achieved and so on.

This is quite typical in my experience – we have a manager and then a group of people who are responsible for dealing with customers day to day.

In this situation, you can (as an example) create a new role of ‘Service Level Manager’, and ask one of the staff to extend their repertoire of skills to include this. In doing so, it is absolutely critical that you spell out for the person concerned what they have to do, and be both specific and practical. You might say:

  • Produce monthly reports on Service Levels achieved (giving a sample report), including a summary and conclusion
  • Produce recommendations for improvements where Service Levels fall below that expected or required (again giving some examples)
  • Use the ITSM tool to monitor ongoing Service Levels day-to-day (again being specific. If you were a Serio user you might say ‘Act on the Response Breach Warnings that are sent’)

In order to make this work, you have to hand over all (or almost all) responsibility. What I mean by this is that if you are discussing new Service Levels with customers, your Service Level Manager might take the lead (which doesn’t mean that the Service Desk or IT Manager is not involved, just that someone else leads the discussions and the process of agreement). Also, the person concerned should have a visible part to play in successfully implementing their own recommendations, and be able to take some credit for service improvements that result.

In other words, there is some scope for initiative. If you just ask someone to compile reports, that will not achieve what I am talking about here.

Other benefits come from this approach, the principal one being a wider appreciation of, and involvement in, IT Service Management.

Make sure it is clear both to the person concerned, and the rest of the team, that the role is a reward for competence and hard work. This gives them incentives and makes it clear that it is possible to progress within your own company.

Please take a look at the two White Papers attached that may be of interest.

Monday, 27 Nov 2006

This post is next in the series on Writing your own Custom Reports.

The last post was all about introducing the SELECT statement, and showing how you can use it to access data.

There are a few points to draw from the previous post, and they are:

Understand your data. You can’t write any report unless you have a clear idea of what data is available. The devil is in the detail here, and if you want to create a report yourself then you have to grasp the detail.

For example, understanding that the sv_issue_basics View contains Incident, Problem and Change data is an important detail. You find that out by reading the comments about the View, but you can also infer it by looking at the columns. Since one of the columns is type_code, and this is ‘i’ for an Incident, ‘p’ for a Problem and ‘c’ for a Change, it is easy to deduce that this View has all three types of data, and that this column allows you to filter.

Build your query in stages. Just like I did in the earlier post, you can gradually develop your query in stages.

In this post, I’m going to refine the query further, and show you some of the other things we can do with a SELECT.

We finished with this SQL statement

select issue_no, issue_logging_date_time, issue_priority
from sv_issue_basics
where issue_logging_date_time >= '2006-10-01'
and issue_logging_date_time <= '2006-10-31'
and type_code = 'i'
order by issue_no

Note that the first part of the statement says which columns we want to see. There is a shorthand way of saying we want to see every column, using *, as in the example below:

select *
from sv_issue_basics
where issue_logging_date_time >= '2006-10-01'
and issue_logging_date_time <= '2006-10-31'
and type_code = 'i'
order by issue_no

Grouping and Counting

If you write a report yourself, you will almost certainly want to group by different data, and to count instances of particular groups. A lot of the reports in SerioReports do this. This post will show you how.

The Task

Produce a report for October that groups Incidents by Priority, and counts the number logged for each Priority.

What we have to do is introduce a new clause – the GROUP BY clause. This clause tells the database to ‘fold’ similar rows together – in this case, by Priority. Since we wish to have a count of Incidents, we can use the function specially provided in SQL for this, the COUNT function (such functions are called aggregate functions).

select issue_priority, count(*)
from sv_issue_basics
where issue_logging_date_time >= '2006-10-01'
and issue_logging_date_time <= '2006-10-31'
and type_code = 'i'
group by issue_priority
order by issue_priority
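If you want to experiment with GROUP BY and COUNT before touching your production database, the folding behaviour can be reproduced in miniature. This sketch uses an in-memory SQLite table with invented priorities, purely for illustration:

```python
import sqlite3

# An in-memory SQLite table with invented priorities, purely to
# illustrate how GROUP BY folds rows and COUNT(*) counts them.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sv_issue_basics (issue_priority INTEGER)")
conn.executemany("INSERT INTO sv_issue_basics VALUES (?)",
                 [(1,), (1,), (2,), (2,), (2,), (3,)])

# Six rows fold into three groups, one per distinct priority.
rows = conn.execute("""
    select issue_priority, count(*)
    from sv_issue_basics
    group by issue_priority
    order by issue_priority
""").fetchall()
print(rows)  # [(1, 2), (2, 3), (3, 1)]
```

Each output row is one group: the priority, then how many rows were folded into it.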

If you wish to GROUP results, here are a few simple rules to help you get it right.

- If you include a column in the SELECT clause that is not an aggregate function, you must include it in the GROUP BY clause.
- Generally it is a good idea to include GROUPed columns in the ORDER BY clause.

Using these rules, I can extend the query to group by Priority and Problem Area, as follows.


select issue_priority, problem_area, count(*)
from sv_issue_basics
where issue_logging_date_time >= '2006-10-01'
and issue_logging_date_time <= '2006-10-31'
and type_code = 'i'
group by issue_priority, problem_area
order by issue_priority, problem_area

Figure 1 - Query being run

Friday, 24 Nov 2006

I intend to use the subject of reporting for a series of blog articles, covering how to create some simple reports for your own Helpdesk or Service Desk.

Rather than using a software tool such as Crystal, which I know a lot of you do not have, I will use tools that come with both Microsoft SQL Server and Oracle to create simple SQL queries, and will illustrate how the Serio reporting Views can help you. If you understand how to use the Views, and a little SQL, you will have all the skills you need to create your own reports in a reporting tool.

I will assume you have never used SQL before.

Jargon Buster

SQL: Short for Structured Query Language. This is a language used to interact with a database system. For the purposes of this tutorial it is the language we will use to specify the data we are interested in.

View: All this means is ‘virtual table’. Your database stores data (such as your Incidents or Configuration Items) in tables, a bit like a spreadsheet. Each table is made up of columns, each of which has a name, and rows which contain data. Views are created for the convenience of those writing reports. If you are still baffled by that do not worry – think of each View as a convenient way for you to generate reports.

Getting Started

You’ll need to get hold of the Serio Developer Toolkit for Crystal. You can get this from Serio if you ask – it is free. Follow the instructions in the Kit to create the Views I will use in this tutorial series. Don’t be put off, as it takes no more than 20 minutes to create the Views (even if you are a novice).

Tools for creating and running queries

Microsoft SQL Server: Use the SQL Query Analyser. You can access this from the SQL Server client tools, installable from the SQL Server CD (make sure you install the service packs available from Microsoft). Ask your DBA for assistance in set-up and installation.

Oracle: SQL*Plus. Yes, I know it’s a fairly primitive tool, but it is shipped with all Oracle systems. It will be on the Oracle CD. Ask your DBA for assistance with set-up and installation.

Documentation on the Views is available with the Developer Kit. Look in the ‘Schema.xls’ document, and click on the Views worksheet.

Creating our first query

For the first query, I’m going to use the very convenient View sv_issue_basics. This View contains information about every Incident, Problem and Change ever logged in your Serio system.

The Task

Create a query that lists the reference number, date of logging and priority for each Incident logged between 1st October 2006 and 31st October 2006.

To read data from the database, we need to issue a SELECT statement. Rather than explain SELECT I’m simply going to write a statement, and then refine it.

select issue_no, issue_logging_date_time, issue_priority
from sv_issue_basics

It’s probably not a good idea to run this query just yet. We have not specified that we are only interested in data from October – so this query will return information about each Incident, Problem and Change we’ve ever logged! What we need is to specify a condition, and you do that as follows:

select issue_no, issue_logging_date_time, issue_priority
from sv_issue_basics
where issue_logging_date_time >= '2006-10-01'
and issue_logging_date_time <= '2006-10-31'

Figure 1 - Query being executed on SQL Server

I said that sv_issue_basics contains information about all Incidents, Problems and Changes, but we are not yet telling the database that we only want Incident data. Looking at the documentation for sv_issue_basics, we can see a column called ‘type_code’ that ‘equals i for an Incident, p for a Problem and c for a Change’. This looks like a perfect way to deal with just Incidents, allowing me to add a clause as follows.

select issue_no, issue_logging_date_time, issue_priority
from sv_issue_basics
where issue_logging_date_time >= '2006-10-01'
and issue_logging_date_time <= '2006-10-31'
and type_code = 'i'

If you run this query, you’ll see that it now returns just Incident data, but you’ll probably find that the data is unsorted – because we have not told the database to sort it before returning it to us. The following statement applies a sort:

select issue_no, issue_logging_date_time, issue_priority
from sv_issue_basics
where issue_logging_date_time >= '2006-10-01'
and issue_logging_date_time <= '2006-10-31'
and type_code = 'i'
order by issue_no

This shows the 4 most important parts (but not the only parts) of a SQL statement: select, from, where and order.
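If you would like to try the finished statement without a Serio database to hand, it can be run against a throwaway stand-in table. The sketch below is illustration only: it uses an in-memory SQLite database with four invented rows, purely to show the select, from, where and order parts doing their jobs.

```python
import sqlite3

# A throwaway stand-in for the sv_issue_basics View, with invented rows.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE sv_issue_basics
    (issue_no TEXT, issue_logging_date_time TEXT,
     issue_priority INTEGER, type_code TEXT)""")
conn.executemany("INSERT INTO sv_issue_basics VALUES (?, ?, ?, ?)", [
    ('INC-002', '2006-10-15', 2, 'i'),
    ('INC-001', '2006-10-03', 1, 'i'),
    ('PRB-001', '2006-10-10', 1, 'p'),   # a Problem: removed by type_code
    ('INC-003', '2006-11-02', 3, 'i'),   # November: removed by the dates
])

# The finished statement from the post: select, from, where and order.
rows = conn.execute("""
    select issue_no, issue_logging_date_time, issue_priority
    from sv_issue_basics
    where issue_logging_date_time >= '2006-10-01'
    and issue_logging_date_time <= '2006-10-31'
    and type_code = 'i'
    order by issue_no
""").fetchall()
print(rows)  # [('INC-001', '2006-10-03', 1), ('INC-002', '2006-10-15', 2)]
```

Only the two October Incidents survive the WHERE clause, and ORDER BY returns them by reference number.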

We will continue this later, and welcome feedback on this.

Wednesday, 22 Nov 2006

Commentator Peter asks if I can expand on the subject of ‘cost per lost production hour’ in this post on Availability – which is what I intend to do in this post.

Firstly let me say I don’t consider myself an expert in this, having only had to address it once in my career. What I will do is discuss some of the main factors I believe you should consider – if you can think of others, please add them to the comments below.

Let’s recap on what we are talking about and why it is important. The subject is the cost of lost production, or the costs of unavailability. Put simply, if you provide a Key Service X to users, and X then becomes unavailable for whatever reason, what is the cost to the enterprise for each hour that X is down? Helpdesk and Service Desk Managers are interested in this because costs accruing to the business are important and legitimate areas of reporting.

So, how do we go about deciding a cost?

Starting with the obvious, costs will be different for different systems or services. I can’t imagine a scenario where a blanket cost would be applied across all systems.

We then need to agree the costs with our key customers and project sponsors – they may have strong opinions on the costs of downtime. Getting agreement with customers is important because they will have much greater faith in the statistics if they have helped to form them.

Here are some of the factors I think you should consider. The emphasis you put on these factors will depend on the service and your own particular circumstances.

Costs of lost business

Some systems (such as sales order processing systems) are directly linked to the ability of the organisation to undertake profitable business. For systems like this, we can determine how much profit we make per hour, and use this as a cost to the enterprise for unavailability per hour. Sometimes seasonal factors may affect this, but I’d advise avoiding over-complication, and advise that you ‘take things in the round’.

Cost of lost reputation

Sometimes the enterprise can, for a period, hide or limit customer exposure to downtime by reverting to manual procedures. In other cases this is not possible, and downtime leads to customer disdain. For an enterprise that trades on its reputation, this can be disastrous. Therefore, we can sometimes estimate a cost for lost prestige in the event of downtime. As you would expect, this is always going to be a value judgement, and may be a contentious issue.

Cost of penalties

Some organisations face financial penalties in the event of downtime. We should always consider these, and it’s usually quite easy to do so as they are defined as part of a contract.

Cost of lost worker hours

We may have groups of workers who need a system to do their jobs – engineers and architects are two groups that spring to mind with CAD-type systems. For these types of workers and others like them, unavailability is the difference between contributing something meaningful to a project and standing around the water cooler talking about last night’s football. In these cases, take a view on the average number of concurrent users for the system, and then average the salary cost-per-hour of a typical worker. Then increase the salary cost per hour by about 1/3rd – this allows for holiday, office space and other costs associated with staff. Take this figure and compute a cost as follows:

Cost per hour = (Average salary cost per hour + (Average salary cost per hour × 0.33)) × Average concurrent users

If you are using Serio, use the User-based downtime reports – store your average number of concurrent users in the CMDB.
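The formula can be sanity-checked in a few lines of code. This is only a sketch: the salary rate and user count below are invented, and the 0.33 uplift is the ‘about 1/3rd’ overhead allowance suggested above.

```python
def cost_per_lost_hour(avg_salary_cost_per_hour, avg_concurrent_users,
                       overhead_rate=0.33):
    """Cost of one hour of downtime, per the formula above:
    (salary/hr + salary/hr * overhead) x average concurrent users."""
    loaded_rate = avg_salary_cost_per_hour * (1 + overhead_rate)
    return loaded_rate * avg_concurrent_users

# Invented example: 30 per hour average salary cost, 25 concurrent users.
print(round(cost_per_lost_hour(30.0, 25), 2))  # 997.5
```

So a system idling 25 such workers for an hour costs roughly a thousand in this (entirely hypothetical) currency.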

Recovery costs

In some cases, after a period of downtime (particularly an extended period) the enterprise has costs arising from recovery – paying staff to work overtime to clear backlogs, or the inability to pursue more profitable business. Whilst this can be very difficult to determine, again take a rounded view and arrive at some easy-to-use estimates.

Monday, 20 Nov 2006

Commentator Robert asks for a ‘page where all the [availability] posts are joined together’. This post is to provide just that.

The first post is an introductory post: ‘Using Serio to obtain Availability statistics’. In short, this post:

  • Defines Availability
  • Points to the White Paper we have on the subject
  • Asks you to think about what your Key Services are
  • Prompts for how Availability data should be presented

Next comes ‘More on Availability statistics’ (not a very imaginative title I know). This post:

  • Discusses identifying what your target for availability should be
  • Asks you to think about how Key Services will be represented in the CMDB

This was followed by ‘Accessing Availability Statistics’. This post:

  • Describes how to use Service Level Agreements (SLAs) with the CMDB
  • Shows how to log Incidents that will deliver the data we need
  • Describes what ‘ingredients’ are used to produce the final Availability graphs

Next I looked at ‘Availability & the Performance Graphs’. This post:

  • Introduced and named the main graphs we use for accessing downtime statistics
  • Looked at formulae for how downtime and availability are calculated
  • Discussed the criteria you need to supply to the Performance Graphs

The final post was ‘Availability Reporting Round-up’. This post:

  • Continued the examination of the Performance Graphs, in particular looking at the ‘User-based’ graphs
  • Discussed (briefly) assessing costs associated with lost production.

Thursday, 16 Nov 2006

This is my final post (for the time being) on the subject of Availability reporting. I’ve posted quite a bit about this recently – the previous post in the series is Availability & the Performance Graphs (this has links to the other posts).

Recall from my previous post I listed the Availability graphs, but left discussing them for this post. The graphs I mentioned were

  • Downtime – Item (Monthly)
  • Downtime – Item (Monthly User Based)
  • Downtime – Item (Weekly)
  • Downtime – Item (Weekly User Based)
  • Downtime – System
  • Downtime – System (User Based)

These reports simply show (on a weekly or monthly basis) the total amount of downtime. As you can see, there are broadly two types: those that are ‘User-based’ and those that are not user-based (I’ll call these reports ‘straight downtime‘ reports).

Starting with the straight downtime reports, these just total-up the amount of downtime over the given period. If you’ve had 4 hours downtime in August, that is what the report will show.

The User-based reports do something different. Like the straight reports, they take the amount of downtime that has occurred in the preceding period. However, it is then multiplied by the number of concurrent users for the Key Service in question (this information is taken from the CMDB).

Example: You have a warehousing system that has 30 concurrent users across three sites. This Key Service in a one-month period experiences 2 hours downtime. The User-based downtime reports would show downtime as 2 x 30 = 60 hours.

If the User-based reports sound strange, here is the intent: you can use them to assess the costs of downtime, because the reports show the number of lost production hours. As part of your SLA you might agree the cost of a single ‘Lost Production Hour’ for a given Key Service, and from this use the User-based reports for downtime financial reporting.
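Putting the two ideas together – the User-based downtime figure and an agreed cost per Lost Production Hour – the financial calculation is just a multiplication. A sketch with invented figures (the 40-per-hour rate is purely illustrative):

```python
# Invented figures, purely for illustration: a Key Service with 2 hours of
# downtime in the month, 30 concurrent users (taken from the CMDB), and an
# agreed cost of 40 per Lost Production Hour from the SLA.
downtime_hours = 2
concurrent_users = 30
cost_per_lost_production_hour = 40

lost_production_hours = downtime_hours * concurrent_users
downtime_cost = lost_production_hours * cost_per_lost_production_hour

print(lost_production_hours)  # 60 - the figure the User-based report shows
print(downtime_cost)          # 2400 - the figure for financial reporting
```
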

Coming to a reasonable value for a lost production hour is beyond the scope of this post, but normally it will include an averaged salary value for the users concerned, and may also include a measure for the fact that profitable enterprise has been also lost during downtime – a double whammy for the organisation.

My personal opinion is that the User-based downtime reports are the most useful – they focus attention on the effect of downtime and unavailability on the organisation. I have to comment though that sometimes there is resistance to using these reports in IT departments, because the numbers generated can be very large indeed. However, that should not preclude their use in a properly managed Service Desk or Helpdesk.
