Serio Blog

Tuesday, 05 Feb 2008

This is a continuation of last week's post, Making a Start with ITSM Reporting.

In it, I mentioned First Time Fix Rate as a measure of quality.

Staying with this theme of measures of quality for the Helpdesk or Service Desk, have a look at your telephone statistics. Examine how long it takes for the phone to be answered, and what your call abandonment rate is at different times of day. Also check when calls are actually coming in - if there is a gap (say, calls start at 8:00am but your service does not start until 9:00am), it might mean you should re-examine your SLA.
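If your telephone system can export a call log, even a crude one, this sort of analysis is easy to automate. Below is a minimal Python sketch - the (hour, abandoned) record shape is an assumption, so substitute whatever fields your switch actually exports - that computes the abandonment rate for each hour of the day:

```python
from collections import defaultdict

def abandonment_by_hour(calls):
    """Compute the abandonment rate (abandoned / offered) for each
    hour of the day. `calls` is a list of (hour, abandoned) tuples:
    `hour` is 0-23, `abandoned` is True if the caller hung up
    before being answered."""
    offered = defaultdict(int)
    abandoned = defaultdict(int)
    for hour, was_abandoned in calls:
        offered[hour] += 1
        if was_abandoned:
            abandoned[hour] += 1
    return {h: abandoned[h] / offered[h] for h in sorted(offered)}

# Calls arriving at 8am, before the desk opens at 9am, mostly abandon
calls = [(8, True), (8, True), (8, False),
         (9, False), (9, False), (9, True)]
rates = abandonment_by_hour(calls)
```

A high rate in the hour before your advertised start time is exactly the kind of SLA gap mentioned above.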

Next take a look at your backlog. By backlog, I mean the number of unresolved or active tickets you have at the start of your reporting interval when compared with the number at the end - this can be looked at for Incidents, Problems or Changes depending on what processes you are actually running.

In Serio, there are lots of ways to get this, but probably the easiest is by running some of the Executive Summary reports (for example, Report ES1, which you'll find attached). Aside from getting this data for your current period, you can also go back 3 months and determine whether the trend is neutral, rising or falling - and then draw conclusions or take action as appropriate.
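If you want to sanity-check the report output, backlog is also simple to compute directly from ticket data. Here's a sketch assuming each ticket is just a (logged, resolved) date pair, with resolved set to None for tickets still open - an illustrative shape, not Serio's actual schema:

```python
from datetime import date

def backlog_at(tickets, on):
    """Count tickets logged on or before `on` that were still
    unresolved at the end of that day."""
    return sum(1 for logged, resolved in tickets
               if logged <= on and (resolved is None or resolved > on))

# (logged, resolved) pairs; resolved is None for open tickets
tickets = [
    (date(2008, 1, 3), date(2008, 1, 5)),
    (date(2008, 1, 10), None),
    (date(2008, 1, 20), date(2008, 2, 2)),
]
start = backlog_at(tickets, date(2008, 1, 1))   # backlog at period start
end = backlog_at(tickets, date(2008, 1, 31))    # backlog at period end
```

Comparing the two figures tells you whether work is accumulating faster than it is being resolved.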

So far, if you've taken all of this data, you've got a measure of the quality of service from your front-facing teams, and a very broad measure of throughput through the system. Now let's look at the 'back-end' - those resolving tickets.

Staying with easily-available statistics, you can look at time-to-fix data (which is mainly focused on Incidents). This kind of data is simply a measure of how long it takes, in working time, to get from an Incident being logged to it being resolved. Again there are lots of ways to access this - in SerioReports, have a look at the SLA Analysis reports - or if you want less detailed data, you'll find it also in the Executive Summary reports I mentioned earlier. For example, you could run Report SLA5, which is a simple results-against-target analysis, or a time distribution report like SLA12. You can find both reports attached.
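Working time is the subtle part of any time-to-fix measure: the clock should only run during support hours. A rough illustration in Python, assuming a 09:00-17:00, Monday-to-Friday working week (adjust the constants to match your own SLA calendar - holidays are deliberately ignored here):

```python
from datetime import datetime, timedelta

WORK_START, WORK_END = 9, 17  # assumed support hours, Mon-Fri

def working_hours_between(logged, resolved):
    """Working time (in hours) from logging to resolution, counting
    only WORK_START-WORK_END on weekdays."""
    total = 0.0
    day = logged.date()
    while day <= resolved.date():
        if day.weekday() < 5:  # Monday=0 .. Friday=4
            start = datetime(day.year, day.month, day.day, WORK_START)
            end = datetime(day.year, day.month, day.day, WORK_END)
            lo, hi = max(start, logged), min(end, resolved)
            if hi > lo:
                total += (hi - lo).total_seconds() / 3600
        day += timedelta(days=1)
    return total

# Logged Friday 16:00, resolved Monday 10:00: 1h Friday + 1h Monday
hours = working_hours_between(datetime(2008, 2, 1, 16),
                              datetime(2008, 2, 4, 10))
```

The Friday-to-Monday example shows why working time, not elapsed time, is the fair measure: the elapsed figure would be 66 hours, the working figure just 2.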

The point of these reports is that they measure how much time it takes to resolve an Incident - and allow you to check that what you've agreed with your customers is what is actually being achieved. If you don't have an agreement with your customers (no Service Level Agreement of any kind) right now, create some targets for yourself and your team - and then measure your performance against these, along with creating a Service Level Management process to go with them.

Other data that can be revealing is customer satisfaction survey data. You can ask Serio (and many other ITSM tools) to gather this for you as you resolve tickets, which provides a way to gauge customer perceptions of the service you are providing.

In these posts I've tried quite hard to focus on easily available data, and to write for someone getting started. More mature ITSM environments might include this data but would probably also include Availability (see the Availability white paper), costs of downtime, Problem and Change metrics, and a more detailed SLA analysis.

Thursday, 31 Jan 2008

A colleague here asks me to write about reporting for a customer who is trying to create an IT service management report for the first time, and has little or no Serio experience - and who is not sure what data to use or where to begin.

First of all, I'll list the resources we have here on this website. Probably your first job should be to print and read our Service Desk Metrics White Paper. This white paper discusses different types of data, discusses why we write reports in the first place, and provides a sample reports template you can use.

This subject has come up before on this very blog, in these posts about metrics and KPIs (Key Performance Indicators). These posts might be of use:

Key Performance Indicators for Incident Management

Some Service Level Management Key Performance Indicators

Problem Management KPI Suggestions

Does Your Helpdesk/Service Desk Phone Just Ring Out?

There are others - search the blog for metrics and reporting.

(In case you've ever wondered, the difference between a metric and KPI is this: a metric is just a measure of something, whilst a KPI should be a measure of quality)

It doesn't matter too much if the Categories you've got set up and are using for Incident logging are a bit of a mess. Clearly this is not ideal and needs to be rectified at some point, but it should not stop you producing a report.

So having said all that, where do you start? You will need to locate and install a copy of SerioReports, as that is where (not surprisingly) most of the ready to run reports are located. Make sure you can connect to your live system with this - you'll find instructions for how to do this in the SerioReports help.

Let's look at some measures of quality we can use.

First Time Fix Rate (FTFR). You'll find this in report AGT14, located under Agent Performance in your Report Explorer. FTFR is one of many measures of quality - it tells you how often, when a Customer calls with a problem, they get an immediate resolution. If your figure is very low (for instance, less than 10%) it might indicate training or skills gaps within your Helpdesk or Service Desk team, issues with morale or motivation, or simply that the problems you deal with are of such a complex nature that FTFR will always be low.
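The arithmetic behind FTFR is trivial, which is part of its appeal. A small illustrative sketch - the resolved_first_contact flag is a hypothetical field standing in for however your own tool records a first-contact resolution:

```python
def first_time_fix_rate(incidents):
    """FTFR = Incidents resolved at first contact / total Incidents."""
    if not incidents:
        return 0.0
    fixed = sum(1 for i in incidents if i["resolved_first_contact"])
    return fixed / len(incidents)

incidents = [{"resolved_first_contact": True},
             {"resolved_first_contact": False},
             {"resolved_first_contact": True},
             {"resolved_first_contact": True}]
rate = first_time_fix_rate(incidents)  # 3 of 4 fixed first time
```

What counts as "first contact" (resolved during the call? within the hour?) is the definition you need to pin down before the number means anything.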

This is where your judgement and skills as a manager will come into play - understanding why things are the way they are, and making recommendations for improvement.

Whilst you are in that area of SerioReports, have a look at who is resolving tickets by examining report AGT21 or AGT5.

I'll continue this post either tomorrow or early next week.

Tuesday, 29 Jan 2008

It's not often I'm thrilled with the idea of new technology and gadgets. As someone in their mid-forties I'm old enough to remember the Pen PC from the late 90s (sank without trace), Prestel (ditto), LED-display watches (useless and uncool) and a whole bunch of other stuff that was going to be 'mainstream' and 'big'.

So I'm slightly sceptical about new gadgets generally. My experience is that consumers are much more conservative than most PR-companies expect.

However, one thing I've seen recently has certainly caught my attention.

It's called a Readius (there is a youtube clip here) from Polymer Vision, and features a new type of display - one that folds. I use a PDA, but one of the things that irritates me is the size of the screen - I just can't see everything I want to, particularly when using the Internet. The size of the screen is the major thing that affects portability as the screen can't bend or fold - so I'm stuck with a few square inches to squint into.

That is, until the Readius. This screen folds out so you can read it - almost like paper. It means that for a smaller device than I carry now, I can have a bigger screen - offering the promise of a usable display that will fit in my suit pocket.

Right now the fold-out display is greyscale (fine for what I want) but features a very low power consumption footprint (battery life on my current HP PDA is not brilliant).

Of course, what would make this fly off the shelves of technology dealers is Internet capability - attaching the screen to a 3G phone to make a truly portable mobile Internet device with a decent, usable screen. Alas, this is where the device falls down - at the moment it simply does not refresh fast enough to be used in this way, although that capability is promised in a couple of years' time. As the price for the Readius seems to be in the order of USD800, that makes it a very expensive toy until it can access websites and comes attached to a device with a browser. In the meantime, I'm still interested enough to consider buying.

It will also be interesting to see what the reliability is like - will the folding lead to cracks and expensive warranty claims?

However, improving displays puts the focus on keyboards - or the lack of one. They are either like arcade games (Blackberry) or much too large to carry (Pocketop Wireless). Hopefully the people at Polymer Vision will come up with a solution to this soon.

Thursday, 24 Jan 2008

My post today is about something new in ITIL Version 3 – Service Requests. Actually, I say new, but Service Requests (SRs) were in ITIL V2 (which most of you will be familiar with); one of the welcome changes in ITIL V3 is to make their definition and role much clearer. An earlier V3 post is here.

A Service Request is defined thus:

A Service Request is a request from a user for advice, information, a routine change or access to some IT service.

The most obvious example of a Service Request is someone asking for a password reset - but it could be someone asking for some desktop application to be installed, or asking for login rights to some system or service. Generally, they are typified by relatively modest amounts of effort (by the Service Desk) to complete, and little risk to the business. If there is expenditure involved it's usually modest or all agreed up-front.

In the past, many companies will have handled Service Requests as special types of Incidents, or as Changes – but defining them separately gives us an opportunity to have better reporting, and in some cases to reduce administrative time.

As in all cases there are a few downsides (which I think can be safely navigated with a little planning).

  • There is the possibility of confusion between Incidents and SRs, and Changes and SRs. A little training and definition will hopefully overcome this.
  • Service Requests bring into focus the need for help with determining which systems different Customers can reasonably request access to (and what they already have access to). You'll be pleased to hear we are adding new functionality to help with this.
  • Higher risk or more costly Changes being handled as SRs for administrative convenience - but again, with sufficient control this can be avoided.

We are changing Serio to meet this new or revised definition – some of the changes are actually quite significant and will be released as part of Serio Version 5. I'll write about these later. 

Monday, 14 Jan 2008

Back in October, I started to look at the next version of Microsoft’s server operating system – Windows Server 2008. In that post I concentrated on two of the new technologies – Server Core and Windows Server Virtualization (since renamed as Hyper-V).

For those who have installed previous versions of Windows Server, Windows Server 2008 setup will be totally new. Windows Vista users will be familiar with some of the concepts, but Windows Server takes things a step further with simplified configuration and role-based administration.

Using a technology known as Windows PE, the new setup model allows multiple builds to be stored in a single image (using the .WIM file format). Because many of these builds will share the same files, single instance storage is used to reduce the volume of disk space required, allowing six operating system versions to fit into one DVD image (with plenty of free space).

The first stage of the setup process is about collecting information. Windows Setup now asks fewer questions and instead of being spread throughout the process (anybody ever left a server installation running and then returned to find it had stopped half way through for input of some networking details?) the information is all gathered at this first stage in the process. After gathering details for the language, time and currency, keyboard, product key (which can be left and entered later), version of Windows to install, license agreement and selection of a disk on which to install the operating system (including options for disk management), Windows Setup is ready to begin the installation. Incidentally, it’s probably worth noting that SATA disk controllers have been problematic when setting up previous versions of Windows. Windows Server 2008 had no issues with the motherboard SATA controller on the Dell server that I used for my research.

After collecting information, Windows Setup moves on to the actual installation. This consists of copying files, expanding files (which took about 10 minutes on my system), installing features, installing updates, two reboots and completing installation. One final reboot brings the system up to the login screen after which Windows is installed. On my server (with a fast processor, but only 512MB of RAM) the whole process took around 20 minutes.

At this point you may be wondering where the computer name, domain name, etc. is entered. Windows Setup initially installs the server into a workgroup (called WORKGROUP) and uses an automatically generated computer name. The Administrator password must be changed at first logon, after which the desktop is prepared and loaded.

Windows Server 2003 included an HTML application called the Configure Your Server Wizard and service pack 1 added the post-setup security updates (PSSU) functionality to allow the application of updates before enabling non-essential services. In Windows Server 2008 this is enhanced with a feature called the Initial Tasks Configuration Wizard. This takes an administrator through the final steps in setup (or initial tasks in configuration):

  1. Provide computer information – configure networking, change the computer name and join a domain.
  2. Update this server – enable Automatic Updates and Windows Error Reporting, download the latest updates.
  3. Customise this server – add roles or features, enable Remote Desktop, configure Windows Firewall (now enabled by default).

Roles and Features are an important change in Windows Server 2008. The enhanced role-based administration model provides a simple approach for an administrator to install Windows components and configure the firewall to allow access in a secure manner. At release candidate 1 (RC1), Windows Server 2008 includes 17 roles (e.g. Active Directory Domain Services, DHCP Server, DNS Server, Web Server, etc.) and 35 features (e.g. failover clustering, .NET Framework 3.0, Telnet Server, Windows PowerShell).

Finally, all of the initial configuration tasks can be saved as HTML for printing, storage, or e-mailing (e.g. to a configuration management system).

Although Windows Server 2008 includes many familiar Microsoft Management Console snap-ins, it includes a new console which is intended to act as a central point of administration – Server Manager. Broken out into Roles, Features, Diagnostics (Event Viewer, Reliability and Performance, and Device Manager), Configuration (Task Scheduler, Windows Firewall with Advanced Security, Services, WMI Control and Local Users and Groups) and Storage (Windows Server Backup and Disk Management), Server Manager provides most of the information that an administrator needs – all in one place.

It’s worth noting that the Initial Tasks Configuration Wizard and Server Manager do not apply for Server Core installations. Server Manager can be used to remotely administer a computer running Server Core, or hardcore administrators can configure the server from the command line.

So that's Windows Server 2008 setup and configuration in a nutshell. Greatly simplified. More secure. Much faster.

Of course, there are options for customising Windows images and pre-defining setup options but these are beyond the scope of this article. Further information can be found elsewhere on the ‘net – I recommend starting with the Microsoft Deployment Getting Started Guide.

Windows Server 2008 will be launched on 27 February 2008. It seems unlikely that it will be available for purchase in stores at that time; however corporate users with volume license agreements should have access to the final code by then. In the meantime, it's worth checking out Microsoft's Windows Server 2008 website and the Windows Server UK User Group.

Tuesday, 08 Jan 2008

This is just a very brief follow-up to Duncan's last post, which mentions Known Errors but does not define them.

A Known Error is an output from Problem Management (or more accurately, your Problem Resolution process). For a definition of a problem, click here.

If you think of a Problem as being something you don't understand, think of a Known Error as something you do understand – even if you don't yet know how to fix it.

In the case of a software bug, that understanding would come after analysis of source code and algorithms. In the case of an infrastructure problem, it comes after carefully verifying the conditions necessary for repetition of the Problem and (ideally) identifying the faulty components.

Known Errors typically have two parts. The first of these is the description of the Known Error itself, showing users or product modules and versions affected. The second is the Workaround and/or Change:

  • Workaround – a way to bypass the fault you've previously described that can be used by Customers.
  • Change – to resolve the underlying Problem.

In practice, many support professionals seem to raise Known Errors whenever any error condition or bug is proved repeatable, rather than waiting for the Problem to be diagnosed (fully understood). Also, many raise a Known Error even if a Workaround is not currently available – simply to help Incident logging staff.

Known Errors are used during the Incident logging and resolution process, as a source of Workarounds for customers, and as an information resource for Customer-facing Incident handling staff. Because of this, we've made it so that searching Known Errors is quick and painless from Incident logging in Serio Release 5.

Wednesday, 02 Jan 2008

This is an update to the earlier Release 5 post, with more information about new features.

Firstly, the new PocketSerio-i application is available for anyone that wants it on servers hosted at our offices in Livingston (in other words, there is nothing to install). PocketSerio-i allows you to action Incident and Changes (and amend the CMDB) through a web browser on a PDA. It supports any browser, so if you can browse the Internet through your PDA then you should be able to use it just fine. It's designed for the very small screen and low connection speeds that PDAs typically have.

Things you can use it for include sending emails to customers through Serio (so they look and feel just like any other support email), making re-assignments, and resolving tickets.

We've also added a single-line Issue Summary field to the logging form, so that you can add a pithy description to each Incident, Problem and Change you log. You can then add this to the subject line of emails you send to customers (which helps remind them what the email is about).

One area in which there has been a lot of change is Known Errors. Previously, customers devised their own way of recording Known Errors – usually by means of Agent Status A or B.

This will still work just fine in Release 5, but we've brought the Known Error concept more directly into the tool. There is now a Chapter (under Tools) called 'Known Errors' which lists each and every Known Error in Serio, helpfully showing the Known Error Description and Workaround together.

As part of this, we've also extended the popular Service Status HTML web pages with Known Error pages – whenever you add or remove a Known Error, these pages are updated.

Creation and deletion of Known Errors is now done directly via Action Extensions created specifically for that purpose. When creating a Known Error, you set the Known Error Description (which is defaulted to the ticket description, but can be amended independently) and add the Workaround details – and that's it.

And one final thing, we've made it so that you can view Incidents, Problems and Changes all together in your queue if you wish.

Friday, 14 Dec 2007

Customer Billy asks for help in producing his first management report. He's read our IT metrics white paper and other KPI posts, but writes 'my problem is one of interpreting the data. I've got a lot of data and interpreting it all is doing my head in. My predecessor never did any reports'.

First of all, be clear about why you are writing the report, and who the report is for. Is it for your own benefit to help you manage the IT service more effectively, or is it to show someone else how well or otherwise IT service is managed & delivered? Be absolutely crystal clear in your own mind about this. Having asked Billy about this, the audience is primarily himself and his team.

Right now, their team has a functioning Incident Management and Asset Management process, so this is what Billy should look at first. The next six months will see them define a Problem Management process, and begin work on their Service Catalog.  Both of these will increase the options for reporting.

One of the things that the white paper does is group different types of performance data together. Some of the data is really easy to understand: Input data and Output data. Input data is primarily the raising of Incidents, and Output data is the resolution of those Incidents.

If you are reporting monthly, why not look at 3 things:

  • Incidents logged month-on-month over the past 3 months
  • Incidents resolved month-on-month over the past 3 months
  • Backlog at the end of each month

First of all, you can get all three values from report ES1 – it's an Executive Summary report you'll find in SerioReports – just put in your date range and run it (there is a lot more data there as well, which I'll ignore for now).

Billy's question was about interpretation. So, here are some things you can ask yourself:

Is the number of Incidents logged rising or falling, when looked at month on month? If it's falling then great, if it's rising then try to examine why that is.

Remember some fluctuation is to be expected (normal statistical variation), but a significant jump of (say) 15% or more is something to investigate by running more reports. For example, run reports by Problem Area or Department – is one particular technology or user group responsible for the increase? If so, why? (At this point, examination of individual Incident records can often be quite revealing.)
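That 'significant jump' check is easy to script if you'd rather not eyeball the numbers each month. A sketch, using the 15% figure from above as the default threshold:

```python
def flag_significant_rises(monthly_counts, threshold=0.15):
    """Return the indices of months where the count rose by more
    than `threshold` (15% by default) over the previous month."""
    flagged = []
    for i in range(1, len(monthly_counts)):
        prev, curr = monthly_counts[i - 1], monthly_counts[i]
        if prev and (curr - prev) / prev > threshold:
            flagged.append(i)
    return flagged

logged = [200, 205, 240, 238]  # e.g. Nov, Dec, Jan, Feb
to_investigate = flag_significant_rises(logged)  # Dec -> Jan rose ~17%
```

A flagged month is a prompt to dig deeper with the Problem Area or Department reports, not a conclusion in itself.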

When looking at the number of resolutions, is it broadly in keeping with the number of Incidents being logged, or is there a backlog developing? If there is, why do you think this is so?

Asking these questions will help you identify problems, but will also (hopefully) lead you to solutions. As an example from my own career, an increasing backlog (and worsening timeliness of resolution) at a company I once worked at was due to increasing amounts of staff sick leave. I already knew we had problems in this regard, but it was revealing nonetheless to see the results on service delivery (the IT service chart and the sick leave chart correlated).

My own reporting schedule has typically been: run some standard reports (reports I always run each month) and then, from time to time, run others to 'dig a little deeper'.

In addition, you can look at timeliness of resolution from a Service Level Agreement perspective (you'll also find this on ES1, amongst many others).

Bottom line: obtain data that reflects your own IT service management disciplines, and apply some thought to what it means for your IT service group.

Thursday, 06 Dec 2007

First of all, thanks to Kirstie for filling in on this blog whilst I was away on extended sick leave. Thank you for an informative set of articles on ITIL V3 Kirstie!

My own thoughts are mixed on V3 – some of which was covered in this post on ITIL Qualifications.

Whilst not welcoming everything that has happened with V3, one thing I do welcome is a clearer focus on end-user experience, especially for monitoring. One thing I do see those involved in monitoring focus on is component-based monitoring (such as routers, individual servers) rather than the actual services that they provide to customers.

Now, I'm not saying it's a bad thing to monitor individual devices, but the focus, as much as possible, should be on the end-user service and experience. If you are producing Availability or Downtime statistics (for examples of which, see our white paper on Availability) then the most important graphs should not be component-related, but end-user related. Specifically, I mean that systems that are important to your customers should be the focus – so rather than have Availability graphs that show

Router: Trillian 99%

Server: Zephod 98%

I prefer to see

Payroll System 95%

Sales Order Processing System 99%

(Note: I prefer the second example over the first because the first does not accurately tell me what the end user experience has been. For example, it's not clear if Trillian and Zephod were down at the same time, or different times – so how much downtime did our users experience?)

Which means, in many cases, your Availability statistics are going to be more accurately mined from your Incident data (because you log all your downtime, right?).
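The parenthetical point about Trillian and Zephod is the crux: to get a user's-eye Availability figure you must merge overlapping downtime windows before summing them, which is exactly what mining Incident records lets you do. An illustrative sketch (hour offsets within the reporting period are an assumed, simplified representation of your downtime Incidents):

```python
def user_facing_availability(downtime_windows, period_hours):
    """Merge overlapping downtime windows (Trillian and Zephod down
    at the same time count once from the user's point of view),
    then return uptime as a fraction of the reporting period.
    `downtime_windows` is a list of (start, end) hour offsets."""
    merged = []
    for start, end in sorted(downtime_windows):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)  # extend the overlap
        else:
            merged.append([start, end])
    downtime = sum(end - start for start, end in merged)
    return 1 - downtime / period_hours

# Router down hours 10-14, server down hours 12-16: users saw one
# 6-hour outage, not 8 hours of downtime, in a 720-hour (30-day) month
avail = user_facing_availability([(10, 14), (12, 16)], 720)
```

Summing per-component downtime without merging would understate availability here - the two windows overlap for two hours.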

Although Command Center users sometimes view that tool as a component-level monitoring application, they usually overlook one of its most powerful features – its ability to log on to websites, interact with them (examining the responses) and log off again. So, if you deliver web-based services to users, it can act as a thoroughly tireless user, logging on every few minutes, performing the tasks you set it, and then logging off again – recording the results as it does so.

It doesn't deal with AJAX (if you don't know what AJAX is or think it refers to the son of Telamon, a mythological Greek hero, google for 'AJAX Programming'). However, for most websites it works perfectly well.

Adopting this kind of approach to monitoring goes past components, and straight towards the end-user experience.

If you want to find out more, look-up the HTTP functions in the SerioScript reference, which you'll find on the Command Center help menu. The functions are targeted at those who have some familiarity with how web applications work. A worked example is also distributed with the tool.

Wednesday, 28 Nov 2007

This is a follow-up post continuing the introduction to ITIL V3 (George Ritchie is currently away). This week I'm going to talk about Service Operation in ITIL V3.

Quite simply, Service Operation is all about delivering the services to your customers and managing the infrastructure, applications and technologies that support these services.

This can be a real balancing act – and there are a number of conflicting goals that need to be considered. Getting a balance between these conflicting priorities is paramount to successful Service Operation. These conflicting goals are:

  • The internal IT view vs. the external business view
  • Stability vs. responsiveness
  • Service quality vs. Service costs
  • Reactive vs. Proactive Management

The key is to maintain an even balance in each of these conflicts; excessive focus on either side of the scale will result in degradation of service.

The key processes in the Service Operation Phase are:

  • Event Management
  • Incident Management
  • Problem Management
  • Request Fulfilment
  • Access Management

These should all be familiar to you; however, previously ITIL dealt with Request Fulfilment, Access Management and Event Management within the Incident Management process. V3 has cleared up this rather grey area and detailed these processes separately (something I think most people will welcome). I'll discuss each of these briefly below.

Event Management – “An event is a change of state that has significance for the management of a configuration item or IT service.”

An Event tells us that something is not functioning the way it should, and causes an Incident to be logged. Event Management relies on monitoring, but it is NOT the same thing as monitoring: Event Management lets us know when something has gone wrong, whereas monitoring records information even when there are no problems.

Incident Management – “An incident is an unplanned interruption to an IT service, or a reduction in the quality of an IT service. Failure of a configuration item that has not yet impacted service is also an incident.”

There is a lot of information already in the blog on Incident Management so I won't elaborate on this. Use the search function to find related articles.

Request Fulfilment – “A service request is a request from a user for information or advice, or for a standard change, or for access to an IT service.”

Service requests were a real grey area in ITIL® V2, and many organisations were unsure of whether many of their service requests should be handled as minor changes or as Incidents. ITIL® V3 has attempted to clarify this for us. The purpose of the Service Request process is to allow customers to request and receive standard services. All requests must be logged and there should be a mechanism for approval in the process.

(And as an aside, the next release of Serio is almost certainly going to reflect this change in ITIL with a direct and clear way of handling Service Requests).

Access Management – This is the process of allowing authorised customers to access services, while preventing access by unauthorised users.

Problem Management – “A problem is a cause of one or more incidents. The cause is not usually known at the time a problem record is created, and the problem management process is responsible for further investigation.”

Again, this is a well-established ITIL® process. For more information, search the blog or see our Problem Management White Paper.