The biggest challenge in calculating availability is gathering all the necessary service time values. Stuart Rance is a consultant, trainer, and author with an international reputation as an expert in ITSM and information security. One of my customers has "How quickly you responded to my ad-hoc change requests" as a KPI, because they are in an ever-changing business and that is what matters to them. Processes of ITSM Availability … Use availability information to improve the process. A critical success factor (CSF) describes what has to be achieved for something to be considered successful, and a key performance indicator (KPI) measures it (i.e., says whether the CSF has been achieved). Stuart is an examiner for ITIL, chief examiner for RESILIA, and an instructor for ITIL, CISSP, and many other topics. Be precise. To do this: In this example, you would calculate the availability as: You can use this same technique to calculate the impact of lost availability of IP telephony in a call center in terms of PotentialAgentPhoneMinutes and LostAgentPhoneMinutes, and for applications that deal with transactions or manufacturing you can use a similar approach to quantify the business impact of an incident. Availability information should support the achievement of agreed availability goals. This approach involves instrumenting all the components required to deliver the service and calculating service availability based on an understanding of how each component contributes to the end-to-end service. Availability Management ensures the operative services meet all agreed availability goals. However, it can lead to disputes about the accuracy of the availability data.
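As a rough illustration of the user-minutes technique above (the original formula is not reproduced in this text), the sketch below assumes availability is calculated as (potential user-minutes minus lost user-minutes) divided by potential user-minutes, times 100, where lost user-minutes are the duration of each incident multiplied by the number of users it affected. The function names and call-center figures are hypothetical; for a call center, PotentialAgentPhoneMinutes and LostAgentPhoneMinutes would play the same roles.

```python
def lost_user_minutes(incidents):
    """Sum of duration x users impacted for each (duration_minutes, users_impacted) pair."""
    return sum(minutes * users for minutes, users in incidents)


def user_minutes_availability(potential_user_minutes, lost_minutes):
    """Percentage availability weighted by the number of users affected.

    Assumed formula: (potential - lost) / potential * 100.
    """
    if potential_user_minutes <= 0:
        raise ValueError("potential_user_minutes must be positive")
    if not 0 <= lost_minutes <= potential_user_minutes:
        raise ValueError("lost_minutes must lie between 0 and potential_user_minutes")
    # Multiply before dividing so whole-number inputs stay exact as long as possible.
    return (potential_user_minutes - lost_minutes) * 100 / potential_user_minutes


# Hypothetical call center: 50 agents, 480-minute shifts, 5 days.
potential = 50 * 480 * 5                        # 120,000 agent-minutes
lost = lost_user_minutes([(30, 50), (60, 10)])  # full 30-min outage + 60 min affecting 10 agents
print(lost, user_minutes_availability(potential, lost))  # 2100 98.25
```

Weighting by users is what lets a five-minute outage that affects two people be reported differently from one that affects everyone.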
Answers to questions like these can have a big effect on your perceived availability and help you to avoid the watermelon effect. Here’s an example of how you could measure and document availability to reflect the fact that the impact of downtime varies: If you use a table like this when you’re discussing the frequency and duration of downtime with your customers, the numbers are likely to be much more useful than percentage availability, and they’ll certainly be more meaningful to your customers. The process overview of ITIL Availability Management shows the key information flows (see figure). No percentages to play around with. Does anyone agree that measuring availability for a transaction-processing service on the basis of failed versus successful transactions is a valid approach? You compare the number of transactions that would have been expected without downtime to the number of actual transactions, or the amount of production that you would have predicted to the actual production. Firstly, you need to understand the context. Joe also provides consulting services for IBM i shops, data centers, and help desks. A compliance metric is often expressed as a Boolean pass/fail or yes/no result. Be aware: this assumption can lead to the “watermelon effect” (green on the outside, red on the inside), where a service provider meets the goal of the measurement while failing to support the customer’s preferred outcomes. This approach is generally fairly inexpensive. You can use this data to identify the duration of incidents and the number of users impacted. In this blog, I’ve offered a number of suggestions for how you can measure and report availability, but I haven’t discussed what you can do to help manage and improve availability. It’s essential to measure and report availability in terms that can be compared to targets that have been agreed with customers and that are based on a shared understanding of what the customer’s availability needs actually are.
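The transaction-based approach described above can be sketched as follows. This is a minimal illustration rather than a standard formula: it assumes you already have a forecast of expected transactions (for example, from the same period in a previous week), and the function names are hypothetical. The second function covers the failed-versus-successful variant raised in the question above.

```python
def transaction_availability(expected, actual):
    """Share of expected transactions that actually completed, as a percentage.

    'expected' is a forecast of what would have happened with no downtime.
    Results are capped at 100% so a busier-than-forecast period does not
    hide downtime (a design choice for this sketch, not a rule from ITIL).
    """
    if expected <= 0:
        raise ValueError("expected must be positive")
    return min(actual, expected) * 100 / expected


def transaction_success_rate(successful, failed):
    """Successful transactions as a percentage of all attempted transactions."""
    attempted = successful + failed
    if attempted == 0:
        raise ValueError("no transactions attempted")
    return successful * 100 / attempted


print(transaction_availability(expected=10_000, actual=9_500))  # 95.0
print(transaction_success_rate(successful=990, failed=10))      # 99.0
```

Note that the success rate only sees transactions that were attempted; if an outage stops users from reaching the service at all, only the expected-versus-actual comparison will reveal it.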
The problem is that, too often, we approach the measurement of services using availability as the primary indicator of performance. Service availability: the amount of time the service is available for use. It’s the same with IT services being consumed by business customers. Be sure you can break down and look at how long each individual outage lasted (duration) and how often outages occur (frequency). If you do not think availability tracking is important, ask your executives whether they would like to have their online store unavailable for 3.65 days each year, which is what 99% availability over a 365-day year allows. Here are five questions you should consider asking: Most IT services support several business processes; some of these are critical and others are less important. I’ve already mentioned that percentage availability may not tell you enough to be of value. Use tools and methods that get the information you need. Investigate why your outages happened. The ITIL Availability Management process works jointly with Capacity Management, Service Level Management, and IT … The two people who were impacted were the CEO and one of his reports. Compared to ITIL 4 availability management practice guidance, as documented in the ITIL 4 Foundation publication: Availability metrics are publicly stated in terms of uptime and durability and leading … On the other hand, a web-based shopping site may not be impacted by a one-minute outage, but after two hours the loss of customers could be significant. You are absolutely right, but it is not just availability and performance that matter. You have had 30 minutes of downtime this week. During non-peak hours, if any two of the cash registers are down, we consider the entire retail service to be down: an SLA breach is incurred and downtime is accumulated. Some of the more common ways that availability data can be collected include: Service availability is a simple idea, but the difficulty is in the details.
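The cash-register rule above can be expressed as a small function that decides, for each monitoring interval, whether the whole retail service counts as down. This sketch assumes one sample per minute recording how many registers were down. The non-peak threshold of two registers comes from the text; the peak-hour threshold (PEAK_THRESHOLD = 1) is a hypothetical placeholder, because the peak-hour rule is not stated. All names are illustrative.

```python
from dataclasses import dataclass

NON_PEAK_THRESHOLD = 2  # from the rule above: any two registers down
PEAK_THRESHOLD = 1      # hypothetical: the peak-hour rule is not given in the text


@dataclass
class Sample:
    """One monitoring interval (assumed to be one minute long)."""
    peak: bool
    registers_down: int


def is_service_down(sample):
    """True if the whole retail service counts as down for this interval."""
    threshold = PEAK_THRESHOLD if sample.peak else NON_PEAK_THRESHOLD
    return sample.registers_down >= threshold


def accumulated_downtime(samples, minutes_per_sample=1):
    """Minutes of SLA downtime accumulated over a sequence of samples."""
    return sum(minutes_per_sample for s in samples if is_service_down(s))


samples = [
    Sample(peak=False, registers_down=1),  # one register down off-peak: service still up
    Sample(peak=False, registers_down=2),  # two down off-peak: whole service down
    Sample(peak=False, registers_down=3),
    Sample(peak=True, registers_down=1),   # counts only because of the assumed peak rule
    Sample(peak=True, registers_down=0),
]
print(accumulated_downtime(samples))  # 3
```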
The point is that there are various cut-off times throughout the day, and if a cut-off is missed, the service isn’t being provided. But what exactly should you be talking to your customers about? Following the introduction of Design Coordination in ITIL 2011, the information flows have been adapted. When this is done well, it can not only collect end-to-end availability data but also identify exactly where a failure has occurred, helping to improve availability by reducing the time needed to resolve incidents. Availability Management should not purely react to service and component failures. Typically, this will include code in the client application as well as on the servers. If the business goal is to enter and process orders while the business is open, factoring in uptime during off-hours, weekends, and holidays will dilute your measurements. The purpose of Service Level Management (SLM) is to ensure that the service targets are created, negotiated, agreed, documented, monitored, reviewed and reported to the customer. SLM acts like a liaison between the customer and the service … Unfortunately, this sometimes means that IT service organizations focus on the percentage measure and lose sight of their true goal: providing value for customers. For example, the new service has undergone operational acceptance testing, or tasks have been measured against a burn-down chart. For example, if a CSF says that Service Desk efficiency has to be increased as part of a customer service improvement program, the KPI would be t… The consumers of services want things like transaction throughput and responsiveness (speed) at the times they need to use the service.
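To avoid diluting the measurement with off-hours uptime, availability can be calculated only over the agreed service hours. The sketch below assumes business hours of 09:00 to 17:00, Monday to Friday (hypothetical values, as are the function names), treats outages as (start, end) datetime pairs, and assumes outages do not overlap one another.

```python
from datetime import date, datetime, time, timedelta

OPEN, CLOSE = time(9, 0), time(17, 0)  # hypothetical agreed service hours
BUSINESS_DAYS = {0, 1, 2, 3, 4}        # Monday=0 ... Friday=4 (assumption)


def business_windows(first_day, last_day):
    """Yield (open, close) datetimes for each business day, inclusive of both ends."""
    day = first_day
    while day <= last_day:
        if day.weekday() in BUSINESS_DAYS:
            yield datetime.combine(day, OPEN), datetime.combine(day, CLOSE)
        day += timedelta(days=1)


def overlap_minutes(a_start, a_end, b_start, b_end):
    """Length of the intersection of two time intervals, in minutes (0 if disjoint)."""
    start, end = max(a_start, b_start), min(a_end, b_end)
    return max((end - start).total_seconds() / 60, 0.0)


def business_hours_availability(first_day, last_day, outages):
    """Percentage availability counted only inside agreed business hours."""
    windows = list(business_windows(first_day, last_day))
    agreed = sum((close - open_).total_seconds() / 60 for open_, close in windows)
    if agreed == 0:
        raise ValueError("no business hours in the reporting period")
    # Only the part of each outage that falls inside a business window counts.
    lost = sum(overlap_minutes(o, c, s, e) for o, c in windows for s, e in outages)
    return (agreed - lost) * 100 / agreed


outages = [
    (datetime(2024, 1, 6, 10, 0), datetime(2024, 1, 6, 14, 0)),  # Saturday: ignored
    (datetime(2024, 1, 2, 16, 0), datetime(2024, 1, 2, 18, 0)),  # Tuesday: 60 min counted
]
print(business_hours_availability(date(2024, 1, 1), date(2024, 1, 7), outages))  # 97.5
```

Counting the same week around the clock would report the Saturday outage as downtime and hide how much of the Tuesday trading day was lost, which is exactly the dilution described above.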
Through simple blind luck, our reported SLA figure was much better than what I would have reported, which is 5 minutes of outage, period. Sadly, many IT organizations focus on the numbers in an SLA and completely fail to meet their customers’ needs, even if they deliver the agreed numbers. © Copyright Quick Content Limited. What is the difference? Do not be content to just report on availability, duration, and frequency. Value is created when every customer transaction is processed quickly, avoiding lengthy queues at the cash registers and preventing money from walking out the door. Your metric should be clearly understood and related to the critical business processes being measured. In addition to his day job, he is also an Associate Consultant. Tools that support this data collection often report service performance as well as availability, and this can be a useful addition. The numbers in the SLA are simply agreed ways of measuring; the real goal is to deliver services that meet your customers’ needs. Which of your business functions are so critical that protecting them from downtime is a priority? Percentage of Incidents Resolved by First Level Support. One aspect of availability measurement and reporting that’s often overlooked is planned downtime. According to ITIL®, availability refers to the ability of a configuration item or IT service to perform its agreed function when required. Also, for the following examples… How will you document and report your findings? Go beyond simple availability to report on the frequency and duration of your downtime. When did it happen? This is why vendors sell products with five-nines availability, and customers want SLAs where their services are guaranteed 99.999% uptime.
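Percentage targets are easier to discuss with customers when translated into the downtime they permit. The sketch below (the function name is hypothetical) assumes a 365-day year and no exclusions for planned downtime. Under those assumptions, 99% availability allows 5,256 minutes (3.65 days) of downtime a year, while five nines allows only about 5.26 minutes.

```python
MINUTES_PER_DAY = 24 * 60
MINUTES_PER_YEAR = 365 * MINUTES_PER_DAY  # 525,600; leap years ignored (assumption)


def allowed_downtime_minutes(availability_percent, period_minutes=MINUTES_PER_YEAR):
    """Downtime permitted by an availability target over a period, in minutes."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability_percent must be between 0 and 100")
    return period_minutes * (100 - availability_percent) / 100


for target in (99, 99.9, 99.99, 99.999):
    minutes = allowed_downtime_minutes(target)
    print(f"{target}%: {minutes:.2f} minutes ({minutes / MINUTES_PER_DAY:.3f} days) per year")
```

Passing a different period_minutes (for example, one week of agreed service hours) gives the budget for shorter reporting periods.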
ITIL key performance indicators (KPIs) are measures of performance that enable organizations to obtain information about many relevant factors such as the effectiveness and efficiency of their … In general, a metric is a scale of measurement defined in terms of a standard, i.e., a well-defined unit. For example, let’s consider an IT organization that has agreed a 24×7 service and an availability of 99%. ©Copyright 2005-2020 BMC Software, Inc. Benjamin, it certainly is a challenge. This method can also miss the impact of shared components; for example, one of my customers had regular downtime for their email service due to unreliable DHCP servers in their HQ, but the IT organization did not register this as email downtime. Secondly, you need to think very carefully about a range of practical issues: what you will measure, how you will collect your data, and how you will document and report your findings. This does actually measure end-to-end availability. It could be said that the construct of SLAs is, fundamentally, the reason IT departments are not perceived as innovative and strategic. Was it a single 30-minute outage when a technician accidentally took down a router, or ten outages of three minutes each where no one knows what happened?
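The router question above is why duration and frequency belong alongside the percentage. The sketch below (all names hypothetical) summarizes two outage histories for the 24×7 service with a 99% target: one 30-minute outage versus ten 3-minute outages. Both give the same percentage, and both sit well inside the 100.8 minutes a week that 99% of a 10,080-minute week allows, yet they describe very different problems.

```python
from dataclasses import dataclass
from statistics import mean

WEEK_MINUTES = 7 * 24 * 60  # a 24x7 service week: 10,080 minutes


@dataclass
class OutageSummary:
    frequency: int              # number of outages in the period
    total_minutes: float        # sum of all outage durations
    longest_minutes: float      # duration of the single longest outage
    mean_minutes: float         # average outage duration
    availability_percent: float


def summarize(durations, period_minutes=WEEK_MINUTES):
    """Summarize a list of outage durations (minutes) for one reporting period."""
    total = sum(durations)
    return OutageSummary(
        frequency=len(durations),
        total_minutes=total,
        longest_minutes=max(durations, default=0),
        mean_minutes=mean(durations) if durations else 0,
        availability_percent=(period_minutes - total) * 100 / period_minutes,
    )


router_slip = summarize([30])        # one 30-minute outage
mystery_blips = summarize([3] * 10)  # ten unexplained 3-minute outages
allowed = WEEK_MINUTES * (100 - 99) / 100  # weekly downtime budget for a 99% target

for name, s in (("router", router_slip), ("blips", mystery_blips)):
    within = s.total_minutes <= allowed
    print(name, s.frequency, s.longest_minutes, round(s.availability_percent, 3), within)
```

Reporting frequency and the longest outage next to the percentage is what exposes the second pattern as a recurring, undiagnosed problem.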

