SLA Monitoring
Although interrelated, these measures each provide different insights into what is going wrong, and they must be assessed accurately. Variations in these performance variables can be caused by malfunctions on your end, on your cloud service provider’s end, or somewhere in between. The sooner you are able to diagnose where the problem is, the sooner it can be fixed.

At the same time, not all problems are created equal. Many SLAs specify the amount of time that can elapse before a provider is obligated to respond. The time between when the provider answers the call and when the problem is fixed is known in the industry as the resolution time. Resolution times are typically based on the severity of the problem, which is ranked on a 3- or 4-point scale.
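A severity-based resolution target can be checked mechanically once the answer and fix times are recorded. The following is a minimal sketch, assuming a 4-point scale; the target durations in `RESOLUTION_TARGETS` and the function names are hypothetical placeholders, not values from any particular SLA.

```python
from datetime import datetime, timedelta

# Hypothetical targets for a 4-point severity scale (1 = most severe).
# Real values must come from your own SLA.
RESOLUTION_TARGETS = {
    1: timedelta(hours=4),
    2: timedelta(hours=8),
    3: timedelta(hours=24),
    4: timedelta(hours=72),
}


def resolution_time(answered_at: datetime, fixed_at: datetime) -> timedelta:
    """Time between the provider answering the call and the problem being fixed."""
    return fixed_at - answered_at


def target_missed(severity: int, answered_at: datetime, fixed_at: datetime) -> bool:
    """Return True if the resolution time exceeded the target for this severity."""
    return resolution_time(answered_at, fixed_at) > RESOLUTION_TARGETS[severity]


# Example incident: answered at 09:00, fixed at 14:30 the same day (5.5 hours).
answered = datetime(2010, 6, 1, 9, 0)
fixed = datetime(2010, 6, 1, 14, 30)
print(target_missed(1, answered, fixed))
print(target_missed(2, answered, fixed))
```

With these assumed targets, the same 5.5-hour incident misses the severity 1 target but meets the severity 2 target, which is why the severity assigned to a problem matters as much as the clock.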


To maintain a productive working relationship during times of stress and crisis, it is important not to be the boy who cried wolf, inflating response times to get results sooner. Accurate measurement has a direct bearing on the level of escalation.

Once the crisis has passed and the problem has been fixed, it is important to build Root Cause Analysis (RCA) reporting into the process. This encourages customers and providers to examine the deeper underlying causes of outages with the hope of avoiding them in the future.

Time factors


Adding to the challenge of monitoring performance is the factor of time, which is not always as straightforward to calculate as it might seem. Some SLAs specify a 30-day month, some a 28-day month, others a 31-day month, and a few even a calendar month (which makes February a good month for providers). As the table above indicates, there are two more problems with determining times.
First, like rust, cloud services never sleep, so expecting a single person or even a small team to manually monitor uptime is ineffective and unrealistic. Second, the amount of downtime allowed, especially in the case of 99.999% uptime, is so small that you could turn your head and miss a violation of the service level agreement.
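To make the arithmetic concrete, the sketch below computes how much downtime an uptime percentage permits for each of the month lengths mentioned above. The function name `allowed_downtime` is illustrative, and the calculation assumes the SLA counts every second of the period, with no exclusions for scheduled maintenance.

```python
from datetime import timedelta


def allowed_downtime(uptime_percent: float, days_in_period: int) -> timedelta:
    """Maximum downtime permitted in a period at the given uptime percentage."""
    total_seconds = days_in_period * 24 * 60 * 60
    return timedelta(seconds=total_seconds * (1 - uptime_percent / 100))


# Compare the month definitions an SLA might use.
for days in (28, 30, 31):
    for pct in (99.9, 99.99, 99.999):
        budget = allowed_downtime(pct, days)
        print(f"{days}-day month at {pct}% uptime: {budget.total_seconds():.1f} s")
```

Under these assumptions, 99.999% uptime over a 30-day month leaves roughly 26 seconds of permitted downtime, and about 24 seconds over a 28-day month.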

The solution



Hopefully by this point you have reached the same conclusion as the authors: manual monitoring of uptime and downtime is not a very good practice. Fortunately, there are some very good vendors of tools for measuring performance, and we have included a list of some of them below.

In our experience, taking the time to research the vendors and make a good purchase decision at the beginning will pay many dividends later, especially in terms of a better working relationship with the provider and ultimately more uptime. Some of the features to look for include:


Consider choosing a measurement vendor during the SLA negotiation process and using the metrics they provide to define the technical terms of the SLA.

Cloud vendors are providing customers with near-real-time reporting, but it may not be enough. Measurement of the event on your end and on your cloud service provider’s end can be very different. You may also end up working with more than one cloud provider. We suggest using a neutral third party to measure performance and report the facts to both provider and customer.
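The gap between two vantage points can be illustrated with a small sketch. It assumes each side records one pass/fail probe per minute over the same day; the sample data and the function name `uptime_percent` are made up for illustration and do not reflect any real provider's figures.

```python
def uptime_percent(samples: list[bool]) -> float:
    """Percentage of equally spaced probes that succeeded (True = service responded)."""
    if not samples:
        raise ValueError("no samples to measure")
    return 100.0 * sum(samples) / len(samples)


# Hypothetical data: 1,440 one-minute probes for the same day from each side.
customer_view = [True] * 1436 + [False] * 4   # customer saw 4 failed probes
provider_view = [True] * 1439 + [False] * 1   # provider saw 1 failed probe

gap = uptime_percent(provider_view) - uptime_percent(customer_view)
print(f"customer: {uptime_percent(customer_view):.3f}%")
print(f"provider: {uptime_percent(provider_view):.3f}%")
print(f"difference: {gap:.3f} percentage points")
```

A difference of three failed probes looks small, yet in this made-up example it amounts to about 0.2 percentage points for the day, which is large next to a 99.99% target. A measurement both parties accept removes that argument.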
List of cloud monitoring vendors

Conclusion


Having a common understanding of what the problem is, when it needs to be resolved, and the steps necessary to fix it will also avoid some of the major causes of provider and customer friction.

“Transparency builds trust,” said Mark Rivington, VP of Technology at Nimsoft, which was recently acquired by CA for $350 million.

The goal of accurate measurement, as with the SLA itself, is to build the capability of the customer and the provider to respond to crises as a team. In the next issue of Cloudbook Magazine, we will explore what to do when things go badly.

About J Bruce Daley

Founder & CTO at Test Common, Inc

A recognized expert in software, Bruce Daley has founded or co-founded six enterprises with very different business models: a publication (The Siebel Observer), a radio business (eCommerce Update), an event (The Enterprise Software Summit), a consulting business (Great Divide Research), an investment advisory firm (Rabbit Ears Capital Advisors), and a social network to test software (Test Common). His publications have been read in over 34 countries, and he has a patent (pending) for software testing.


About Alan Rudolph

Senior Vice President at Polycom

Alan is an expert on the economics of cloud computing and in the acquisition and integration of consulting companies. Alan Rudolph has been actively involved in the successful implementation of applications and the building of consulting practices for over 25 years. He was a Managing Director at ACS responsible for the company’s Applications Solutions Group. Prior to coming to ACS, he was director of product delivery at Corio before and after its acquisition by IBM. Prior to that, Mr. Rudolph served as COO of Planalytics, a business intelligence company, where he was recruited to reorganize the company’s sales and marketing, product development, and financial operations.


Cloudbook Journal
Vol 1 Issue 4, 2010
