In this article, I will explain what an SLA is, when it is used and how it works. SLA stands for Service Level Agreement: a guarantee that a service will perform at an agreed level. An SLA is a standard part of contracts for IT services. In it, you as the customer require the service provider to guarantee certain parameters, such as response times. The service provider undertakes to pay a penalty if it fails to meet any of the SLA parameters.
In other words, an SLA guarantees the quality and availability of a service. The provider commits to the agreed parameters and to a specified penalty if it fails to meet them. The primary aim of an SLA is not the penalty (usually a percentage discount from the monthly price of the service) but ensuring that the service is of high quality.
When is an SLA worth it?
You will often come across SLAs in telecommunications, for example for internet or VPN services. If you buy an internet connection for your household, it comes without an SLA, and you usually do not need one. However, if you buy internet service for your company and your business depends on it as its main work tool, you should consider obtaining an SLA that guarantees the quality and availability of the service. Keep in mind that a business connection with an SLA costs more than a household connection.
SLA parameters can be divided into technical and non-technical parameters. Let’s look at the individual parameters in more detail:
Availability
Availability is the most frequently used SLA parameter. It is expressed as a percentage, for example 99.5%. This means the service provider guarantees that the service will be available 99.5% of the time each month, so the total outage may not exceed 3.6 hours in a 30-day month (0.5% of 720 hours). The table below lists the most frequently used availability levels and the maximum outage duration each one allows.
| Availability | Maximum outage per 30-day month |
|---|---|
| 90% (“one nine”) | 72 hours |
| 99% (“two nines”) | 7.2 hours |
| 99.5% | 3.6 hours |
| 99.9% (“three nines”) | 43.2 minutes |
| 99.99% (“four nines”) | 4.3 minutes |
| 99.999% (“five nines”) | 25.9 seconds |
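The values in the table follow from simple arithmetic. You multiply the unavailable share of the month by the length of the month. The short Python sketch below reproduces them, assuming a 30-day (720-hour) month. A contract may define the measurement period differently, for example as the actual calendar month, which shifts the numbers slightly. The function name `max_outage_minutes` is only illustrative.

```python
# Maximum monthly outage allowed by an availability guarantee.
# Assumption: a 30-day month (720 hours); the contract may define the
# measurement period differently, e.g. as the actual calendar month.

HOURS_PER_MONTH = 30 * 24  # 720 hours


def max_outage_minutes(availability_percent: float,
                       hours_per_month: float = HOURS_PER_MONTH) -> float:
    """Return the maximum allowed outage per month, in minutes."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be between 0 and 100 percent")
    unavailable_share = (100 - availability_percent) / 100
    return unavailable_share * hours_per_month * 60


if __name__ == "__main__":
    for level in (90, 99, 99.5, 99.9, 99.99, 99.999):
        minutes = max_outage_minutes(level)
        print(f"{level}%: {minutes:.1f} minutes ({minutes / 60:.2f} hours)")
```

For 99.5%, this gives 216 minutes, which is the 3.6 hours shown in the table.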
Latency
Latency is the delay with which data crosses the network. In an SLA, it is stated as the maximum response time, in milliseconds (ms) or microseconds (μs). The guaranteed value usually depends on how far the data travels through the network, measured in kilometres. It may refer either to latency within the provider's “core network” or to latency “from point A to point B”. Latency is an important factor for multimedia applications such as internet television and internet telephony (IPTV and VoIP), and for online games.
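Latency grows with the distance the data travels, so the propagation delay alone makes a useful sanity check. The sketch below assumes that light in optical fibre covers roughly 200,000 km per second, which is about 5 μs per kilometre one way. Real latency is always higher because of routing detours, switching and queuing. The helper `meets_latency_sla` is hypothetical. It treats the SLA value as a hard maximum for every sample, as described above, although some contracts use an average instead.

```python
# Lower bound on round-trip latency caused by distance alone.
# Assumption: signal speed in optical fibre of about 200,000 km/s.

FIBRE_SPEED_KM_PER_S = 200_000


def min_round_trip_ms(distance_km: float) -> float:
    """Propagation-only round-trip time over `distance_km`, in ms."""
    one_way_seconds = distance_km / FIBRE_SPEED_KM_PER_S
    return 2 * one_way_seconds * 1000


def meets_latency_sla(samples_ms: list[float], limit_ms: float) -> bool:
    """True if no measured round-trip time exceeds the SLA maximum."""
    return all(sample <= limit_ms for sample in samples_ms)


if __name__ == "__main__":
    # A 500 km path cannot have a round trip shorter than about 5 ms.
    print(f"{min_round_trip_ms(500):.1f} ms")
    print(meets_latency_sla([8.2, 9.1, 7.9], limit_ms=10))
```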
Packet loss
Data sent over a network is broken down into packets. The packets are transmitted through the network and reassembled at their destination. Packet loss is the share of packets that never arrive. It is usually specified as a maximum percentage, in steps of 0.5% or finer. Some applications are very sensitive to packet loss. Again, this mainly concerns multimedia applications such as television or voice transmission, but it also affects various corporate systems.
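Packet loss is simply the share of sent packets that did not arrive. Here is a minimal sketch, assuming you have counts of sent and received packets for the measured period (for example, from the provider's monitoring). The function names are illustrative.

```python
def packet_loss_percent(sent: int, received: int) -> float:
    """Percentage of sent packets that never reached their destination."""
    if sent <= 0:
        raise ValueError("at least one packet must have been sent")
    if not 0 <= received <= sent:
        raise ValueError("received must be between 0 and sent")
    return (sent - received) / sent * 100


def within_loss_sla(sent: int, received: int, max_loss_percent: float) -> bool:
    """True if the measured loss does not exceed the SLA limit."""
    return packet_loss_percent(sent, received) <= max_loss_percent


if __name__ == "__main__":
    # 4 lost packets out of 1,000 is 0.4%, inside a 0.5% limit.
    print(packet_loss_percent(1000, 996))
    print(within_loss_sla(1000, 996, max_loss_percent=0.5))
```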
Jitter
Jitter is used less often as an SLA parameter. It is the variation in the delay of data delivery across the network. VoIP, for example, is sensitive to jitter.
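One simple way to quantify jitter is to average the absolute differences between the delays of consecutive packets. This definition is an assumption chosen for illustration. Contracts define jitter in different ways, and RTP (RFC 3550), for example, uses a smoothed running estimate instead.

```python
def mean_jitter_ms(delays_ms: list[float]) -> float:
    """Average absolute change in delay between consecutive packets."""
    if len(delays_ms) < 2:
        return 0.0  # jitter needs at least two delay samples
    changes = [abs(curr - prev) for prev, curr in zip(delays_ms, delays_ms[1:])]
    return sum(changes) / len(changes)


if __name__ == "__main__":
    # Delays swinging between 19 and 22 ms give a jitter of about 2.3 ms.
    print(round(mean_jitter_ms([20.0, 22.0, 19.0, 21.0]), 2))
```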
Mean time to recovery (MTTR)
Mean time to recovery (MTTR) is the second most frequently used parameter, after availability. Here, the service provider specifies the maximum time it will take to repair a failure. Providers mostly specify between 4 and 48 hours. Suppose your data line or internet connection goes down. The clock starts when you report the outage to the provider, and its helpdesk tracks the time within which the provider is obliged to repair it. The actual outage time also counts separately towards availability. The provider either fixes smaller failures remotely or sends a technician to the site.
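To check the repair-time guarantee, you compare the time between your report and the restoration of service with the contractual limit. Here is a minimal sketch, assuming both timestamps are taken from the helpdesk ticket. The function names and example dates are hypothetical. The sketch also includes a mean over several incidents, because that is what the acronym MTTR literally describes, even though contracts usually set a limit per failure.

```python
from datetime import datetime, timedelta


def repair_time(reported_at: datetime, restored_at: datetime) -> timedelta:
    """Time from reporting the failure to restoring the service."""
    if restored_at < reported_at:
        raise ValueError("service cannot be restored before it is reported")
    return restored_at - reported_at


def repair_limit_exceeded(reported_at: datetime, restored_at: datetime,
                          limit_hours: float) -> bool:
    """True if the repair took longer than the guaranteed limit."""
    return repair_time(reported_at, restored_at) > timedelta(hours=limit_hours)


def mean_time_to_recovery(incidents: list[tuple[datetime, datetime]]) -> timedelta:
    """Average repair time over (reported_at, restored_at) pairs."""
    if not incidents:
        raise ValueError("no incidents to average")
    total = sum((repair_time(start, end) for start, end in incidents), timedelta())
    return total / len(incidents)


if __name__ == "__main__":
    reported = datetime(2024, 3, 4, 9, 0)
    restored = datetime(2024, 3, 4, 13, 30)
    # 4.5 hours breaks a 4-hour limit but not an 8-hour one.
    print(repair_limit_exceeded(reported, restored, limit_hours=4))
    print(repair_limit_exceeded(reported, restored, limit_hours=8))
```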
Realisation or delivery time
You will come across this requirement in larger projects, where the customer needs a guarantee that the service will be delivered within a certain time after it is ordered, e.g. within 30 working days. Besides guaranteeing the delivery time, the provider must also be able to deliver a larger number of services within a single order. For this reason, customers often also ask about the maximum number of services that can be ordered at once.
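A delivery deadline such as “within 30 working days” can be turned into a concrete date. The sketch below assumes that only Saturdays and Sundays are non-working days. It ignores public holidays, which a real contract would normally exclude as well. The function names are illustrative.

```python
from datetime import date, timedelta


def add_working_days(start: date, working_days: int) -> date:
    """Date that falls `working_days` working days after `start`.

    Only weekends are skipped; public holidays are not handled.
    """
    current = start
    remaining = working_days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # Monday is 0, Friday is 4
            remaining -= 1
    return current


def delivered_on_time(ordered: date, delivered: date, working_days: int) -> bool:
    """True if delivery happened no later than the guaranteed deadline."""
    return delivered <= add_working_days(ordered, working_days)


if __name__ == "__main__":
    ordered = date(2024, 3, 1)  # a Friday
    print(add_working_days(ordered, 30))  # 2024-04-12
```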
Response times for various requests
Here, the provider specifies response times. These may concern one-time tasks, such as confirming an order or notifying the customer of the service start date. They can also cover the response time of the provider's helpdesk or network operations centre (NOC). For example, the SLA may state how quickly the NOC must confirm that it has received your outage report, how often it must send you status reports, and so on.
A penalty is an integral part of an SLA. If the SLA is not met, you can claim a percentage discount from the monthly price of the service.
Suppose we have an SLA stipulating 99.5% availability. The contract must include a table of penalty discounts based on the availability actually provided. The table below is just an example. The penalty is not granted automatically; you must claim it.
- 99.49% – 99%: discount 10%
- 99% – 95%: discount 15%
- 95% – 90%: discount 20%
- 90% – 80%: discount 25%
- Lower than 80%: discount 50%
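Applying such a table means looking up the discount that matches the measured availability. The example bands share their end points (99% appears in two bands) and leave a small gap between 99.49% and 99.5%. The sketch below therefore assumes that each band includes its lower bound and extends up to the next band. A real contract should settle this explicitly. The figures are the example values above, not standard ones.

```python
# Example penalty bands for an SLA guaranteeing 99.5% availability.
# Each entry is (lowest availability in the band, discount in percent).
# Assumption: a band includes its lower bound, so exactly 99% earns 10%.
PENALTY_BANDS = [
    (99.5, 0),   # SLA met, no discount
    (99.0, 10),
    (95.0, 15),
    (90.0, 20),
    (80.0, 25),
]
DISCOUNT_BELOW_80 = 50


def discount_percent(measured_availability: float) -> int:
    """Discount on the monthly price for the measured availability."""
    for lower_bound, discount in PENALTY_BANDS:
        if measured_availability >= lower_bound:
            return discount
    return DISCOUNT_BELOW_80


def penalty_amount(monthly_price: float, measured_availability: float) -> float:
    """Amount to claim back for the month."""
    return monthly_price * discount_percent(measured_availability) / 100


if __name__ == "__main__":
    # 99.2% availability on a service costing 1,000 per month.
    print(discount_percent(99.2), penalty_amount(1000, 99.2))
```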
To be able to claim a penalty, the penalty must be stated in your contract, either directly or through a reference to the document that sets out the penalty conditions. The contract for the service includes the service parameters, which are often covered by an SLA. Other matters are usually set out in separate documents to which the contract merely refers. These include failure reporting, the penalty amounts and the possibility of terminating the contract with notice. This is especially true in telecommunications, where contracts refer to general terms and conditions, complaint rules, operating rules, service descriptions and so on. For a larger project, you can negotiate individual SLA conditions, including specific points to be written directly into the contract.
To claim a penalty, you must be able to measure the shortfall. Here you will often have to rely on the service provider, which typically has an application that collects statistics on the operation of the service. These statistics mostly cover technical parameters and can be filtered by time interval.
The duration of an outage is counted from the moment you report it to the helpdesk, and this duration feeds into the availability parameter. If the total outage time exceeds the maximum allowed by the availability level in the table, you can claim a penalty under the SLA. However, you must always report the incident to the helpdesk, or in whatever other way has been agreed. If you claim that the service was down for an entire day but never reported the outage, you cannot claim the penalty.
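The rule that only reported outages count can be expressed as a small calculation. You add up the time from each report to the restoration of service and compare the resulting availability with the guaranteed level. Here is a minimal sketch, assuming a 30-day month and a list of reported incidents with hypothetical example timestamps. You can then compare the result with the penalty table in your contract.

```python
from datetime import datetime, timedelta

MONTH = timedelta(days=30)  # assumption: availability is measured per 30-day month


def measured_availability(reported_outages: list[tuple[datetime, datetime]]) -> float:
    """Availability in percent, counting only outages reported to the helpdesk.

    Each outage is a (reported_at, restored_at) pair; unreported downtime
    simply does not appear in the list and therefore does not count.
    """
    downtime = sum((end - start for start, end in reported_outages), timedelta())
    return max(0.0, (1 - downtime / MONTH) * 100)


def sla_met(reported_outages: list[tuple[datetime, datetime]],
            guaranteed_percent: float) -> bool:
    """True if the measured availability reaches the guaranteed level."""
    return measured_availability(reported_outages) >= guaranteed_percent


if __name__ == "__main__":
    outages = [
        (datetime(2024, 3, 5, 8, 0), datetime(2024, 3, 5, 10, 0)),    # 2 hours
        (datetime(2024, 3, 19, 14, 0), datetime(2024, 3, 19, 17, 0)),  # 3 hours
    ]
    # 5 hours of downtime exceeds the 3.6 hours allowed at 99.5%.
    print(round(measured_availability(outages), 3), sla_met(outages, 99.5))
```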
Frequent SLA combinations
You will often come across SLAs based only on service availability. In such a case, the availability percentage is specified and nothing else is addressed. This applies to a wide range of IT services, including web services. In telecommunications, you will most often come across SLAs that address availability and MTTR (mean time to recovery). In major cases, such as a large corporate data network on which the company's business depends, you will come across complex SLAs that cover several parameters at once.
Ambiguities and nonsense
- An SLA without a penalty – If someone offers you an SLA, they must provide a document that clearly defines both the parameters and the penalties. Ask for the SLA specification!
- Mistaking availability for an SLA – Some people confuse an SLA with a promise of availability. If someone promises you a certain availability but offers no information about penalties, it is not an SLA.
- A stringent SLA without the technical means to back it – If someone promises you a high level of availability, I recommend verifying that the provider can actually deliver it. For example, if you require 99.9% availability of an internet service, the provider should be able to show you that it has main and backup internet connectivity and two routers. Only then is the SLA backed by sufficient resources. Such a demanding SLA also costs more than a basic one.
- An SLA promising 100% availability – There is no such thing as 100% availability. Even 99.9% availability (“three nines”) is a very high level of service that requires a sophisticated backup solution, so treat such promises with caution.
- A 4-hour MTTR across the whole country, but engineers only at headquarters – If a provider promises a maximum repair time of 4 hours and you operate throughout the country, the provider should have technicians in more places, typically in the regional capitals.
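The point about backup connectivity can be quantified. Suppose two paths fail independently of each other. The service is then down only when both are down at once, so their unavailabilities multiply. Components that must all work in a chain, such as a router and its line, multiply their availabilities instead. The sketch below rests on that independence assumption, which real networks only approximate. Two links in the same cable duct or on the same power supply tend to fail together, so treat the redundant-path figure as optimistic. The 99.5% component figures in the usage example are hypothetical.

```python
def parallel_availability(*availabilities_percent: float) -> float:
    """Availability of redundant paths that fail independently.

    The service is down only if every path is down at the same time.
    """
    all_down = 1.0
    for availability in availabilities_percent:
        all_down *= (100 - availability) / 100
    return (1 - all_down) * 100


def serial_availability(*availabilities_percent: float) -> float:
    """Availability of components that must all work, e.g. router plus line."""
    all_up = 1.0
    for availability in availabilities_percent:
        all_up *= availability / 100
    return all_up * 100


if __name__ == "__main__":
    # Each path is a router in series with a line, both at 99.5%.
    path = serial_availability(99.5, 99.5)
    # Main and backup path in parallel.
    print(round(path, 4), round(parallel_availability(path, path), 4))
```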
I hope my description of SLAs helps you understand this topic and makes you more careful when someone promises you a high level of service or penalties. If you have any questions about it, I will be glad to answer them in the comments section.
Author: Radek Kucera