An introduction to Service Level Agreements

March 23, 2022



Building a good customer experience is essential to your product’s retention rate. But you can build as many satisfying, pixel-perfect UI snacks as you want: if your product is unreliable, it will scare your customers off. Even more so when you’re selling a product that provides value by being always “on”, like a server.

Monitoring uptime is crucial to your product strategy and your retention rate. Customer satisfaction can be measured by several metrics. Some are specific to your product or industry, while others are widely adopted across SaaS. Service availability is one of the latter.

This article is addressed to PMs and CTOs who are looking to add a Service Level Agreement to their product. We’ll look into the definition of levels of service, what an SLA should include, and what processes should be covered in case of downtime.

What are the levels of service?

Levels of service are commitments made by a service provider to deliver one or more services within a stated tolerance.

Levels of service set the expectations of quality and service that customers can count on when signing up for an offering. They ensure the continuity of service.

A Service Level Agreement (SLA) is a binding document between the service provider and the customer which states the levels of service and what happens when they are not met. Essentially, levels of service are the answer to “How frequently will this product be down?” and the SLA is the answer to “What happens when the product is down?”

Typically, levels of service are expressed using the service’s average monthly uptime. Uptime is the percentage of total possible minutes a service was available to its users during a specific period.
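
As a rough sketch of that definition, uptime can be computed from the minutes in the period and the minutes of recorded downtime. The function name and the 30-day month used in the example are assumptions made for illustration.

```python
def uptime_percentage(total_minutes: float, downtime_minutes: float) -> float:
    """Share of the period's minutes during which the service was available."""
    if total_minutes <= 0:
        raise ValueError("total_minutes must be positive")
    if not 0 <= downtime_minutes <= total_minutes:
        raise ValueError("downtime_minutes must be between 0 and total_minutes")
    return (total_minutes - downtime_minutes) / total_minutes * 100


# Assumed example: a 30-day month (43,200 minutes) with 43.2 minutes of downtime.
MINUTES_IN_30_DAYS = 30 * 24 * 60
print(round(uptime_percentage(MINUTES_IN_30_DAYS, 43.2), 3))  # 99.9
```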

Slack has a 99.93% uptime for this quarter (Q1 2022). Their uptime commitment is 99.99%.

Their uptime monitoring covers the ability to log in, post a message, share a file, or preview a link, but not the ability to react to a message with an emoji, as that isn’t one of the product’s core features. They also differentiate their SLAs based on the user’s subscription plan.
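
To make those percentages more concrete, the sketch below converts them into minutes of downtime. It assumes a full 90-day quarter, which is a simplification. The helper name is hypothetical.

```python
def downtime_minutes(uptime_percent: float, period_days: int) -> float:
    """Minutes of downtime implied by an uptime percentage over a period."""
    total_minutes = period_days * 24 * 60
    return total_minutes * (100 - uptime_percent) / 100


QUARTER_DAYS = 90  # assumption: Q1 2022 counted as a full 90-day quarter

# Downtime allowed by a 99.99% commitment over the quarter.
print(round(downtime_minutes(99.99, QUARTER_DAYS), 2))
# Downtime implied by a 99.93% measured uptime over the same quarter.
print(round(downtime_minutes(99.93, QUARTER_DAYS), 2))
```

Under these assumptions, the commitment leaves room for about 13 minutes of downtime per quarter, while the reported uptime corresponds to roughly an hour and a half.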

Create your SLA

Decide your Service Level Indicators

To decide which services should be included in your SLA, ask yourself:

What are the core features of your product? Which features do your users use constantly and rely on heavily?
Should you have an availability commitment for beta features?
Can you monitor the uptime or performance of those services?
Are they dependent on some 3rd party service provider, package, API, or server?
What are their core components?

When you have a list of features without which your product wouldn’t work, describe which services make them work. That may be an API or a server. Those are the services you should set an uptime target for, and their measured availability gives you your “Service Level Indicators”.
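
When a core feature needs several services to all be up at once, its availability is roughly the product of theirs, assuming failures are independent. The sketch below illustrates this. The service names and figures are hypothetical.

```python
from math import prod


def composite_availability(availabilities_percent: list[float]) -> float:
    """Availability of a feature that needs every listed service to be up.

    Assumes the services fail independently of one another.
    """
    return prod(a / 100 for a in availabilities_percent) * 100


# Hypothetical dependencies of one core feature.
dependencies = {"api": 99.95, "database": 99.99, "third_party_auth": 99.9}
print(round(composite_availability(list(dependencies.values())), 2))  # 99.84
```

The result is lower than the weakest dependency on its own, which is worth keeping in mind before committing to a figure for the feature as a whole.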

Decide on the availability threshold for each service

Then, define what “being down” means:

What amount of time would be considered as “not working”? Is it 5 minutes, 1 hour? How much time would be considered “too much” in the eyes of your users?
Is it just part of the service or the entirety of it?
Is slowness considered downtime? How much slowness can your users tolerate and how often per period?
Is maintenance considered downtime?
Does an outage of a feature identified as beta count as downtime?
Is it not working because of the service itself or a 3rd party?
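
One way to pin down the answers to these questions is to write them as an explicit policy and check each incident against it. This is a minimal sketch: the field names, the 5-minute minimum, and the default choices are all assumptions, not recommendations.

```python
from dataclasses import dataclass


@dataclass
class Incident:
    duration_minutes: float
    scheduled_maintenance: bool = False
    beta_feature: bool = False
    third_party_cause: bool = False


@dataclass
class DowntimePolicy:
    # Hypothetical defaults; each one answers a question from the list above.
    min_duration_minutes: float = 5
    count_maintenance: bool = False
    count_beta: bool = False
    count_third_party: bool = True


def counts_as_downtime(incident: Incident, policy: DowntimePolicy) -> bool:
    """Return True if the incident counts against the uptime commitment."""
    if incident.duration_minutes < policy.min_duration_minutes:
        return False
    if incident.scheduled_maintenance and not policy.count_maintenance:
        return False
    if incident.beta_feature and not policy.count_beta:
        return False
    if incident.third_party_cause and not policy.count_third_party:
        return False
    return True


policy = DowntimePolicy()
print(counts_as_downtime(Incident(duration_minutes=3), policy))   # False
print(counts_as_downtime(Incident(duration_minutes=30), policy))  # True
```

Slowness could be handled the same way, for example by treating a period as an incident when response times exceed a threshold you choose.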

You now have the services that should be up at all times and know when they’ll be considered down.

Decide on the Service Level Objectives

Ultimately, you’ll have to commit to an availability rate. To make an informed decision, factor in the maturity and robustness of your product’s infrastructure. This commitment is central to your SLA: your users will monitor it closely, and it will determine any credits you have to give back if you fall short.
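
One simple way to ground that decision is to compare candidate objectives against the uptime you have actually measured. The sketch below picks the highest candidate that every observed month would have met. The history and the candidate list are hypothetical, and past performance is only one input among others.

```python
from typing import Optional


def highest_sustainable_objective(
    monthly_uptimes: list[float], candidates: list[float]
) -> Optional[float]:
    """Highest candidate objective met in every observed month, or None."""
    if not monthly_uptimes:
        raise ValueError("at least one month of history is required")
    worst_month = min(monthly_uptimes)
    met = [c for c in candidates if c <= worst_month]
    return max(met) if met else None


history = [99.97, 99.92, 99.99, 99.95]         # hypothetical measured uptimes
candidates = [99.0, 99.5, 99.9, 99.95, 99.99]  # hypothetical objectives
print(highest_sustainable_objective(history, candidates))  # 99.9
```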

Monitor your services’ availability

Enter your SRE or DevOps team, which should be able to set up one alert for when a service is down and another for when uptime falls below the threshold for the current period. When that happens, you may want to share the alert with the concerned parties and, if the downtime is confirmed, update your status page accordingly.
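
The two alerts described above can be sketched as a single check. This is an illustration only: the function, its parameters, and the alert names are hypothetical, and a real setup would rely on your monitoring tool’s own alerting rules.

```python
def evaluate_alerts(
    is_up: bool,
    downtime_minutes_so_far: float,
    elapsed_minutes: float,
    objective_percent: float,
) -> list[str]:
    """Return the names of the alerts that should fire right now."""
    alerts = []
    # Alert 1: the service is currently down.
    if not is_up:
        alerts.append("service_down")
    # Alert 2: uptime for the current period has dropped below the objective.
    if elapsed_minutes > 0:
        uptime = (elapsed_minutes - downtime_minutes_so_far) / elapsed_minutes * 100
        if uptime < objective_percent:
            alerts.append("uptime_below_objective")
    return alerts


# Assumed example: 10 days into the month, 20 minutes of downtime, 99.9% objective.
print(evaluate_alerts(True, 20, 10 * 24 * 60, 99.9))  # ['uptime_below_objective']
```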

Manage service credits in case of downtime

In the SLA, you’ll have to document the reimbursement policy you want to apply when your uptime commitment isn’t met. You can either proactively apply a service credit to all affected users when that happens or issue a service credit manually when an affected user requests it.

Additionally, you’ll have to decide how much service credit you want to issue per downtime. The tech industry consensus seems to be to issue a percentage of the monthly charges, depending on the user’s plan and the monthly uptime percentage.

Slack automatically adds service credit to affected accounts if they fall short of their uptime commitment. That service credit is equal to 10 times the amount that the customer paid during the period Slack was down.
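
A credit of this kind could be computed as follows. The sketch assumes that “the amount paid during the period” means the monthly fee prorated by minutes of downtime. That interpretation is an assumption for illustration, not Slack’s documented method, and the function name is hypothetical.

```python
def multiplier_credit(
    monthly_fee: float,
    downtime_minutes: float,
    minutes_in_month: int = 30 * 24 * 60,  # assumption: 30-day month
    multiplier: float = 10,
) -> float:
    """Credit equal to a multiple of the fee prorated over the downtime."""
    paid_during_downtime = monthly_fee * downtime_minutes / minutes_in_month
    return multiplier * paid_during_downtime


# Assumed example: a 1,000 monthly fee and 43.2 minutes of downtime.
print(round(multiplier_credit(1000, 43.2), 2))  # 10.0
```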

Google, on the other hand, asks users to contact their Google technical support representative within 30 days to receive Service Credits. Atlassian users have to do it within 15 days.

Examples of SLAs

Service Provider	Service Commitment	Service credit process	Service credit (X = monthly uptime)
Amazon API Gateway	99.95%	Automatically applied by service provider	99.0% < X < 99.95%: 10%; 95.0% < X < 99.0%: 25%; X < 95.0%: 100%
Slack	99.99%	Automatically applied by service provider	Credit = 10 x amount paid during downtime
Google Workspace	99.9%	Customer must request Service Credit	Days of service added to the end of the service term: 99.0% < X < 99.9%: 3; 95.0% < X < 99.0%: 7; X < 95.0%: 15
Atlassian Premium	99.90%	Customer must request Service Credit	99.0% < X < 99.90%: 10%; 95.0% < X < 99.0%: 25%; X < 95.0%: 50%
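
Tiered credits like these are straightforward to encode as a lookup. The sketch below uses the Atlassian Premium figures from the table. The table does not say which tier the boundary values (exactly 99.0% or 95.0%) fall into, so treating each lower bound as inclusive is an assumption here.

```python
# (lower bound inclusive, upper bound exclusive, credit as % of monthly charges)
# Figures from the Atlassian Premium row; boundary handling is an assumption.
ATLASSIAN_PREMIUM_TIERS = [
    (99.0, 99.9, 10),
    (95.0, 99.0, 25),
    (0.0, 95.0, 50),
]


def credit_percent(monthly_uptime: float, tiers: list[tuple[float, float, int]]) -> int:
    """Return the credit percentage owed for a given monthly uptime."""
    for lower, upper, credit in tiers:
        if lower <= monthly_uptime < upper:
            return credit
    return 0  # commitment met: no credit owed


for uptime in (99.95, 99.5, 97.0, 90.0):
    print(uptime, credit_percent(uptime, ATLASSIAN_PREMIUM_TIERS))
```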

Write your Service Level Agreement and inform your users

Now that you have the services you want to attach availability commitments to, their levels of service, and the process your users should follow when a service has been down, you’re ready to publish your SLA on your website or app legal pages and notify your users.

Next, we’ll review how to communicate to your users in case of force majeure and scheduled maintenance.

To recap, see this illustration of what Service Level Indicators, Service Level Objectives, and Service Level Agreements are: