Monitoring SLOs

Most Application Performance Monitoring (APM) tools offer SLO dashboards out of the box; if not, API teams can create custom dashboards to visualize SLO compliance.

Datadog, shown below, has a built-in SLO feature that lets teams set 7-, 30-, and 90-day windows and track one or more SLIs within each. It also calculates the error budget for each window.

Image: Datadog Error Budget Left report for 7-day, 30-day, and 90-day windows, with availability targets of 99.95%, 99.99%, and 99.9% respectively.

Error Budgets

An error budget can help a team balance feature development and service quality. Quality should always be a fundamental requirement for new features, and feature development should not come at the expense of quality.

The error budget shows how much quality has been sacrificed in a given time window and how much budget is left. This data helps the team align priorities. If the API is close to exhausting its error budget, engineering effort should shift toward improving reliability and performance, with less emphasis on feature development.

Conversely, having plenty of headroom in an error budget does not mean the team should de-prioritize quality and rush to implement new features. Instead, it indicates that the team has successfully managed quality alongside feature development.

Calculating Error Budgets

If tooling such as Datadog is unavailable, teams can manually calculate their error budgets.

If the availability target is 99.9% over a 30-day window, the error budget (the 0.1% of the window in which the service may be unavailable) can be calculated as:

        Total time (minutes) = 30 days × 24 hours/day × 60 minutes/hour
                             = 43200 minutes

        Error Budget (minutes) = 43200 minutes × 0.001
                               = 43.2 minutes ≈ 43 minutes

In this case, the rounded error budget would be 43 minutes. Teams can then track the downtime that has already occurred in the measurement window to determine how much of the budget has been spent and how quickly it is being consumed (the burn rate).
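The arithmetic above can be sketched in a few lines of Python. This is a minimal illustration, not part of any monitoring tool's API: the function name `error_budget_minutes` and its parameters are assumptions made for this example.

```python
def error_budget_minutes(slo_target_percent: float, window_days: int) -> float:
    """Return the allowed downtime, in minutes, for an availability target.

    slo_target_percent: availability target, e.g. 99.9 (hypothetical parameter name)
    window_days: length of the measurement window in days
    """
    total_minutes = window_days * 24 * 60
    # The share of the window in which the service is allowed to be unavailable.
    allowed_failure_fraction = (100 - slo_target_percent) / 100
    return total_minutes * allowed_failure_fraction


# The 99.9% / 30-day example from the text.
budget = error_budget_minutes(99.9, 30)
print(round(budget))  # 43
```

Keeping the unrounded value (about 43.2 minutes) and rounding only for display avoids compounding rounding errors when the budget is later compared against tracked downtime.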

Error Budget Left, the figure reported in the Datadog example above, is calculated as:

    Error_Budget_Left = Error_Budget - Spent_Budget
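As a rough sketch, the formula above and a burn-rate check might look like the following in Python. The function names and the sample downtime figures are hypothetical. The burn-rate definition used here (share of budget spent divided by share of the window elapsed) is one common convention, assumed for this example rather than taken from a specific tool.

```python
def error_budget_left(error_budget: float, spent_budget: float) -> float:
    """Error_Budget_Left = Error_Budget - Spent_Budget (all values in minutes)."""
    return error_budget - spent_budget


def burn_rate(error_budget: float, spent_budget: float,
              elapsed_days: float, window_days: float) -> float:
    """Share of budget spent divided by share of the window elapsed.

    A value of 1.0 means the budget would be used up exactly at the end of
    the window; above 1.0 means it would run out early.
    """
    budget_fraction_spent = spent_budget / error_budget
    window_fraction_elapsed = elapsed_days / window_days
    return budget_fraction_spent / window_fraction_elapsed


# Hypothetical figures: a 43.2-minute budget (99.9% over 30 days),
# with 10.8 minutes of downtime recorded after 15 days.
budget = 43.2
spent = 10.8

print(round(error_budget_left(budget, spent), 1))  # 32.4
print(round(burn_rate(budget, spent, 15, 30), 2))  # 0.5
```

A negative result from `error_budget_left` would indicate that the budget for the window has already been exceeded.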