Measurement Windows
Service Level Indicators (SLIs) are less useful as individual readings than as trends and patterns over time. Tracking them “over time” means defining a measurement window. Using availability as the example SLI, availability is measured within a specific time window. For example, the baseline SLO is:
Within a rolling 30-day period, APIs must maintain 99.9% availability in production and 99.0% in test environments, with no exclusions for planned downtime or holidays.
To achieve this SLO, it is recommended to set up two additional measurement windows for short and long-term tracking. Below are the three recommended windows.
- 7-day window: Set a shorter window, such as a 7-day window, for proactive monitoring with a stricter alert threshold. For example, if the SLO targets 99.9% availability, set the alert threshold for this window at 99.95% or greater.
- 30-day window: This window tracks compliance with the SLO and is the period used within the Service Level Objective.
- 90-day window: Set a longer window to track trends. A wider view of the service's performance and reliability over time lets teams assess the effectiveness of past improvements and decide whether to lean into or ease off reliability engineering.
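The three windows can be expressed as a small configuration that also shows how much downtime each target tolerates. The following is a minimal sketch, not a prescribed implementation. It assumes availability is measured by time (minutes up versus minutes in the window), and the names `MeasurementWindow` and `PRODUCTION_WINDOWS` are hypothetical. The 90-day target simply reuses the 99.9% SLO figure, which is an assumption because the guidance above sets no separate target for the trend window.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MeasurementWindow:
    """A rolling measurement window with an availability target (hypothetical helper)."""

    name: str
    days: int
    target: float  # availability target as a fraction, e.g. 0.999 for 99.9%

    def allowed_downtime_minutes(self) -> float:
        # Total minutes in the window multiplied by the tolerated failure fraction.
        return self.days * 24 * 60 * (1.0 - self.target)

    def is_met(self, downtime_minutes: float) -> bool:
        # The target is met while observed downtime stays within the allowance.
        return downtime_minutes <= self.allowed_downtime_minutes()


PRODUCTION_WINDOWS = [
    MeasurementWindow("7-day alert", days=7, target=0.9995),
    MeasurementWindow("30-day SLO", days=30, target=0.999),
    # Assumption: the trend window reuses the SLO target for comparison only.
    MeasurementWindow("90-day trend", days=90, target=0.999),
]

for window in PRODUCTION_WINDOWS:
    minutes = window.allowed_downtime_minutes()
    print(f"{window.name}: up to {minutes:.2f} minutes of downtime")
```

Under these assumptions the 30-day SLO tolerates roughly 43 minutes of downtime, while the stricter 7-day alert threshold tolerates roughly 5 minutes, which is why the shorter window surfaces problems before the SLO itself is at risk.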
Guidance
- Rolling time windows align more closely with the consumer's experience than calendar windows and allow API teams to identify and focus on problems sooner.
VA recommends the use of rolling time periods for the measurement window, such as 7-day, 30-day, and 90-day rolling windows. A rolling window is more closely aligned with the consumer’s experience and reports recent activity as it happens. Compare this to a calendar window, in which the results being measured are not available until the end of the calendar period. Rolling windows allow teams to see trends early and to pivot to the problems sooner.
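To illustrate how a rolling window reports recent activity as it happens, the sketch below recomputes availability every day over the trailing N days. It assumes availability is derived from daily request counts as pairs of (successful requests, total requests). That input shape and the function name `rolling_availability` are illustrative choices, not part of the VA guidance.

```python
from collections import deque


def rolling_availability(daily_counts, window_days):
    """Return the trailing-window availability for each day.

    daily_counts: iterable of (successful_requests, total_requests), one per day.
    The first window_days - 1 results cover a partial window, because fewer
    days of history exist at that point.
    """
    window = deque()
    good_sum = 0
    total_sum = 0
    results = []
    for good, total in daily_counts:
        window.append((good, total))
        good_sum += good
        total_sum += total
        if len(window) > window_days:
            # Drop the day that just fell out of the rolling window.
            old_good, old_total = window.popleft()
            good_sum -= old_good
            total_sum -= old_total
        # None marks a window with no traffic, where availability is undefined.
        results.append(good_sum / total_sum if total_sum else None)
    return results


# Three days of traffic with a dip on day two, measured over a 2-day window.
print(rolling_availability([(100, 100), (90, 100), (100, 100)], window_days=2))
```

The dip on day two shows up in the result for day two itself and remains visible until that day rolls out of the window, whereas a calendar-based report would surface it only when the period closes.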
Weekend Variability
Using common 7-day, 30-day, and 90-day rolling windows can simplify reporting requirements within VA. However, it introduces weekend variability, which could skew results for some APIs: because 30 and 90 are not multiples of 7, the number of weekend days inside the window changes as the window rolls. If your API is subject to weekend variability, change the rolling window alignment to avoid the possibility of extra weekends by using 7-day (1-week), 28-day (4-week), and 84-day (12-week) rolling windows. Using 7-day increments when creating the measurement window keeps extra weekends out of the calculation, thus eliminating that source of variability.
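The effect of window length on weekend counts can be checked with a short calculation. This sketch counts the Saturdays and Sundays inside a rolling window as its end date moves through one full week. The function name `weekend_day_counts` and the default start date are arbitrary illustrative choices.

```python
from datetime import date, timedelta


def weekend_day_counts(window_days, start=date(2024, 1, 1)):
    """Return the distinct weekend-day counts seen as the window end date
    moves across seven consecutive days beginning at start."""
    counts = set()
    for offset in range(7):
        end = start + timedelta(days=offset)
        # The window covers end and the window_days - 1 days before it.
        days = (end - timedelta(days=i) for i in range(window_days))
        # date.weekday() returns 5 for Saturday and 6 for Sunday.
        counts.add(sum(1 for day in days if day.weekday() >= 5))
    return counts


for length in (7, 28, 30, 84, 90):
    print(length, sorted(weekend_day_counts(length)))
```

Windows that are whole multiples of 7 days always contain the same number of weekend days (2, 8, and 24 for the 7-day, 28-day, and 84-day windows), while a 30-day window contains between 8 and 10 and a 90-day window contains 25 or 26, depending on where the window ends.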