
Handling Latency Outliers

Some endpoints within an API may be slower than others due to system complexity, data transformations, and the nature of the operation (e.g., writes, complex reads, form processing, and file uploads).

For example, the API might consist primarily of GET requests that retrieve resources quickly, along with one POST that performs complicated, expensive processing depending on the search criteria given. This one expensive search could cause the latency thresholds for the entire API to be exceeded. This section explains how to handle those situations.

What would a consumer of this API endpoint want to know?

In this situation, put yourself in the end user's position: how long would you honestly wait for a response?

Studies have shown that end users start to wonder what is happening after about two seconds and, around five seconds, begin to abandon the request unless given guidance on the expected wait time. Based on this, here are some options for handling the situation.

Guidance

  • Note latency behavior in the API's documentation.
  • Explore caching options.
  • Explore asynchronous processing options.

Provide documentation for the service level objectives (SLOs) this API attempts to achieve, but highlight the one exception so the consumer can set appropriate end-user expectations when developing their applications. However, there is a limit here. See requirements on performance.

Determine whether this endpoint is a candidate for a pre-loaded cache to avoid retrieving data more than once, or for saving the expensive retrieval results once completed so the next request can reuse them, provided the results do not go stale quickly.
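The caching idea above can be sketched as a time-to-live (TTL) cache around the expensive operation. This is a minimal in-process example, assuming a hypothetical `expensive_search` function and a 300-second freshness window; a production system would more likely use a shared cache such as Redis so results survive across instances.

```python
import time
from functools import wraps

def ttl_cache(ttl_seconds):
    """Cache a function's results, reusing them until they are ttl_seconds old."""
    def decorator(func):
        store = {}  # maps call arguments -> (timestamp, result)

        @wraps(func)
        def wrapper(*args):
            now = time.monotonic()
            hit = store.get(args)
            if hit is not None and now - hit[0] < ttl_seconds:
                return hit[1]  # still fresh: reuse the saved result
            result = func(*args)  # expensive path: compute and save
            store[args] = (now, result)
            return result
        return wrapper
    return decorator

# Hypothetical slow endpoint backend; the real search logic goes here.
@ttl_cache(ttl_seconds=300)
def expensive_search(criteria):
    return f"results for {criteria}"
```

The TTL should be chosen based on how quickly the underlying data goes stale; a short TTL still helps when many consumers issue the same search in a burst.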

Determine if this endpoint should be handled asynchronously to prevent consumers from abandoning long-running requests, similar to a “take a ticket, and we’ll call you when we're done” approach.