Performance & Availability

Government digital services have historically struggled with performance and reliability. These problems often stem from integrating multiple services, including legacy systems, and allowing their limitations to leak into the end-user experience. Your API's primary goal should be to hide this complexity: handle errors from upstream services and improve performance for consumers.

With the right strategies, an API that depends on legacy systems can still match the performance and reliability of a newly built project. For APIs built from scratch, prioritize performance and reliability from the outset, because other applications will depend on them.

Performance

Requirement

APIs must respond to requests within 10 seconds, including the time spent on any upstream calls. If they cannot, they must return a 504 error to signal a server-side timeout.
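One way to honor the 10-second ceiling is to put a single deadline around all of a request's work, so that slow upstream calls count against the same budget. The sketch below assumes an `asyncio`-based Python service; `handle_with_deadline`, `Response`, and the error body shape are hypothetical names for illustration, not part of any framework.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable

# Overall response deadline taken from the requirement above.
RESPONSE_DEADLINE_SECONDS = 10.0


@dataclass
class Response:
    status: int
    body: dict


async def handle_with_deadline(
    work: Callable[[], Awaitable[dict]],
    deadline: float = RESPONSE_DEADLINE_SECONDS,
) -> Response:
    """Run a request's work, upstream calls included, under one overall deadline."""
    try:
        body = await asyncio.wait_for(work(), timeout=deadline)
    except asyncio.TimeoutError:
        # wait_for has already cancelled the unfinished work at this point.
        # The error body shape is hypothetical.
        return Response(504, {"error": "Gateway Timeout",
                              "detail": "The request could not be completed in time."})
    return Response(200, body)
```

Because the deadline wraps the whole handler rather than each upstream call individually, several moderately slow dependencies cannot add up to more than 10 seconds.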

Guidance

APIs should aim for response times under 1 second.
If an API cannot respond within 10 seconds, it should use an asynchronous pattern that provides an immediate response and processes the request in the background.

Although the maximum allowable response time is 10 seconds, APIs should strive to respond in under 1 second and, ideally, within a few milliseconds.
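The asynchronous pattern described in the guidance above can be sketched as follows. It assumes the common convention of answering immediately with `202 Accepted` and a URL the consumer can poll for the result; the `/jobs/{id}` path, the in-memory `JOBS` store, and the `submit`/`poll` helpers are hypothetical. A production service would keep job state in durable storage so that it survives restarts.

```python
import asyncio
import uuid
from typing import Awaitable, Callable, Dict

# Hypothetical in-memory job store; a real service would use durable storage.
JOBS: Dict[str, dict] = {}
# Keep references to background tasks so they are not garbage-collected early.
_TASKS: set = set()


async def _run_job(job_id: str, work: Callable[[], Awaitable[dict]]) -> None:
    """Do the slow work in the background and record its outcome."""
    try:
        JOBS[job_id] = {"state": "complete", "result": await work()}
    except Exception:
        JOBS[job_id] = {"state": "failed"}


def submit(work: Callable[[], Awaitable[dict]]) -> tuple:
    """Start work in the background and return an immediate 202 response.

    Must be called from code already running inside an event loop.
    """
    job_id = str(uuid.uuid4())
    JOBS[job_id] = {"state": "pending"}
    task = asyncio.create_task(_run_job(job_id, work))
    _TASKS.add(task)
    task.add_done_callback(_TASKS.discard)
    # 202 Accepted plus a hypothetical status URL the consumer can poll.
    return 202, {"Location": f"/jobs/{job_id}"}, {"id": job_id, "state": "pending"}


def poll(job_id: str) -> tuple:
    """Return the current state of a job, or 404 if it is unknown."""
    job = JOBS.get(job_id)
    if job is None:
        return 404, {"error": "unknown job"}
    return 200, job
```

The consumer gets an answer in milliseconds regardless of how long the work takes, and the slow processing no longer counts against the 10-second response limit.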

Availability

Guidance

APIs should aim for 99.9% availability in production environments.
APIs should aim for 99.0% availability in testing environments.

APIs are expected to meet these targets. If technical or policy challenges stand in the way, assistance is available to help you overcome them.
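It can help to translate an availability target into a downtime budget. The sketch below assumes a 30-day measurement window, which is an illustrative choice rather than part of the standard; `downtime_budget` is a hypothetical helper.

```python
from datetime import timedelta


def downtime_budget(availability_pct: float, period: timedelta) -> timedelta:
    """Return the most downtime allowed in `period` while still meeting the target."""
    if not 0 <= availability_pct <= 100:
        raise ValueError("availability_pct must be between 0 and 100")
    unavailable_fraction = (100 - availability_pct) / 100
    # timedelta * float is rounded to the nearest microsecond.
    return period * unavailable_fraction


THIRTY_DAYS = timedelta(days=30)
for env, target in (("production", 99.9), ("testing", 99.0)):
    print(env, downtime_budget(target, THIRTY_DAYS))
```

Over a 30-day window, 99.9% leaves about 43 minutes of downtime in production and 99.0% leaves 7.2 hours in testing. Over a full year, 99.9% works out to roughly 8.8 hours.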

For more on achieving high availability, see service level objectives under monitoring and blue-green deployments under production management.

Managing downtime in dependencies

Requirement

APIs must not pass upstream errors to consumers.

VA APIs often rely on external services with varying levels of reliability. API teams may have little influence over these services, but APIs must still shield consumers from unexpected errors (500) and service-unavailability errors (503, 504) that originate upstream.

APIs should respond to unexpected errors, downtime, and timeouts in upstream services with clear, consistent error messages. This lets consumers understand where a problem originated and inform their own users appropriately.
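A minimal sketch of such a translation layer: each upstream call is wrapped so that timeouts, connection failures, and upstream 5xx responses all become the same consumer-facing error, which names the failing dependency without forwarding its raw status or body. `UpstreamError`, `call_upstream`, the error body shape, and the choice of 502 are all assumptions for illustration; the actual status code should follow your team's error-code mapping.

```python
from dataclasses import dataclass
from typing import Callable

# Assumption: the status code used for upstream failures. Use the mapping
# your team's error-code guidance prescribes.
UPSTREAM_FAILURE_STATUS = 502


class UpstreamError(Exception):
    """Hypothetical error a client raises when an upstream service returns a 5xx."""

    def __init__(self, upstream_status: int):
        super().__init__(f"upstream returned {upstream_status}")
        self.upstream_status = upstream_status


@dataclass
class ApiResponse:
    status: int
    body: dict


def call_upstream(service: str, fn: Callable[[], dict]) -> ApiResponse:
    """Call an upstream dependency and translate any failure into one consistent error."""
    try:
        return ApiResponse(200, fn())
    except (UpstreamError, TimeoutError, ConnectionError) as exc:
        # Never forward the upstream's raw status or body; name the dependency instead.
        detail = "timed out" if isinstance(exc, TimeoutError) else "is unavailable"
        return ApiResponse(
            UPSTREAM_FAILURE_STATUS,
            {
                "errors": [
                    {
                        "title": "Upstream service error",
                        "detail": f"The {service} service {detail}. Please try again later.",
                        "source": service,
                    }
                ]
            },
        )
```

Centralizing the translation in one wrapper keeps the messaging consistent across every endpoint that touches the same dependency.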

For advice on how to map upstream errors, refer to Choosing an error code.

Maintenance

APIs are expected to meet these service level objectives (SLOs) even during scheduled maintenance. Keeping the service available through updates and other maintenance work is a hallmark of a robust architecture and deployment strategy.