Deploy
Code goes through several steps to reach production. This document describes that process, which is the same for both content and code changes.
Table of Contents
- Automated deployment schedule
- Overview
- Automated process details
- Rollbacks
- Creating a vets-website Release
- Hotfixes
- vets-website Manual deployment
- content-build Manual deployment
- Dealing with Flaky Unit Tests
Automated deployment schedule
All listed times are Eastern Time (ET), and deployments are scheduled Monday–Friday (excluding holidays). All deployments are executed from the main/master branch in each repository.
Application | Changes merged by | Deployment Start | Release Information | Jenkins Job |
---|---|---|---|---|
vets-website | 2:00pm ET | 3:00pm ET | vets-website release history | vets-gov-autodeploy-vets-website |
content-build | 2:00pm ET | 3:00pm ET | content-build release history | vets-gov-autodeploy-content-build |
vets-api | 2:00pm ET | 3:00pm ET | vets-api release history | vets-gov-autodeploy-vets-api |
content-build (content only) | n/a | 9am–12pm Hourly, 1:45pm, 4pm, 5pm ET | content-build latest release | vets-gov-autodeploy-content-build |
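The 2:00pm ET cutoff in the table above can be sketched as a small check. This is only an illustration under stated assumptions: the function name and the holidays argument are hypothetical, the holiday calendar is not defined in this document, and treating a merge at exactly 2:00pm as on time is a guess.

```python
from datetime import date, datetime, time
from zoneinfo import ZoneInfo

EASTERN = ZoneInfo("America/New_York")
CUTOFF = time(14, 0)  # 2:00pm ET cutoff for the 3:00pm ET deploy, from the table


def makes_todays_deploy(merged_at: datetime,
                        holidays: frozenset[date] = frozenset()) -> bool:
    """Return True if a merge at `merged_at` is in before the same-day cutoff.

    `holidays` is a hypothetical caller-supplied set of dates; the real
    holiday calendar is not specified in this document.
    """
    if merged_at.tzinfo is None:
        raise ValueError("merged_at must be timezone-aware")
    local = merged_at.astimezone(EASTERN)
    # Deployments only run Monday-Friday, excluding holidays.
    if local.weekday() >= 5 or local.date() in holidays:
        return False
    # Assumption: a merge at exactly 2:00pm still counts as on time.
    return local.time() <= CUTOFF
```

A merge that misses the cutoff would presumably go out with the next scheduled deployment.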
Overview
Jenkins performs the following tasks after a pull request is merged into main/master:
- Build the main/master branch to create a deployment artifact (.tar file)
- Deploy to development and staging using the deployment artifact
- Create a release in GitHub from main/master and tag the artifacts for that commit SHA with the release name
- Deploy to production using the deployment artifact, according to the automated deployment schedule
A key assumption in this process is that main/master is always deployable. As such, the deployment to the staging environment is configured to happen automatically, and staging can be used to see what a change would look like in a production-like environment for manual testing and verification.
Automated process details
- Every work day at the configured time the vets-gov-autodeploy-vets-website and vets-gov-autodeploy-content-build jobs will run.
- The autodeploy jobs will call the vets-website release and content-build release jobs which create a vets-website release and content-build release.
- Release artifacts are deployed to production by the vets-website-vagovprod and content-build-vagovprod jobs. These jobs should not be triggered manually.
Rollbacks
If a production deployment introduces issues that affect the Service Level Objectives (SLOs) established for the project, you may restore service to users by rolling back to a previous deployment. This is accomplished by triggering a new deploy job in Jenkins using a previous release tag. Typical deployment times are under 5 minutes.
- Identify the release you want to roll back to by visiting the vets-website or content-build release log(s)
- Click on the commit ID in the left column of the release you want to reference
- Copy the commit ref (it will be a long string like: 7c74702605561a33a5a6edbe46a95ac43dddb1df)
- Visit the vets-website or content-build prod deploy job(s)
- Enter the ref value into the ref field
- Click "Build"
If SLOs are not affected and a fix is not critical, no rollback will be issued. Instead the fix should be applied through the standard development workflow.
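Because the ref field expects the full commit ref rather than a shortened one, a quick sanity check before pasting it can help. The sketch below is illustrative only: the helper names, the example host, and the job path are hypothetical. It only builds a URL in the shape of Jenkins' buildWithParameters remote endpoint and does not send any request. The UI steps above remain the documented path.

```python
import re
from urllib.parse import urlencode

# A full Git commit SHA-1 is 40 hexadecimal characters.
FULL_SHA = re.compile(r"[0-9a-fA-F]{40}")


def is_full_commit_ref(ref: str) -> bool:
    """Return True if `ref` looks like a full 40-character commit SHA."""
    return FULL_SHA.fullmatch(ref.strip()) is not None


def build_deploy_url(jenkins_base: str, job_path: str, ref: str) -> str:
    """Build (but do not call) a parameterized-build URL for a deploy job.

    `jenkins_base` and `job_path` are placeholders supplied by the caller;
    the real host and job names are not given in this document.
    """
    if not is_full_commit_ref(ref):
        raise ValueError(f"not a full commit ref: {ref!r}")
    query = urlencode({"ref": ref.strip()})
    return f"{jenkins_base.rstrip('/')}/{job_path.strip('/')}/buildWithParameters?{query}"
```

For example, is_full_commit_ref("7c74702") is false, so a shortened ref would be rejected before it ever reaches the ref field.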
Creating a vets-website Release
If the commit you are trying to release does not have an official release tag, you have to create one:
- Update your local main branch
- Check out the commit you want
- Note the latest release from the vets-website release log
- Visit the release job in Jenkins
- Make sure the commit you want to use has passed through the build pipeline in main
- Replace the "ref" value with the commit you want to use to create the release (it will be a long string like: 7c74702605561a33a5a6edbe46a95ac43dddb1df)
- Click "Build"
- You should now see it in the vets-website release log and can follow the normal rollback steps.
This should create a new release, and deploy it to va.gov.
Creating a content-build Release
If the commit you are trying to release does not have an official release tag, you have to create one:
- Update your local main branch
- Check out the commit you want
- Note the latest release from the content-build release log
- Visit the release job in Jenkins
- Make sure the commit you want to use has passed through the build pipeline in main
- Replace the "ref" value with the commit you want to use to create the release (it will be a long string like: 7c74702605561a33a5a6edbe46a95ac43dddb1df)
- Click "Build"
- You should now see it in the content-build release log and can follow the normal rollback steps.
This should create a new release, and deploy it to va.gov.
Note: Verify that no content releases are scheduled around the time you create a release. A subsequent release can override your manual release if it starts before yours has finished.
Hotfixes
The use of hotfixes is discouraged, but they may be useful in an emergency when master has significantly deviated from the release and a fix to the failed production release is critical.
To create a hotfix, create a branch from the last stable release tag, make the necessary changes (with review), create a new release tag following the correct naming scheme, and trigger a deploy in Jenkins with the release name as a parameter. The release steps are documented above, in the "Creating a vets-website Release" and "Creating a content-build Release" sections.
vets-website Manual deployment
Out-of-band deploys may be performed in accordance with Platform deployment policy.
Before deploying
- Wait for Jenkins to build the change in vets-website
- Build status can be viewed here. Requires SOCKS proxy. See Accessing internal tools
- If this build fails, you may need to log into Jenkins and restart it
Full production deploy of vets-website
- Verify that your changes are committed and that the changes since the last deploy are safe to deploy:
You may need to contact the developers of those commits to verify.
- Start a deploy job
- Log into Jenkins here
- Click Build with Parameters (contact #vsp-operations if you don't see this option and think you should)
- Set the release_wait option to 5 minutes
- Uncheck use_latest_release <-- important
- Click Build
- Verify commits in deployment notification
In the #vfs-platform-builds Slack channel, Jenkins will include a link that shows the exact commits being released in the deploy notification.
- Verify changes in production once the build finishes
Manual deployment of vets-website to staging or dev
When staging deployments back up, or staging as a whole falls behind production, you may need to run a manual deployment to staging (or dev). To do this, use the following steps:
- Visit the vets-website vagovstaging job in Jenkins (or vets-website vagovdev for dev)
- Click Build with Parameters
- Make sure the commit you want to use has passed through the build pipeline in main
- Replace the "ref" value with the commit you want to deploy (it will be a long string like: 7c74702605561a33a5a6edbe46a95ac43dddb1df)
- Click Build
- You can watch the deployment process from the vets-website vagovstaging (or vets-website vagovdev for dev) status page in Jenkins
- Confirm that your deployed commit is on staging
content-build Manual deployment
Out-of-band deploys may be performed in accordance with Platform deployment policy.
Multiple manual deploys are supported in Jenkins:
- Partial deploy including only static page changes (vagov-content and Drupal)
- Full deploy of VA.gov static pages
Content-only production deploy
- Start a deploy job
- Log in to Jenkins
- Click Build with Parameters
- Set the release_wait option to 5 minutes
- Check use_latest_release <-- important
- Click Build
- Verify commits in deployment notification
In the #vfs-platform-builds Slack channel, Jenkins will include a link that shows the exact commits being released in the deploy notification.
Full production deploy of content-build
- Verify that your changes are committed and that the changes since the last deploy are safe to deploy:
You may need to contact the developers of those commits to verify.
- Start a deploy job
- Log in to Jenkins
- Click Build with Parameters (contact #vsp-operations if you don't see this option and think you should)
- Set the release_wait option to 5 minutes
- Uncheck use_latest_release <-- important
- Click Build
- Verify commits in deployment notification
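The only difference between the two content-build production deploys described above is the use_latest_release checkbox. A minimal sketch of the two parameter sets, using a hypothetical DeployKind enum and helper name (only release_wait and use_latest_release come from this document):

```python
from enum import Enum


class DeployKind(Enum):
    CONTENT_ONLY = "content-only"
    FULL = "full"


def build_parameters(kind: DeployKind) -> dict[str, object]:
    """Return the Build with Parameters values described in this section."""
    return {
        # Both kinds of deploy set release_wait to 5 minutes.
        "release_wait": 5,
        # Checked for a content-only deploy, unchecked for a full deploy.
        "use_latest_release": kind is DeployKind.CONTENT_ONLY,
    }
```

Writing the two cases side by side makes it harder to leave the checkbox in the wrong state, which is the step marked important in both lists.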
Manual deployment of content-build to staging or dev
When staging deployments back up, or staging as a whole falls behind production, you may need to run a manual deployment to staging (or dev). To do this, use the following steps:
- Visit the content-build vagovstaging job in Jenkins (or content-build vagovdev)
- Click Build with Parameters
- Make sure the commit you want to use has passed through the build pipeline in main
- Replace the "ref" value with the commit you want to deploy (it will be a long string like: 7c74702605561a33a5a6edbe46a95ac43dddb1df)
- Click Build
- You can watch the deployment process from the content-build vagovstaging (or content-build vagovdev) status page in Jenkins
Dealing with Flaky Unit Tests
A test fixture is a fixed state, so test results should be repeatable. A flaky test is one that can pass or fail under the same configuration. When monitoring the vets-website deploy, we often have to deal with flaky tests in a few specific situations:
- A flaky test inside of a pull request
- A flaky test in main when an auto-deploy is not nearing
- A flaky test in main when an auto-deploy is nearing
To tell if an auto-deploy is nearing you can refer to the table at the top of this document.
A flaky test inside of a pull request
If a unit test fails in a pull request, no one is alerted, so the usual outcome is that the pipeline is refreshed to unblock the work, or the test is skipped in the PR and the skip is then reviewed by the code owner. This action is the responsibility of the pull request owner and has no effect on the daily deploy.
A flaky test in main when an autodeploy is not nearing
If a unit test fails in main and a deploy is not nearing (or has already happened for the day), the failure can be treated as inconsequential for that day's deploy. However, the pipeline should still be refreshed to tell whether the test is flaky or legitimately failing. The relevant code owner should then be alerted so they can either skip or fix the test before the next deploy (at the discretion of the test owner).
A flaky test in main when an auto-deploy is nearing
If a unit test fails in main and a scheduled deploy is nearing, the va frontend cop support team member should refresh the pipeline immediately, open a pull request to skip the test, and alert the code owner for a fix and/or approval of the pull request that skips the test. Ideally the test gets fixed, but in practice the merge process can take longer than the deploy timing allows. This is why it is important to open the skip pull request immediately if needed: there is no need to wait for the code owner, and delays can fail the deploy. This is the most common reason for a failed deploy, so we should all be on high alert for it while on a support rotation.
While the pull request is running through the pipeline, the support engineer should keep refreshing the main pipeline in case a run succeeds, which would prevent a failed deploy. Even if the deploy is successful, the test should still be either fixed or skipped so that it does not block future deploys.
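The three situations above can be summarized as a small decision function. This is a sketch of the guidance in this section, not an existing tool: the Where enum, the function name, and the step wording are hypothetical, and whether a deploy is "nearing" is still judged by a person against the schedule table.

```python
from enum import Enum


class Where(Enum):
    PULL_REQUEST = "pull request"
    MAIN = "main"


def triage_flaky_test(where: Where, deploy_nearing: bool) -> list[str]:
    """Return the steps this section describes for a failing unit test."""
    if where is Where.PULL_REQUEST:
        # Owned by the pull request author; no effect on the daily deploy.
        return [
            "refresh the pipeline or skip the test in the PR",
            "have the code owner review",
        ]
    if not deploy_nearing:
        # Not urgent for today's deploy, but still needs follow-up.
        return [
            "refresh the pipeline to tell flaky from legitimately failing",
            "alert the code owner to skip or fix before the next deploy",
        ]
    # Failure in main with a scheduled deploy approaching: act immediately.
    return [
        "refresh the pipeline immediately",
        "open a pull request to skip the test",
        "alert the code owner for a fix or approval",
        "keep refreshing the main pipeline while the PR runs",
    ]
```

Calling triage_flaky_test(Where.MAIN, deploy_nearing=True) returns the four urgent steps, starting with refreshing the pipeline immediately.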