The Checklist for Deploying a Scary Change
Data migrations and other large-scale changes can feel like a journey into uncharted territory. They carry a real risk of destabilizing production, particularly for mission-critical services. The work can be exhilarating, but it has its challenges, and incidents do occasionally follow.
After an unexpected outage of a crucial service, I realized the dire need for a more streamlined pre-deployment process for such changes. The result is a comprehensive checklist that I adhere to religiously before initiating any deployment through DeployBot.
Let’s delve into this checklist:
1. Establish a Robust Contingency Plan
First, ask yourself: what is your plan if something goes wrong?
Think through what a rollback would actually involve. If the prospect of rolling back fills you with dread, you are probably not ready to deploy the change yet.
Next, determine the exact command(s) needed for a rollback. With DeployBot, rollbacks are simplified, enabling you to restore your application to a previous state with just a few clicks.
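One way to pin the rollback down is to write the commands as data before you deploy and rehearse them in a dry run. The sketch below is only an illustration: `run_rollback` is a hypothetical helper, not a DeployBot feature, and the steps you pass in are whatever your own stack needs.

```python
import shlex
import subprocess


def run_rollback(steps, dry_run=True):
    """Run rollback commands in order, stopping at the first failure.

    Returns the parsed commands that were run (or would be run in a dry run).
    """
    done = []
    for step in steps:
        argv = shlex.split(step)
        if not dry_run:
            # check=True raises CalledProcessError on a non-zero exit code,
            # so a failed step halts the rollback instead of being ignored.
            subprocess.run(argv, check=True)
        done.append(argv)
    return done
```

Calling `run_rollback(["git revert --no-edit HEAD"])` (an example step, not a recommendation) only parses and returns the command; pass `dry_run=False` when you really need to roll back.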
2. Identify Post-Deployment Indicators
Post-deployment, keep a keen eye on indicators that could suggest issues:
- Error Rate: Use Sentry, Rollbar, or Bugsnag to monitor error rates. An increase might indicate potential problems.
- Specific Error Reporting: Sentry provides detailed error reports, helping you identify issues swiftly.
- Latency: Look out for unusual spikes in latency with Sentry.
- Logs: Tools like Papertrail offer aggregated logs from different sources, aiding in debugging.
- Functionality: Verify that everything still works as expected and that users can interact with your service normally after the deployment.
- Background Jobs: If you run background jobs with Sidekiq or Celery, watch their queues (for example through the Sidekiq Web UI or Flower). An unexpected increase in retrying jobs could suggest problems.
- Customer Support: Keep track of customer complaints. Depending on the deployed change, an increase in complaints could be a key indicator of an issue.
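For the error-rate indicator, it helps to decide in advance what counts as "an increase". Here is a minimal sketch, assuming you can read error and request counts for a window before and after the deploy from your monitoring tool. The function names and the thresholds (`max_ratio`, `min_rate`) are illustrative assumptions, not values from any particular tool.

```python
def error_rate(errors, requests):
    """Fraction of requests that failed; 0.0 when there was no traffic."""
    return errors / requests if requests else 0.0


def error_rate_regressed(before, after, max_ratio=1.5, min_rate=0.001):
    """Compare (errors, requests) pairs from before and after a deploy.

    Returns True when the post-deploy rate is above min_rate and more than
    max_ratio times the baseline rate.
    """
    base = error_rate(*before)
    now = error_rate(*after)
    # Ignore tiny rates so a single stray error does not trigger an alarm.
    if now < min_rate:
        return False
    return now > base * max_ratio
```

For example, `error_rate_regressed((10, 10000), (40, 10000))` flags a regression, while `error_rate_regressed((10, 10000), (12, 10000))` does not.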
3. External Website Monitoring
In addition to the above, external monitoring tools like UptimeRobot or BetterUptime provide extra assurance. They alert you when something is off, and they give you peace of mind that the site is up and reachable from outside your own infrastructure.
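A hosted monitor is the better long-term answer, but right after a risky deploy a quick manual probe can be useful too. The sketch below uses only the Python standard library. `is_up` is a hypothetical helper, and the `opener` parameter exists purely so the function can be exercised without network access.

```python
import urllib.request


def is_up(url, timeout=5.0, opener=urllib.request.urlopen):
    """Return True if the URL answers with a 2xx status within the timeout."""
    try:
        with opener(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except OSError:
        # URLError, HTTPError (4xx/5xx responses) and socket timeouts
        # are all subclasses of OSError.
        return False
```

You would call it with your own health-check URL, for example `is_up("https://example.com/health")`, where the URL is a placeholder.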
4. Documentation for Postmortem Analysis
Should something go wrong, documenting what happened is crucial. This record serves as a postmortem analysis and a source of shared learning for the team. Using a standard template for this helps maintain consistency across all records.
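A standard template can be as simple as a fixed list of headings that every record starts from. The sketch below generates one as plain text. The section names are a common choice and an assumption on my part, so adjust them to whatever your team already uses.

```python
# Assumed section names; replace with your team's own headings.
POSTMORTEM_SECTIONS = [
    "Summary",
    "Impact",
    "Timeline",
    "Root cause",
    "What went well",
    "Action items",
]


def postmortem_template(title, date):
    """Return an empty postmortem document with one block per section."""
    lines = [f"Postmortem: {title}", f"Date: {date}", ""]
    for section in POSTMORTEM_SECTIONS:
        # Underline each heading and leave a TODO to fill in later.
        lines += [section, "-" * len(section), "TODO", ""]
    return "\n".join(lines)
```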
5. Automate Testing & Implement Performance Monitoring
Incorporate automated testing within your deployment pipeline. With DeployBot, you can run tests before deploying, offering an additional layer of assurance.
Also consider real-time performance monitoring, so that you can spot and mitigate problems soon after a deployment instead of hearing about them from users.
Equipped with this expanded checklist, I've been able to navigate risky deployments more confidently, leading to fewer incidents.
Do you follow a similar strategy? Or do you have insights to enrich this list further? I'd love to hear your thoughts in the comments!