DeployBot in the Cloud

After years of running on our on-premises server infrastructure, DeployBot has finally ascended into The Cloud. We've migrated all our services to Amazon Web Services, and along the way we made a lot of performance and reliability improvements.

As some of you know, DeployBot sprouted from Beanstalk, the version control hosting service created by Wildbit (basically, us). We used to share all our infrastructure with Beanstalk, and over the last year we decoupled almost everything. However, some infrastructure decisions were carried over from Beanstalk and were less than optimal for a new product. Our shiny new AWS setup helped us solve a lot of these old pains.

Performance!

Here's what our 7-day web transactions graph looked like before:

[Graph: 7-day web transactions, before migration]

And here is what it looks like today:

[Graph: current web transactions, after migration to AWS]

The average response time of our application decreased from 471ms to 152ms. That is more than a 3x improvement. Our users' browsers are now also able to display pages almost 30% faster. We are very happy with this result.
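For readers who want to check the numbers, here is a quick sketch of the arithmetic behind the "more than 3x" figure. It uses only the two averages quoted above; the variable names are ours and purely illustrative.

```python
# Average application response times quoted in this post, in milliseconds.
before_ms = 471
after_ms = 152

# Speedup factor: how many times faster the new average is.
speedup = before_ms / after_ms

# Relative reduction in response time, as a percentage of the old average.
reduction_pct = (before_ms - after_ms) / before_ms * 100

print(f"speedup: {speedup:.2f}x")          # speedup: 3.10x
print(f"reduction: {reduction_pct:.1f}%")  # reduction: 67.7%
```

In other words, the average request now takes roughly a third of the time it used to.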

We've also increased the number of deployment servers in our cluster, so your deployments should now start processing sooner than before because the queues are shorter. The load on each server is also lower, which leads to faster deployments.

Reliability and scalability

Since our customers rely on us for their critical workflows, we took this migration as a chance to improve the reliability of our infrastructure. Here are the main things we worked on:

  • We separated our deployment servers into shards. This improves both isolation in case of failures and performance, since each shard now has more storage, which lets caches persist longer.
  • We moved to a highly available database cluster with automatic failover, spanning 2 availability zones, with data encryption and multiple backups.
  • We moved to a highly available Redis cluster with automatic failover, eliminating our previous single point of failure.
  • We now track an insane amount of metrics about all the services we run, giving us much greater insight into all the pieces of the puzzle when something goes wrong.
  • We now have a highly available load balancer for the web application, spanning 2 availability zones.
  • We got on-the-fly disk space resizing for all machines and the ability to upgrade or downgrade instances without losing any data, which makes scaling easier.
  • We finally moved to the latest stable Docker version. This fixes a lot of issues, small and big, that we were having with our previous, ancient version of Docker.
  • Active FTP mode is no longer disabled for servers using Build Tools, thanks to our new network configuration.
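
To give a feel for why sharding helps caches persist, here is a minimal sketch of one common way to assign work to shards. This is an assumption for illustration only, not a description of how DeployBot actually routes deployments: the shard count, the `shard_for` function, and the use of a repository ID as the key are all hypothetical.

```python
import hashlib

# Hypothetical number of deployment shards; the real value is not stated in this post.
SHARD_COUNT = 4


def shard_for(repository_id: str, shard_count: int = SHARD_COUNT) -> int:
    """Map a repository ID to a shard index in a stable way (illustrative only)."""
    # Hash the ID so that repositories spread roughly evenly across shards.
    digest = hashlib.sha256(repository_id.encode("utf-8")).digest()
    # Use the first 8 bytes as an integer and reduce it to a shard index.
    return int.from_bytes(digest[:8], "big") % shard_count


# The same repository always maps to the same shard, so a cache built
# there by one deployment can be reused by the next one.
print(shard_for("example-repo") == shard_for("example-repo"))
```

With a stable mapping like this, a failure in one shard affects only the repositories assigned to it, and each shard's storage only has to hold caches for its own share of repositories.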

We're very happy with the migration and hope that you notice the improvement! Let us know what you think.