Service Degradation Incident Report 2018-02-27 to 2018-03-01


From approximately 21:00 UTC on February 27, 2018 until 12:00 UTC on March 1, 2018, DeployBot experienced degraded service caused by connection problems with GitHub. A second service degradation occurred on March 2, 2018, from around 15:00 UTC to 20:00 UTC. We appreciate how much you depend on DeployBot and know that the availability of our service is of utmost importance. We apologize for the impact of these incidents. What follows is a description of the events. We will continue to incorporate what we have learned from them to improve response and mitigation in the future.

The Incident

During the incident, new commits in repositories appeared in DeployBot with a delay. The cause was well hidden in the lowest level of the service that handles repository synchronization: we were seeing a large number of SSH read timeouts, all coming from GitHub. After we made some changes to the SSH connection handling, the problem disappeared. We believe it was likely caused by GitHub's removal of weak cryptographic standards, which took effect on February 22, 2018. It is possible that the change was rolled out over a period of a few days and therefore reached us with some delay.

The second service degradation showed similar symptoms on the surface, but this time the cause was a service degradation at Bitbucket.

The Effect

As a consequence of these incidents, some of your repositories might still not show the latest commits in DeployBot. If that is the case, try reconnecting the repository. You can do this on the affected repository's settings page, in the section "New commits not appearing?".