Service degradation (SFO)

This report details the incident of 5 March 2018, which affected the availability of the deployment instance creation API in the SFO region.

We sustained a small-scale Denial of Service (DoS) attack targeting one of our deployment instance creation endpoints, affecting deployment creation and unfreezing. Live deployments were not affected.

At 09:03:21 UTC our systems started to receive an abnormal amount of traffic on the deployment instance creation endpoints. 35 seconds later, at 09:03:56 UTC, our on-call system reliability engineer was automatically notified by alerts on abnormal metrics from our API endpoint servers.

At 09:05:34 UTC our abuse prevention system automatically disabled new deployments and unfreezes for all free and paid accounts for 10 minutes. We chose to extend this period to 30 minutes while we investigated and applied necessary measures to prevent the situation from escalating.
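To illustrate the kind of timed freeze described above, here is a minimal sketch and not the production implementation. The class name `CreationFreeze`, its methods, and the use of seconds since the start of the freeze are all assumptions made for the example. It also assumes the 30 minutes are counted from the moment the freeze began.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CreationFreeze:
    """Hypothetical timed kill switch for new deployments and unfreezes."""

    # Time (in seconds) until which creation is disabled; None means enabled.
    disabled_until: Optional[float] = None

    def trip(self, started_at: float, duration_s: float) -> None:
        # Disable creation until started_at + duration_s.
        # A later call can lengthen the freeze but never shorten it.
        new_deadline = started_at + duration_s
        if self.disabled_until is None or new_deadline > self.disabled_until:
            self.disabled_until = new_deadline

    def is_allowed(self, now: float) -> bool:
        # Creation is allowed if no freeze is set or the freeze has expired.
        return self.disabled_until is None or now >= self.disabled_until


freeze = CreationFreeze()
freeze.trip(started_at=0.0, duration_s=10 * 60)  # automatic 10-minute freeze
freeze.trip(started_at=0.0, duration_s=30 * 60)  # manual extension to 30 minutes
```

Keeping the extension as a second `trip` call with the same start time means the automatic and manual paths share one code path, and a shorter request can never accidentally lift a longer freeze.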

At 09:10:08 UTC we identified the root cause as a DoS attack targeting instance creation. We immediately tightened rate limiting on the endpoint and added more metrics so that we are alerted to suspicious or malicious activity sooner in the future.
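For readers unfamiliar with endpoint rate limiting, one common technique is a token bucket. The sketch below is illustrative only and does not describe the limiter actually deployed: the class `TokenBucket`, its parameters, and the example values are assumptions chosen for the example.

```python
import time
from typing import Callable


class TokenBucket:
    """Hypothetical per-client token bucket rate limiter."""

    def __init__(
        self,
        rate_per_s: float,
        burst: int,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.rate = rate_per_s        # tokens refilled per second
        self.capacity = float(burst)  # maximum tokens the bucket can hold
        self.tokens = float(burst)    # start full so short bursts succeed
        self.clock = clock            # injectable clock, useful for testing
        self.last = clock()

    def allow(self) -> bool:
        # Refill according to elapsed time, capped at the bucket capacity.
        now = self.clock()
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        # Spend one token per request; reject when the bucket is empty.
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


# Example values only: 1 creation request per second, bursts of up to 5.
limiter = TokenBucket(rate_per_s=1.0, burst=5)
```

In practice one such bucket would typically be kept per account or per source address, so that a single abusive client exhausts only its own allowance.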

At 09:43:36 UTC we re-enabled instance creation endpoints for all paid accounts, and subsequently re-enabled them for free accounts at 11:29:50 UTC after a rollout and monitoring period.