[Quickbase US Status] Notice: Service Not Available (Unplanned) - Down Time

Incident
06/09/2023, 12:30am EDT

Status: closed
Start: 06/08/2023, 02:35pm EDT
End: 06/08/2023, 03:55pm EDT
Duration: 1 hour 20 minutes
Affected Components:
Quickbase Service - US Region
Quickbase Audit Logs - US Region
Quickbase Automations - US Region
Quickbase Billing - US Region
Quickbase Pipelines - US Region
Quickbase Platform Analytics - US Region
Quickbase RESTful APIs - US Region
Quickbase Sync - US Region
Quickbase Webhooks - US Region

Update

06/08/2023, 02:35pm EDT

Starting around 2:35 PM Eastern US Time, we are seeing performance degradation on the platform. We are investigating this issue and will provide an update as soon as we have one.

Update

06/08/2023, 03:02pm EDT

We continue to investigate this issue.  We will provide another update in the next 30 minutes.

Update

06/08/2023, 03:29pm EDT

We are still investigating this issue with the highest urgency.  We will provide another update in 30 minutes.

Update

06/08/2023, 03:35pm EDT

Starting around 2:35 PM Eastern US Time, we are seeing performance degradation on the platform. We are investigating this issue and will provide an update within 30 minutes.

Update

06/08/2023, 03:54pm EDT

We are seeing performance improvements on the Quickbase platform. We continue to monitor this issue and will provide another update in 30 minutes.

Resolved

06/08/2023, 03:55pm EDT

As of 3:55 PM Eastern US Time, the Quickbase platform returned to normal performance levels.  (Note, you are likely receiving this e-mail close to 4:30 PM Eastern US Time.)

This incident is closed.

Root Cause

06/09/2023, 12:30am EDT

All times shown are Eastern US Time.

Between 2:35 PM and 3:55 PM, the performance of the Quickbase US platform was degraded, with many requests timing out or returning an error. Between 2:35 PM and 3:20 PM, some customers may have had a normal experience while others saw degraded performance. After 3:20 PM, most customers likely had a poor user experience. The incident was resolved by 3:55 PM.

The preliminary root cause is that a platform service that checks which platform features a user is eligible to access (the feature status service) received a large influx of requests, which caused requests to queue. At this point in the incident, performance was slow, but most requests would ultimately complete. This queuing in turn caused a platform service that routes requests to eventually exhaust its available resources, at which point platform performance degraded further. We are still evaluating the cause of the initial influx of requests that triggered the problem, but we believe it originated with unexpected behavior in how pipelines interacted with the feature status service.
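
For readers who want to see the two phases described above (slow but completing, then failing), the following is a toy discrete-time sketch in Python. It is not a model of Quickbase's actual architecture: the function name `simulate`, the slot count, the service rate, and the arrival numbers are all invented for illustration. It assumes only that each in-progress request holds one routing slot until a downstream check completes.

```python
def simulate(arrivals_per_tick, service_rate, router_slots, ticks):
    """Toy model: return a list of (tick, in_flight, rejected) tuples.

    in_flight = requests holding a router slot while they wait on the
                downstream check (hypothetical stand-in for the queue).
    rejected  = requests turned away because no router slot was free.
    """
    in_flight = 0
    history = []
    for tick in range(ticks):
        # The downstream service finishes at most service_rate requests,
        # which frees the router slots they were holding.
        in_flight -= min(in_flight, service_rate)

        # New requests take a free slot if one exists; otherwise they fail.
        arrivals = arrivals_per_tick(tick)
        admitted = min(arrivals, router_slots - in_flight)
        rejected = arrivals - admitted
        in_flight += admitted

        history.append((tick, in_flight, rejected))
    return history


def spike(tick):
    # Invented load shape: 8 requests per tick, rising to 15 for ticks 5-14.
    return 15 if 5 <= tick < 15 else 8


history = simulate(spike, service_rate=10, router_slots=40, ticks=20)
for tick, in_flight, rejected in history:
    print(f"tick={tick:2d} in_flight={in_flight:2d} rejected={rejected}")
```

With these made-up numbers, the backlog grows for several ticks without any rejected requests (slow, but everything completes), and rejections begin only once all routing slots are held by waiting requests. The backlog also drains slowly after the spike ends, because the spare capacity is small relative to the accumulated queue.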

We've implemented two improvements to our platform monitoring that will let us identify more quickly the specific area of the platform where an issue like this occurs. We are also researching ways to improve the scalability of the feature status service and the routing service.

We will continue to update this root cause as we learn more from our investigation of this incident.