[Quickbase US Status] Notice: Service Functionality Degraded (Unplanned)
08/02/2023, 04:00am EDT
Starting around 4:00 AM Eastern US Time, we are seeing performance degradation to the Quickbase Sync service. We are investigating this issue and will provide an update as soon as we have one.
08/02/2023, 07:41am EDT
We are still investigating and will provide an update in 30 minutes.
08/02/2023, 08:11am EDT
We are still investigating and will provide an update in 30 minutes.
08/02/2023, 08:43am EDT
We are still investigating and will provide an update in 30 minutes.
08/02/2023, 09:24am EDT
We believe we have identified the issue and our team is working towards a resolution. We will provide an update in 30 minutes.
08/02/2023, 10:08am EDT
We believe we have identified the issue and our team is working towards a resolution. We will provide an update in 30 minutes.
08/02/2023, 10:35am EDT
Our team is working towards a resolution. We will provide an update in 30 minutes.
08/02/2023, 11:08am EDT
The service is starting to recover and our team is monitoring. We will provide an update in 30 minutes.
08/02/2023, 11:10am EDT
As of 11:10 AM Eastern US Time, connected table refreshes are working normally.
This incident is closed.
08/02/2023, 12:20pm EDT
First, we realize we are an important part of your business, and you expect Quickbase to be available and performing well when you need it. We apologize for the disruption this incident caused our customers.
The preliminary root cause is that a minor database upgrade initiated by our hosting partner resulted in a configuration error that prevented correct use of that database. These minor database upgrades are considered routine and occur periodically without interruption to connected table refreshes. Today's incident was unexpected, and we will be investigating it fully to determine what happened, how we can prevent it in the future, and how we can respond more quickly. Please read on for more details, and note that these are preliminary findings; we will be updating this root cause as we learn more.
The Quickbase Sync service is responsible for refreshing connected tables in Quickbase applications. For example, it keeps data in sync between two Quickbase apps, or between a Quickbase app and another vendor's service such as Salesforce or Google Mail. The Sync service consists of many services and databases. One of the databases used by the service was upgraded at about 4:00 AM Eastern US Time by our hosting partner. Our hosting partner does not typically inform us of exactly when minor database upgrades will be performed because they are considered routine.
At some point during the minor database upgrade, the upgrade program unexpectedly changed configuration information in the database, which made access to the database intermittent. From that point on, some connected table refreshes completed successfully while others failed with an error that would display in the connected table history tab, or in the e-mail customers receive when a connected table refresh fails. The error was similar to the following:
The error was: The resource ‘[[connection_XXXXX]].[[XXXXXXXXXXXXXXXXXXXXXXXXXX]]:serviceDefinition::Query’ does not exist. Tracker ID: [XXXXXXXXXXX]
As the incident progressed and our team attempted to recover the Sync service, customers may also have seen an HTTP 502 error.
Once we identified that the database was only intermittently working, we rebuilt it and the Sync service returned to normal.
We are continuing to investigate why the configuration information was changed by the database upgrade program.
As with all incidents, we are building a list of areas in which we can improve. This includes how quickly we detected the issue and how quickly we responded. We received monitoring alerts via e-mail within 5 minutes of the incident starting, but none of the alerts paged our on-call staff. We are actively working to improve our monitoring and alerting for the Sync service and expect to have pageable alerts in place today that will notify our on-call staff of incidents such as the one that occurred today.
As previously noted, we will update this root cause further as we learn more.