Postmortem for September 14th, 2023: Workbooks not loading for Sigma users on AWS Snowflake

Summary

On September 14th, 2023, at approximately 12:49 UTC, Sigma users on Snowflake hosted in AWS US-West-2 became unable to open workbooks. Instead of loading or returning an error, affected workbooks remained in an endless loading loop.

Our initial investigation of logs and error rates showed that the issue was triggered by a backend optimization gated by a feature flag. We reverted the optimization, which returned the service to a normal, healthy state.

Incident Start Time: Approximately 12:49 UTC September 14, 2023
Incident End Time: Approximately 14:39 UTC September 14, 2023

Root Cause:

Workbook query results were not reaching our backend services because of network issues caused by a backend optimization. As a result, workbooks remained stuck in a loading state.

Timeline (UTC)

  • 2023-09-14 12:49 UTC First indication of error
  • 2023-09-14 14:22 UTC Customers reported the incident
  • 2023-09-14 14:27 UTC Escalated to highest priority
  • 2023-09-14 14:35 UTC Mitigation measures were implemented by reverting the backend optimization
  • 2023-09-14 14:39 UTC Verified that workbooks returned to a stable and healthy state
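
The intervals implied by this timeline can be worked out directly from the timestamps above. The following is a small illustrative sketch; the event names are labels chosen here for convenience and do not come from any Sigma system.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M"

# Timestamps (UTC) copied from the timeline; the keys are illustrative labels.
timeline = {
    "first_indication": "2023-09-14 12:49",
    "customer_report": "2023-09-14 14:22",
    "escalated": "2023-09-14 14:27",
    "mitigated": "2023-09-14 14:35",
    "verified": "2023-09-14 14:39",
}
events = {name: datetime.strptime(ts, FMT) for name, ts in timeline.items()}


def minutes_between(start: str, end: str) -> int:
    """Whole minutes elapsed between two named timeline events."""
    return int((events[end] - events[start]).total_seconds() // 60)


time_to_report = minutes_between("first_indication", "customer_report")   # 93
report_to_mitigation = minutes_between("customer_report", "mitigated")    # 13
total_duration = minutes_between("first_indication", "verified")          # 110
```

Read this way, the incident lasted about 1 hour 50 minutes in total, of which roughly 1 hour 33 minutes passed before customers reported it.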

Future Corrective Actions

  • The backend optimization that was reverted to mitigate the issue will not be enabled again, as it is no longer deemed necessary.
  • We have added logs and monitoring to detect this condition of endless workbook loading more quickly.
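
As a rough illustration of the kind of check that can surface endless loading, the sketch below flags workbook loads that have been pending longer than a threshold. It is not Sigma's actual monitoring: the `WorkbookLoad` record, the `find_stuck_loads` function, and the 60-second threshold are all hypothetical names and values assumed for this example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class WorkbookLoad:
    """Hypothetical record of one workbook load attempt."""
    workbook_id: str
    started_at: datetime
    finished_at: Optional[datetime] = None  # None while the load is still pending


def find_stuck_loads(
    loads: List[WorkbookLoad],
    now: datetime,
    threshold: timedelta = timedelta(seconds=60),  # assumed value, not from the postmortem
) -> List[WorkbookLoad]:
    """Return loads that are still pending and older than the threshold."""
    return [
        load
        for load in loads
        if load.finished_at is None and now - load.started_at > threshold
    ]


# Usage example with made-up data.
now = datetime(2023, 9, 14, 12, 55)
loads = [
    WorkbookLoad("wb-1", datetime(2023, 9, 14, 12, 49)),                # pending for 6 minutes
    WorkbookLoad("wb-2", datetime(2023, 9, 14, 12, 54, 30)),            # pending for 30 seconds
    WorkbookLoad("wb-3", datetime(2023, 9, 14, 12, 40),
                 finished_at=datetime(2023, 9, 14, 12, 40, 5)),         # completed normally
]
stuck = find_stuck_loads(loads, now)
stuck_ids = [load.workbook_id for load in stuck]
```

A check of this shape matters because loads that never finish and never fail would not necessarily appear in error-rate metrics alone.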

We apologize for the disruption and inconvenience this incident caused. Your trust in us is of the utmost importance, and we are dedicated to taking the necessary measures to prevent similar incidents in the future. If you have any questions, please reach out to our Support Team.

Thank you for your continued support, patience, and understanding.