Postmortem for March 21, 2023 Incident - Sigma AWS & Sigma GCP

Summary

Starting on March 20, 2023, Sigma users hosted on AWS or GCP experienced timeout errors on newly created input tables in workbooks that were duplicated or saved from an exploration. The errors were caused by a change that added more metadata to the input table event logs. Fortunately, creating input tables in existing workbooks was unaffected and continued to work as normal.

Incident start time: 17:20 UTC March 20, 2023
Incident end time: 22:52 UTC March 21, 2023

Root Cause

When a user clicks Save As on an exploration, a new workbook that is a copy of the exploration is created. Creating the copied workbook also creates new copies of its input tables: the Save As workflow triggers a request for each table element and its edits. A change that added more information about those edits to the metadata field stopped the request to create the copied input tables from being sent to the backend service. As a result, no input tables were created, and subsequent reads of the nonexistent input tables failed with timeouts.
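The failure mode above can be illustrated with a small, hypothetical model. Everything in it (`Backend`, `create_input_table`, `read_input_table`, `save_as`, the `-copy` suffix, and the `send_request` flag) is an assumption made for illustration and does not describe Sigma's actual code or APIs. `send_request=False` stands in for the regression, where the richer metadata is built but the create request never reaches the backend, so the later read of the copied table times out.

```python
from dataclasses import dataclass, field


@dataclass
class Backend:
    # Hypothetical stand-in for the backend service that stores input tables.
    tables: dict = field(default_factory=dict)

    def create_input_table(self, table_id, edits, metadata):
        self.tables[table_id] = {"edits": list(edits), "metadata": metadata}

    def read_input_table(self, table_id):
        # In the incident, reading a table that was never created waited until
        # the request timed out; here that is modelled as an immediate TimeoutError.
        if table_id not in self.tables:
            raise TimeoutError(f"input table {table_id} was never created")
        return self.tables[table_id]


def save_as(exploration_tables, backend, send_request=True):
    """Copy each input table element and its edits into a new workbook.

    send_request=False models the regression: the extra metadata is built,
    but the create request is never sent to the backend.
    """
    copied_ids = []
    for table_id, edits in exploration_tables.items():
        new_id = f"{table_id}-copy"
        # Hypothetical "more information about the edits" added to the metadata.
        metadata = {"source": table_id, "edit_count": len(edits)}
        if send_request:
            backend.create_input_table(new_id, edits, metadata)
        copied_ids.append(new_id)
    return copied_ids
```

With `send_request=False`, `save_as` still returns IDs for the copied tables, but reading any of them raises `TimeoutError` because the backend never received a create request. With the default `send_request=True`, the same read returns the copied edits.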

Mitigations and Fixes

Immediate mitigations:

Pushed a change to alleviate request timeouts.

Root cause fix:

The change that added more information about the edits to the metadata field was fixed so that it sends the request to the backend service correctly. With the fix in place, logs could be fetched, and input tables in copied workbooks were successfully evaluated and created.
