Postmortem for the June 16, 2023 Incident: Workbooks with Input Tables Failing to Load - Sigma AWS & Sigma GCP

Summary

On June 16, 2023, users in Sigma organizations hosted on AWS US and GCP cloud regions were unable to load existing workbooks that contained an Input Table. They encountered an error page stating: “cannot return null on non-nullable field ApiError.code”. The incident was caused by a misconfigured change that made our Input Tables service fail. The failure cascaded into failures of critical API requests, which prevented workbooks from loading.

Incident Start Time: 20:34 UTC June 16, 2023
Incident End Time: 21:05 UTC June 16, 2023

Root Cause

A change containing invalid JSON (a trailing comma) in our runtime trace label was applied to our clusters, which caused the Input Table service to fall into CrashLoopBackOff. All Input Table services went down when requests came in, causing critical API requests to fail and breaking workbook loading.
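
For illustration, the kind of strict JSON check that rejects a trailing comma before a change is applied can be sketched as follows. This is a minimal sketch, not our actual tooling: the function name validate_json_config and the label payloads are hypothetical, and the real trace label format is not shown here.

```python
import json


def validate_json_config(raw: str) -> tuple[bool, str]:
    """Return (True, "") if raw parses as strict JSON, else (False, reason)."""
    try:
        json.loads(raw)
    except json.JSONDecodeError as exc:
        # Report where parsing failed so the author can find the bad character.
        return False, f"line {exc.lineno}, column {exc.colno}: {exc.msg}"
    return True, ""


# Hypothetical payloads, for illustration only.
valid_label = '{"trace": "enabled"}'
invalid_label = '{"trace": "enabled",}'  # trailing comma is not valid JSON

print(validate_json_config(valid_label))
print(validate_json_config(invalid_label))
```

Running a check like this in the change pipeline, before the configuration reaches a cluster, would turn this class of error into a rejected change rather than a crashing service.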

Mitigations & Fixes

The issue was mitigated rapidly by reverting the misconfigured change.

Future Corrective Actions

  • Improve our readiness checks for Input Tables so that they cover critical configurations
  • Prevent workbook loading from hard-failing when this particular API request fails
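
As a rough illustration of the second action, the sketch below shows one way a loader could degrade gracefully instead of hard-failing when a single dependency is unavailable. It is a simplified model under stated assumptions, not a description of Sigma's code: WorkbookView, load_workbook, and fetch_input_tables are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class WorkbookView:
    """Hypothetical result of loading a workbook."""
    elements: list[str]
    warnings: list[str] = field(default_factory=list)


def load_workbook(
    core_elements: list[str],
    fetch_input_tables: Callable[[], list[str]],
) -> WorkbookView:
    """Load core elements; treat Input Tables as optional if their fetch fails."""
    view = WorkbookView(elements=list(core_elements))
    try:
        view.elements.extend(fetch_input_tables())
    except Exception as exc:
        # Record a warning instead of failing the entire workbook load.
        view.warnings.append(f"Input Tables unavailable: {exc}")
    return view


def failing_fetch() -> list[str]:
    # Simulates the Input Tables request failing.
    raise ConnectionError("service not ready")


degraded = load_workbook(["chart", "pivot"], failing_fetch)
healthy = load_workbook(["chart"], lambda: ["input_table_1"])
print(degraded)
print(healthy)
```

In this model the rest of the workbook still renders and the user sees a targeted warning for the affected element rather than a full error page.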