Postmortem for July 27, 2023 - Image (PNG) Exports Erroring on Sigma AWS, GCP, and Azure

Summary

On 27 July 2023, scheduled and “send now” exports formatted as image (PNG) failed for Sigma users with “unknown error encountered” errors. A major upgrade to the service responsible for generating PNG files had been deployed overnight and introduced a breaking change. We reverted the upgrade as a mitigation, and the errors stopped. We subsequently developed a permanent fix for the root cause and safely redeployed the upgrade.

Incident Start Time: 12:40 UTC July 27, 2023
Incident End Time: 13:30 UTC July 27, 2023

Root Cause

Overnight on 26 July 2023, we deployed a major upgrade to the service responsible for generating image (PNG) files. The upgrade included newer versions of the service’s dependencies. One of those dependency upgrades introduced an undetected incompatibility with our existing code, which caused the service to fail specifically when generating image (PNG) files.

Mitigations & Fixes

  • Reverted the overnight upgrade as an immediate mitigation for the incident
  • Implemented a permanent fix for the root cause of the errors and safely redeployed the upgrade

Future Corrective Actions

  • Improve live test coverage specifically for PNG exports
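
As an illustration only, a live check for PNG exports could confirm that the bytes an export returns are really a PNG file and not an error payload. The sketch below is not Sigma's actual test code: the function name `looks_like_png` is hypothetical, and how the export bytes are fetched is left out. It relies only on the PNG file format itself, which starts with a fixed 8-byte signature followed by an IHDR chunk.

```python
# Hypothetical helper for a live export check; not part of any Sigma API.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # fixed 8-byte signature defined by the PNG format


def looks_like_png(payload: bytes) -> bool:
    """Return True if payload starts with the PNG signature and an IHDR chunk."""
    # 8 signature bytes + 4-byte chunk length + 4-byte chunk type = 16 bytes minimum
    if len(payload) < 16 or not payload.startswith(PNG_SIGNATURE):
        return False
    # The first chunk of every valid PNG is IHDR; its type sits at bytes 12..15.
    return payload[12:16] == b"IHDR"
```

A live test could run an export after each deployment and fail if `looks_like_png` returns False for the result, so that a broken PNG path is caught before users' scheduled exports run.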
