Incident postmortem 2021-08-03
On 2021-08-03 at 2:35pm PST, one of our engineers ran a faulty database migration that caused discrepancies between the indexed data and the source of truth in the database. The incident affected 39 users. We backfilled the data and resolved the issue at 4:46pm PST.
The 39 affected users may have lost data that they added between 2:35pm PST and 4:46pm PST on 2021-08-03. We backfilled the data but cannot guarantee 100% recovery.
The root cause was a new database migration that reorganizes the file structure in preparation for the Dropbox integration. We underestimated the number of users who would be active on the service during the deployment.
In similar situations in the future, we will:
- Be more cautious with database migrations, and account for data being inserted while a migration is running.
- Put the site into maintenance mode when we need to stop all traffic and avoid race conditions.
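As an illustration of the second item, here is a minimal sketch of a maintenance-mode gate. It is not our actual implementation: the class name MaintenanceGate and its methods are hypothetical, and it assumes a single application process where every write path can be wrapped in the gate. The idea is that entering maintenance mode rejects new writes, waits for in-flight writes to finish, and only then lets the migration run.

```python
import threading
from contextlib import contextmanager


class MaintenanceGate:
    """Hypothetical in-process gate that blocks writes during a migration."""

    def __init__(self):
        # One condition variable guards both the flag and the write counter.
        self._cond = threading.Condition()
        self._maintenance = False
        self._active_writes = 0

    @contextmanager
    def write(self):
        """Wrap every data-modifying request in this context manager."""
        with self._cond:
            if self._maintenance:
                raise RuntimeError("site is in maintenance mode")
            self._active_writes += 1
        try:
            yield
        finally:
            with self._cond:
                self._active_writes -= 1
                self._cond.notify_all()

    @contextmanager
    def maintenance(self):
        """Reject new writes, drain in-flight ones, then run the migration."""
        with self._cond:
            self._maintenance = True
            while self._active_writes:
                self._cond.wait()
        try:
            yield
        finally:
            with self._cond:
                self._maintenance = False


gate = MaintenanceGate()

with gate.maintenance():
    # The migration would run here; any write attempted now is rejected.
    try:
        with gate.write():
            pass
    except RuntimeError as exc:
        rejected_message = str(exc)
```

A deployment with several application servers would need the flag in shared storage (for example, a row in the database or a key in a cache) rather than in process memory, but the ordering stays the same: stop new writes, drain the ones in flight, migrate, then reopen.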