Possible delays in asynchronous scoring queue in US EAST 1
Incident Report for Learnosity
Postmortem

On November 9, 2021, asynchronous scoring delays affected a small subset of customers for 3.25 hours. (94.24%, or 49.16 million, of scoring messages were processed in less than a second on November 9.) The delays were caused by temporary database locks held by inefficient queries that performed unexpectedly large database operations (joins and index creations).

Immediate resolution was achieved by increasing the number of database sync workers and AWS Relational Database Service instances to work through the scoring backlog. Work is ongoing as we analyze and improve database queries in the affected areas.

Posted Dec 03, 2021 - 14:36 EST

Resolved
As of 23:30 UTC, we have resolved the delays in availability of scored sessions in US EAST 1.

Learnosity Support and Systems Engineering teams will follow up with a post mortem once we have completed root cause analysis and finalized any next steps or preventative measures required.

Please reach out if you have any questions or concerns.
Posted Nov 09, 2021 - 18:41 EST
Monitoring
As of 22:45 UTC, we've seen no delays for more than 30 minutes. We will continue to monitor for a further 30 minutes before calling this resolved. Thanks to AWS Support for additional help.

Learnosity Support and Systems Engineering teams are continuing to actively investigate the issue, and will follow on with an update and resolution as soon as possible.
Posted Nov 09, 2021 - 17:49 EST
Update
As of 22:00 UTC, we've made significant improvements in our async scoring queue, reducing delays by over 90% for the small subset of affected customers, and are actively working with AWS to resolve the issue. We will continue with our degraded performance status until the queue is clear but, as previously stated, if your team is unaffected, please ignore this update.

Learnosity Support and Systems Engineering teams are continuing to actively investigate the issue, and will follow on with an update and resolution as soon as possible.
Posted Nov 09, 2021 - 17:14 EST
Update
As of 21:00 UTC, we are actively investigating delays affecting asynchronous scoring in the US East 1 region. This issue is affecting a very small number of customers, so if you are not seeing delays you can disregard this notice.

Learnosity Support and Systems Engineering teams are continuing to actively investigate the issue, and will follow on with an update and resolution as soon as possible.
Posted Nov 09, 2021 - 16:07 EST
Update
As of 19:15 UTC, we are continuing to investigate scoring delays in our US East 1 region. Our investigation to date seems to indicate that this is affecting a very small number of customers, so it's expected that most are not seeing delays and can disregard this notice.

Learnosity Support and Systems Engineering teams are continuing to actively investigate the issue, and will follow on with an update and resolution as soon as possible.
Posted Nov 09, 2021 - 14:24 EST
Update
As of 17:45 UTC, we are now seeing delays affecting scoring of some submitted sessions in the US East 1 region, which remains in degraded performance status.

Learnosity Support and Systems Engineering teams are continuing to actively investigate the issue, and will follow on with an update and resolution as soon as possible.
Posted Nov 09, 2021 - 13:20 EST
Update
As of 17:45 UTC, we are continuing to investigate delays affecting scoring of some submitted sessions in the US East 1 region and are reclassifying this incident as degraded performance.

Learnosity Support and Systems Engineering teams are continuing to actively investigate the issue, and will follow on with an update and resolution as soon as possible.
Posted Nov 09, 2021 - 12:52 EST
Investigating
As of 17:15 UTC, we are seeing a slight build-up in asynchronous scoring queue messages in US East 1.

No known ill-effects have been detected so far, but we will keep this incident active until our investigation is complete.

Learnosity Support and Systems Engineering teams are actively investigating the issue, and will follow on with an update and resolution as soon as possible.
Posted Nov 09, 2021 - 12:31 EST
This incident affected: AMER || Data Centric (Updating session response scores).