Rightscript executions and audit entries failing in Shard 4
Incident Report for Flexera System Status Dashboard
Postmortem

Description: Customers experienced Rightscript executions and audit entries failing while using the Cloud Management Platform on Shard 4.

Timeframe: May 8th 01:54 am to May 8th 06:30 am PDT

Incident Summary

On Saturday, May 8th at 01:54 am PDT, customers using the Cloud Management Platform on Shard 4 reported that Rightscript executions and audit entries were failing. Technical teams were engaged and confirmed the customer reports. During the investigation, services began operating normally again, and technical staff were unable to identify either the cause or what had resolved the issue.

After monitoring services for a number of hours, the Incident was declared resolved at 06:30 am PDT on May 8th.

On Monday, May 10th, additional investigation revealed that a number of Instances had not been “discovered” and so had not been made available to manage. Technical staff ran a manual cleanup script to update the database with the missing Instances.

Root Cause

• Despite a thorough investigation by Engineering and SRE staff, the root cause of these issues could not be determined.

Corrective Action

• Additional logging has been enabled in case this event recurs.
• Auditing processes have been enhanced to alert when new Instances are not updated by discovery services.
• Auditing processes will now automatically update any “undiscovered” Instances on an hourly basis.
• RightNet Agent management services have been enhanced with more comprehensive alerting and monitoring, including the ability to repair Agent disconnects automatically on an hourly basis.
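
For illustration only, the hourly audit for “undiscovered” Instances described above might be sketched as follows. This is not the actual implementation: the function name audit_discovery, the AuditResult type, and the register callback are all hypothetical, and the sketch assumes the audit can obtain one list of Instance IDs from the cloud and one from the database.

```python
from dataclasses import dataclass, field


@dataclass
class AuditResult:
    # Instance IDs reported by the cloud but absent from the database.
    missing: list = field(default_factory=list)
    # True when at least one Instance was missing and an alert should fire.
    alert: bool = False


def audit_discovery(cloud_instance_ids, known_instance_ids, register):
    """Hypothetical audit pass: find Instances the database lacks and add them.

    cloud_instance_ids: IDs the cloud reports as existing (assumed input).
    known_instance_ids: IDs already recorded in the database (assumed input).
    register: callback that records one missing ID (assumed interface).
    """
    known = set(known_instance_ids)
    # dict.fromkeys drops duplicate IDs while keeping the cloud's ordering.
    missing = [i for i in dict.fromkeys(cloud_instance_ids) if i not in known]
    for instance_id in missing:
        register(instance_id)
    return AuditResult(missing=missing, alert=bool(missing))


# Example run with made-up IDs: "i-003" exists in the cloud but not the database.
database = {"i-001", "i-002"}
first_pass = audit_discovery(["i-001", "i-002", "i-003"], database, database.add)
second_pass = audit_discovery(["i-001", "i-002", "i-003"], database, database.add)
```

In this sketch the first pass repairs the gap and raises the alert flag, and the second pass finds nothing to do, so a scheduler could safely run it every hour.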

Posted May 27, 2021 - 19:58 PDT

Resolved
Instance launches, Rightscript executions, and Self-Service cloudapp operations are completing successfully.
Posted May 08, 2021 - 06:30 PDT
Update
We are continuing to investigate this issue.
Posted May 08, 2021 - 02:54 PDT
Investigating
We are currently investigating failures in Rightscript executions in CM Shard 4. This can indirectly cause Self-Service operations to fail in Shard 4.
Posted May 08, 2021 - 01:54 PDT
This incident affected: Legacy Cloud Management (Cloud Management Dashboard - Shard 4, Self-Service - Shard 4).