We are actively investigating gateway errors (502s) in PrinterCloud instances.

Incident Report for PrinterCloud

Postmortem

On March 16, 2023, the database services in the APAC region came under high internal load, which destabilized the platform. The root cause was identified as requests being sent to the affected services before they were ready to receive traffic during a routine database failover. To address the issue, the operations team programmatically reduced the number of running services across the region, isolating the load so that the services could initialize stably. The team also tuned health checks and implemented auto-scaling processes to provide additional stability across the region. After several hours of work, the team fully recovered the region, with all services healthy.
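To illustrate the idea of withholding traffic until a service is ready, here is a minimal sketch of a readiness gate. It is not the health-check configuration used in PrinterCloud; the class `ReadinessGate`, the function `routable`, and the `required_successes` threshold are hypothetical names chosen for this example. A backend only becomes routable after several consecutive successful probes, and any failed probe resets it.

```python
class ReadinessGate:
    """Tracks consecutive successful health probes for one backend (hypothetical helper)."""

    def __init__(self, required_successes: int = 3):
        if required_successes < 1:
            raise ValueError("required_successes must be at least 1")
        self.required_successes = required_successes
        self._streak = 0  # consecutive successful probes seen so far

    def record(self, probe_ok: bool) -> bool:
        """Record one probe result and return whether the backend is now ready."""
        self._streak = self._streak + 1 if probe_ok else 0
        return self.ready

    @property
    def ready(self) -> bool:
        return self._streak >= self.required_successes


def routable(backends: dict) -> list:
    """Return the names of backends that may receive traffic, sorted for stable output."""
    return sorted(name for name, gate in backends.items() if gate.ready)
```

In this sketch a service that is still initializing after a failover never appears in `routable(...)`, so a load balancer built on it would not send requests to that service until its probes have passed repeatedly.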

Vasion is now reviewing the load-balancing model used within the platform to identify areas where further tuning is required. The team is implementing a scheme of run levels to allow the environments to come online in stages instead of all at once, so services will be up, running, and stable before network traffic is sent to them. We are also optimizing the startup routines for speed and efficiency.
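The run-level approach described above can be sketched roughly as follows. This is an illustrative assumption about how staged startup might look, not the scheme the team is implementing; `Service`, `run_level`, and `start_in_stages` are hypothetical names. Services are grouped by level, each level is started and confirmed ready before the next begins, and a service is only reported as eligible for traffic once its whole level is ready.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Service:
    name: str
    run_level: int                 # hypothetical: lower levels start first
    start: Callable[[], None]      # begins initialization
    is_ready: Callable[[], bool]   # True once the service can take traffic


def start_in_stages(services: List[Service]) -> List[str]:
    """Start services one run level at a time.

    Returns service names in the order they became eligible for traffic.
    Raises RuntimeError if any service in a level is not ready, so later
    levels are never started on top of an unstable one.
    """
    levels: Dict[int, List[Service]] = {}
    for svc in services:
        levels.setdefault(svc.run_level, []).append(svc)

    eligible: List[str] = []
    for level in sorted(levels):
        group = levels[level]
        for svc in group:
            svc.start()
        not_ready = [svc.name for svc in group if not svc.is_ready()]
        if not_ready:
            raise RuntimeError(f"run level {level} not ready: {not_ready}")
        eligible.extend(svc.name for svc in group)
    return eligible
```

A production version would likely poll `is_ready` with a timeout instead of checking once; the single check keeps the sketch short.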

We have identified that the primary factor in this incident was a bug in the Ubuntu operating system that was introduced into our environment on March 9, 2023, and has since been removed. For more information regarding this bug, please refer to: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2009325

Posted 2 years ago. Mar 16, 2023 - 10:29 MDT

Resolved

At 9:00 AM AEDT on March 16, 2023, PrinterCloud servers started experiencing high load in the APAC region. The web application was either unavailable or loading intermittently with slow response times.

During our troubleshooting efforts, we rolled back the latest system updates and restarted our systems in an effort to stabilize the environment.

Our current status is:
All systems are operational. We will continue to monitor and investigate the cause of the high load on our servers.

We sincerely apologize for the inconvenience and thank you for your patience.
Posted 2 years ago. Mar 16, 2023 - 01:14 MDT

Monitoring

We are currently working on restoring services. Thank you for your patience and understanding as we continue to monitor the situation.
Posted 2 years ago. Mar 16, 2023 - 00:33 MDT

Update

We are currently working on restoring services. Thank you for your patience and understanding as we continue to monitor the situation.
Posted 2 years ago. Mar 16, 2023 - 00:17 MDT

Update

We continue to investigate this issue. We sincerely apologize for the inconvenience this may be causing and want to assure you that we are doing everything we can to reach a resolution.
Posted 2 years ago. Mar 15, 2023 - 23:30 MDT

Update

We sincerely apologize for the inconvenience this may be causing and want to assure you that we are doing everything we can to resolve the issue.
Posted 2 years ago. Mar 15, 2023 - 21:45 MDT

Investigating

Services were available briefly, but we are seeing 502 Bad Gateway errors again. We apologize and are continuing to investigate.
Posted 2 years ago. Mar 15, 2023 - 20:44 MDT

Monitoring

We continue to restore services and are monitoring the results.
Posted 2 years ago. Mar 15, 2023 - 20:26 MDT

Update

We are currently working on restoring services. Thank you for your patience and understanding as we continue to resolve the issue.
Posted 2 years ago. Mar 15, 2023 - 19:40 MDT

Update

We are continuing to investigate this issue.
Posted 2 years ago. Mar 15, 2023 - 18:52 MDT

Update

We are continuing to investigate this issue.
Posted 2 years ago. Mar 15, 2023 - 18:06 MDT

Update

We are continuing to investigate this issue.
Posted 2 years ago. Mar 15, 2023 - 17:21 MDT

Investigating

We are currently investigating this issue.
Posted 2 years ago. Mar 15, 2023 - 16:36 MDT
This incident affected: PrinterLogic | SaaS Sydney (PrinterLogic | SaaS Sydney).