Platform Outage
Incident Report for 7digital
Postmortem

The incident report for this outage is now available and can be found here.

Posted May 10, 2022 - 15:29 UTC

Resolved
Full service has been restored to all components of the platform, and we will continue to monitor it. We will follow up with an incident report next week, once we have fully evaluated the issue and the actions taken to remedy it.
Posted Apr 28, 2022 - 12:55 UTC
Monitoring
As per the previous update, a fix has been implemented and service has been restored to the platform, with the exception of the ALC purchasing, locker, and ALC/permanent download endpoints. We are continuing to monitor the fix and are also working to restore high availability to the platform.
Posted Apr 28, 2022 - 12:29 UTC
Update
We are continuing to work to resolve the ongoing outage. The cause has been isolated to our SQL Server cluster, which is currently unable to keep certain databases online and is taking them offline for a reason that is not yet known. Our efforts to force a single node to host the databases have so far not resolved the issue. At this stage, we believe our cluster configuration has a non-trivial problem, so we are bringing the databases up separately, outside the high-availability cluster, to restore service as soon as possible. Once stability has returned, we will work to restore high availability.
Posted Apr 28, 2022 - 11:52 UTC
Update
We are continuing to work on a fix for the issue and will provide another update shortly. We can also confirm that cached streams have continued to be served throughout the incident. We are reviewing activity to identify other areas of the platform that are only partially unavailable and will share further information as soon as we have it.
Posted Apr 28, 2022 - 08:42 UTC
Update
We have successfully brought the affected DB cluster back online; however, we are still having problems keeping it online consistently, which is leaving the platform unavailable. We are continuing to investigate and will provide further updates shortly.
Posted Apr 28, 2022 - 05:49 UTC
Identified
We've now isolated the cause of the outage to our DB cluster. We're observing an issue between the primary and replica nodes that is preventing us from bringing the cluster back online. We're continuing to investigate the issue and will provide further updates as they become available.
Posted Apr 28, 2022 - 04:34 UTC
Investigating
We are currently investigating a suspected platform outage. All endpoints on 7digital's API are currently unavailable. 7digital engineers are investigating the cause, and we'll provide further information as soon as it's available.
Posted Apr 28, 2022 - 03:39 UTC
This incident affected: Media Delivery (Downloading, Fingerprinting, Streaming), API (Catalogue API, Catalogue Feeds, Catalogue Search, Playlist API, Stream Logging, Subscriptions, User API), and Playlist Tool.