Degraded
Ongoing Service Disruption in NYC/EWR Due to Storage Backend Issue
We are currently experiencing a disruption in our NYC/EWR cloud region caused by a storage backend failure during our ongoing datacenter migration (NYC to EWR). Part of our Ceph storage cluster was physically relocated ahead of schedule, while the rest remains at the old location. One critical node is also offline, leaving the cluster in a degraded state.
As a result, CephFS (used to access base images and volumes) is hung and is blocking all instance operations. Starting, stopping, or rebooting instances is currently not possible, although running instances remain online.
Instances with attached volumes are also unable to boot.
Our engineering team is actively working to recover the filesystem and restore the cluster’s health. Due to the complexity of this failure and the nature of Ceph’s consistency mechanisms, this process is delicate and time-consuming. At this time, we cannot provide a precise ETA for full restoration.
We will continue to post updates as we progress. We sincerely apologize for the inconvenience and understand how disruptive this is. Thank you for your continued patience.
Resolved · 2 May at 07:37am EDT