According to the service health dashboard, the problems began around 1:41 AM PDT in the US-EAST-1 region and showed up on EBS volumes (persistent virtual disks used to boot virtual instances) and on EC2 instances (EC2 is the virtual machine at the core of any cloud application on Amazon). The cause appears to be a "networking event" that triggered a massive "re-mirroring" of EBS volumes. Rather than paraphrase it, here is the notice itself:
A networking event early this morning triggered a large amount of re-mirroring of EBS volumes in US-EAST-1. This re-mirroring created a shortage of capacity in one of the US-EAST-1 Availability Zones, which impacted new EBS volume creation as well as the pace with which we could re-mirror and recover affected EBS volumes. Additionally, one of our internal control planes for EBS has become inundated such that it’s difficult to create new EBS volumes and EBS backed instances.
Quora responded to the incident with good humour, saying…
So far the fault has not been fixed. The engineering team announced (11:09 AM PDT) that a fix can be expected "within a few hours", reassuring us that all hands are on deck and that nothing is being left to chance.
[UPDATE] 6:18 PM PDT – 22.4.2011 – 3:18 AM @ Croatia
Earlier today we shared our high level ETA for a full recovery. At this point, all Availability Zones except one have been functioning normally for the past 5 hours. We have stabilized the remaining Availability Zone, but recovery is taking longer than we originally expected. We have been working hard to add the capacity that will enable us to safely re-mirror the stuck volumes. We expect to incrementally recover stuck volumes over the coming hours, but believe it will likely be several more hours until a significant number of volumes fully recover and customers are able to create new EBS-backed instances in the affected Availability Zone. We will be providing more information here as soon as we have it.
Here are a couple of things that customers can do in the short term to work around these problems. Customers having problems contacting EC2 instances or with instances stuck shutting down/stopping can launch a replacement instance without targeting a specific Availability Zone. If you have EBS volumes stuck detaching/attaching and have taken snapshots, you can create new volumes from snapshots in one of the other Availability Zones. Customers with instances and/or volumes that appear to be unavailable should not try to recover them by rebooting, stopping, or detaching, as these actions will not currently work on resources in the affected zone.
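The two workarounds in the notice above can be sketched as request parameters. This is a minimal sketch, not Amazon's own procedure: it assumes the boto3 EC2 client, whose `run_instances` and `create_volume` methods accept these keyword arguments. The helper names, the AMI and snapshot IDs, and the zone names are hypothetical placeholders; the notice does not say which Availability Zone is affected.

```python
def replacement_instance_params(image_id, instance_type):
    """Build launch parameters for a replacement instance.

    No "Placement" key is set, so EC2 is free to choose the
    Availability Zone instead of being pinned to the affected one.
    """
    return {
        "ImageId": image_id,
        "InstanceType": instance_type,
        "MinCount": 1,
        "MaxCount": 1,
    }


def volume_from_snapshot_params(snapshot_id, affected_zone, zones):
    """Build parameters for a new volume in a zone other than the affected one."""
    healthy = [zone for zone in zones if zone != affected_zone]
    if not healthy:
        raise ValueError("no Availability Zone outside the affected one")
    # Pick the first remaining zone; any of them would do for this sketch.
    return {"SnapshotId": snapshot_id, "AvailabilityZone": healthy[0]}


# Hypothetical placeholder values, for illustration only.
zones = ["us-east-1a", "us-east-1b", "us-east-1c"]
instance_params = replacement_instance_params("ami-00000000", "m1.small")
volume_params = volume_from_snapshot_params("snap-00000000", "us-east-1a", zones)
```

With a configured client, the dictionaries would be passed as `ec2.run_instances(**instance_params)` and `ec2.create_volume(**volume_params)`.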
[UPDATE] 2:41 AM PDT – 22.4.2011 – 11:41 AM @ Croatia
We continue to make progress in restoring volumes but don’t yet have an estimated time of recovery for the remainder of the affected volumes. We will continue to update this status and provide a time frame when available.
Note: Neuraplex clients use EC2 instances in the European region, so they are not affected by this brief resource outage.