A lot of people that ask me what my plan is “when Amazon goes down”. It is hard to answer this question directly, since I think most users still see AWS as one cloud or global all-encompassing service. In reality it is:
- Eight regions (9 if you count gov cloud) in different parts of the world.
- Multiple availability zones in each region, providing physical isolation.
Amazon’s default advice is that it is your responsibility to make sure your application can survive an Availability Zone outage – and in my case I almost can: databases are Multi-AZ, webservers are Multi-AZ. The only piece of infrastructure that currently violates this is a search service that ties us to us-east-1a via an EBS volume.
However, Availability Zones won’t cover regionalized disasters, such as Hurricane Sandy, and it won’t cover all of Amazon’s oopses.
For the applications which we need higher availability than multi-AZ, I would much rather exhaust all of AWS’s seven other regions since I can guarantee 100% compatible APIs. When I’ve finished with this list, to me it’s time to start looking at third party providers. I think only a few edge cases fit in this category, such as NSD existing to increase the gene pool against software flaws/exploits.
It is also very easy to purchase a DNS service with latency-based routing and failover (via a probe URL you can specify) with providers like DynDNS and Neustar’s UltraDNS to implement an active/passive or an active/active (requires application support). AWS even announced DNS based failover this year, but at the moment it has a critical limitation that it can not health check its load balancers. Maybe in the future this will get even easier!