Maybe, but that won't necessarily protect you from a fat-fingered typo. https://aws.amazon.com/message/41926/
Ah, yes, that outage was quite notorious, but it was contained to S3 in us-east-1 (N. Virginia). I've never seen a systemic failure in AWS that crossed a region boundary. Having discussed it with their engineers previously, the "control plane" (software, configuration, management) is segmented by region (a region being a geographical center with a set of one or more "availability zone" datacenters), with few if any dependencies between regions, for exactly that purpose -- to avoid systemic failures across the whole platform. So anyone who had a proper DR strategy in place with replication of S3 objects between regions and a solid (DNS, CDN, etc.) failover method in place was not affected.
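For concreteness, here's roughly what that cross-region replication setup looks like with boto3. This is a minimal sketch, not a complete recipe -- the bucket names, role ARN, and account ID below are placeholders, and the IAM trust/permission details are left out:

    import boto3

    s3 = boto3.client("s3")

    # Versioning must be enabled on both source and destination
    # buckets before S3 will accept a replication configuration.
    for bucket in ("app-data-us-east-1", "app-data-us-west-2"):  # hypothetical names
        s3.put_bucket_versioning(
            Bucket=bucket,
            VersioningConfiguration={"Status": "Enabled"},
        )

    # Replicate every object from the us-east-1 bucket to us-west-2.
    s3.put_bucket_replication(
        Bucket="app-data-us-east-1",
        ReplicationConfiguration={
            # IAM role S3 assumes to copy objects on your behalf (placeholder ARN).
            "Role": "arn:aws:iam::123456789012:role/s3-crr-role",
            "Rules": [
                {
                    "ID": "dr-copy-to-us-west-2",
                    "Priority": 1,
                    "Status": "Enabled",
                    "Filter": {"Prefix": ""},  # empty prefix = everything
                    "DeleteMarkerReplication": {"Status": "Disabled"},
                    "Destination": {"Bucket": "arn:aws:s3:::app-data-us-west-2"},
                }
            ],
        },
    )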
This includes the rather large set of AWS infrastructure I am responsible for, so the "AWS outage" was a complete non-event for me. That was also good because I was on vacation!
Not sure about Google, but Azure has had far more than its fair share of outages; ISTR DNS config changes being the root cause of one or two of them. DNS itself is a nightmarish risk when combined with fat fingers. Edit: since I started writing this post yesterday, it looks like Office 365 has been down again, although it's not clear whether this affects just retail or enterprise too.
I agree, Azure is not quite as mature as AWS from an availability or a services standpoint. There has been a lot of churn in their platform implementation in the last few years. Of course, trying to make the traditional Microsoft services (AD, SQL Server, etc.) both elastically scalable and highly available is also very challenging in ways that AWS doesn't have to deal with. Microsoft has a lot of baggage there.
O365 especially is notoriously unreliable, and unless you are lucky enough to have a direct line into Microsoft, support is horrible.
I agree, but using the same provider for your production and DR does not remove common-mode failure when they are using distributed configs, and as a punter you won't have any visibility into those changes in the cloud anyway until something breaks. Doing DR "properly" in the cloud is necessarily expensive if you're to avoid such failures; in many cases it won't save you a dime, and it can cost even more if you end up using multiple cloud vendors to spread and reduce risk.
Right, I mean, if you want to be ideally protected you have vendor diversity, control plane diversity, geographical diversity of your administrative team, etc. It can get impractical. But even solely within AWS, taking advantage of the region partitioning (above) and carefully considering your other points of failure (like DNS) gets you most of the way there in terms of practical uptime: five-nines (99.999%) availability of the infrastructure in aggregate is quite achievable. At that point availability ceases to be an infrastructure issue and becomes more of an application reliability issue.
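To make the DNS piece concrete: the usual pattern is active/passive failover records gated by health checks, e.g. in Route 53. A rough boto3 sketch, assuming a hosted zone and an endpoint in each region -- the zone ID, hostnames, and IPs here are all made up:

    import boto3

    r53 = boto3.client("route53")

    # Health check against the primary region's endpoint.
    check = r53.create_health_check(
        CallerReference="primary-endpoint-check-1",  # must be unique per request
        HealthCheckConfig={
            "Type": "HTTPS",
            "FullyQualifiedDomainName": "primary.example.com",
            "Port": 443,
            "ResourcePath": "/health",
            "RequestInterval": 30,
            "FailureThreshold": 3,
        },
    )

    def failover_record(role, ip, health_check_id=None):
        """Build an UPSERT change for a PRIMARY or SECONDARY failover record."""
        rrset = {
            "Name": "www.example.com",
            "Type": "A",
            "SetIdentifier": role.lower(),
            "Failover": role,  # "PRIMARY" or "SECONDARY"
            "TTL": 60,  # keep low so a failover propagates quickly
            "ResourceRecords": [{"Value": ip}],
        }
        if health_check_id:
            rrset["HealthCheckId"] = health_check_id
        return {"Action": "UPSERT", "ResourceRecordSet": rrset}

    # Route 53 answers with the PRIMARY record while its health check
    # passes, and flips to the SECONDARY (other region) when it fails.
    r53.change_resource_record_sets(
        HostedZoneId="Z0000000EXAMPLE",
        ChangeBatch={
            "Changes": [
                failover_record("PRIMARY", "203.0.113.10",
                                check["HealthCheck"]["Id"]),
                failover_record("SECONDARY", "198.51.100.20"),
            ]
        },
    )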
But you're still worlds apart from a traditional enterprise datacenter solution. As expensive as DR is in AWS, traditional DR is even more so, since you're on the hook for the costs of the datacenter facilities and hardware up front, whether you are using them or not.
Never forget that the internet does not have an SLA.
Some workloads are definitely better left local. But I'd still say that the expertise necessary to competently run a physical datacenter, with all its facilities maintenance, networking, and systems design concerns, presents a large and tangible risk as well. Entire datacenters become disabled all the time due to generator failures, bad UPS maintenance, cooling issues, cheap and poorly designed networking, limited upstream capacity, DDoSes, etc. So many people and so much expertise are required just to keep the lights on, and most companies aren't willing to do it properly.
Five-nines availability is a difficult engineering exercise no matter how you do it, but for the competent and informed, I still think services like AWS make it more accessible.
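The arithmetic behind that is worth spelling out, because the budget is tiny and every hard dependency eats into it. A quick back-of-the-envelope:

    # Downtime budget implied by an availability target.
    MINUTES_PER_YEAR = 365 * 24 * 60

    def downtime_minutes_per_year(availability):
        return (1 - availability) * MINUTES_PER_YEAR

    print(downtime_minutes_per_year(0.999))    # three nines: ~526 min/yr
    print(downtime_minutes_per_year(0.99999))  # five nines:  ~5.3 min/yr

    # Serial (hard) dependencies multiply: if your app needs both the
    # regional infrastructure AND DNS, each at five nines, the composite
    # is already below five nines.
    print(0.99999 * 0.99999)  # ~0.99998

The flip side is why the region partitioning helps: truly independent replicas multiply unavailability instead, so two independent 99.9% regions behind a working failover path are, on paper, 1 - 0.001^2, or about 99.9999% available -- assuming, of course, that the failures really are independent and the failover itself works.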