
Friday, December 20, 2019

Breaking Things Productively

I have never used chaos engineering, but the idea is interesting. This is my second read on the concept:

How to Use Chaos Engineering to Break Things Productively, by Sam Bocetta in InfoQ

"More people connected to more servers, increased reliance on complex distributed networks, and a proliferation of apps in development mean more opportunities for data leaks and breaches.

Modern problems require modern solutions, as Amazon found out the hard way. Netflix escaped with minor inconvenience by being prepared.

What did they do differently?

Amazon Web Services (AWS), Amazon's cloud-based platform, experienced an outage on September 20, 2015, that crashed their servers for several hours and affected many vendors. Netflix experienced the issue as a blip because they've been there and done that when they changed their service delivery model. This led their engineering team to craft a unique solution for software production testing.

The solution? Chaos as a preventative for calamity. It's predicated on the idea of failure as the rule rather than the exception, and it led to the development of the first dedicated chaos engineering tools. Otherwise known as the Simian Army, they're called Chaos Monkey, Chaos Kong, and the newest member of the family, Chaos Automation Platform (ChAP).
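To make the idea concrete, here is a minimal sketch of my own, not taken from the article or from Netflix's tools. It assumes a toy in-memory pool of service instances with simple failover; the names `Instance`, `Pool`, `inject_failure`, `steady_state_ok`, and `run_experiment` are all hypothetical. The experiment checks a steady-state hypothesis (every request succeeds), deliberately terminates one random instance, and checks the hypothesis again.

```python
import random


class Instance:
    """A hypothetical service instance that is either healthy or terminated."""

    def __init__(self, name):
        self.name = name
        self.healthy = True

    def handle(self, request):
        if not self.healthy:
            raise ConnectionError(f"{self.name} is down")
        return f"{self.name} served {request}"


class Pool:
    """Routes each request to the first healthy instance (simple failover)."""

    def __init__(self, instances):
        self.instances = instances

    def handle(self, request):
        for instance in self.instances:
            try:
                return instance.handle(request)
            except ConnectionError:
                continue  # try the next instance
        raise RuntimeError("no healthy instances left")


def inject_failure(pool, rng):
    """Terminate one randomly chosen healthy instance and return it."""
    healthy = [i for i in pool.instances if i.healthy]
    victim = rng.choice(healthy)
    victim.healthy = False
    return victim


def steady_state_ok(pool, requests=20):
    """Steady-state hypothesis: every request still succeeds."""
    try:
        for n in range(requests):
            pool.handle(f"req-{n}")
    except RuntimeError:
        return False
    return True


def run_experiment(size=3, seed=None):
    """Check steady state, inject one failure, then check steady state again."""
    rng = random.Random(seed)
    pool = Pool([Instance(f"node-{n}") for n in range(size)])
    baseline = steady_state_ok(pool)
    victim = inject_failure(pool, rng)
    after = steady_state_ok(pool)
    return baseline, victim.name, after


if __name__ == "__main__":
    print(run_experiment(size=3))  # redundant pool: should survive one failure
    print(run_experiment(size=1))  # single instance: the failure breaks service
```

In this toy setup a pool of three survives losing one instance, while a pool of one does not. That is the kind of weakness a chaos experiment is meant to surface before a real outage does.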

What Are the Benefits of Chaos Engineering in DevOps?

Focusing only on a network environment and the associated security considerations (because the world of chaos engineering is quite large), we have already seen it as a positive force in an already strong cybersecurity market for improving business risk mitigation, fostering customer confidence, and reducing the workload for IT teams. If you're a business owner, you'll be blessed with happier engineers, reduced risk of revenue loss, and lower maintenance costs.

Customers, whether B2B or B2C, will enjoy greater service availability that's more reliable and less prone to disruptions. Tech teams will be able to reduce failure incidents and gain deeper insight into how their apps work. It will also lead to better design, faster mean time in response to SEVs, and fewer repeat incidences. ..."
