An area where we often need assistance, and often an important aspect of process design and optimization. Technical.
Cognitive Work of Hypothesis Exploration During Anomaly Response
A look at how we respond to the unexpected
Marisa R. Grayson
Web-production software systems operate at an unprecedented scale today, requiring extensive automation to develop and maintain services. The systems are designed to adapt regularly to dynamic load to avoid the consequences of overloading portions of the network. As these systems grow in scale and complexity, it becomes more difficult to observe, model, and track how the systems function and malfunction. Anomalies inevitably arise, challenging incident responders to recognize and understand unusual behaviors as they plan and execute interventions to mitigate or resolve the threat of service outage. This is anomaly response.1
The cognitive work of anomaly response has been studied in energy systems, space systems, and anesthetic management during surgery.9,10 Recently, it has been recognized as an essential part of managing web-production software systems. Web operations also provide the potential for new insights because all data about an incident response in a purely digital system is available, in principle, to support detailed analysis. More importantly, the scale, autonomous capabilities, and complexity of web operations go well beyond the settings previously studied.7,8
Four incidents from web-based software companies, two of which are discussed in this article, reveal important aspects of anomaly response processes in web operations. One particular cognitive function examined in detail is hypothesis generation and exploration, given the impact of obscure automation on engineers' development of coherent models of the systems they manage. Each case was analyzed using the techniques and concepts of cognitive systems engineering.9,10 The set of cases provides a window into the cognitive work "above the line" (see "Above the Line, Below the Line" by Richard Cook in this issue) in incident management of complex web-operation systems (cf. Grayson, 2018). ...
Thursday, March 05, 2020