The Eponymous Pickle: Critical Role of Human Performance in Software

Monday, April 27, 2020

Critical Role of Human Performance in Software

Software is ultimately a collaboration between people and machines. People still need to be adaptive in critical systems, which should lead to better design of such systems. Here is quite a considerable look at the problem.

Revealing the Critical Role of Human Performance in Software
By David D. Woods, John Allspaw
Communications of the ACM, May 2020, Vol. 63 No. 5, Pages 64-67. DOI: 10.1145/3380468

Four articles, published across the March through May issues of Communications, highlight how people are the unique source of the adaptive capacity essential to incident response in modern Internet-facing software systems. While it's reasonable for software engineering and operations communities to focus on the intricacies of technology, there is not much attention given to the intricacies of how people do their work. Ultimately, it is human performance that makes modern business-critical systems robust and resilient.

As business-critical software systems become more successful, they necessarily increase in complexity. Ironically, this complexity makes these systems inherently messy so that surprising incidents are part and parcel of the capability to provide services at larger scales and speeds.[13] Studies in resilience engineering [2,12] reveal that people produce resilient performance in messy systems by doing the cognitive work of anomaly response; coordinating joint activity during events that threaten service outages; and revising their models of how the system actually works and malfunctions using lessons learned from incidents. People's resilient performance compensates for the messiness of systems, despite constant change.

Thus, incidents that threaten service outages are endemic as an emergent side effect of the increasing complexity of the interdependencies required to provide valuable services at scale. Incidents will continue to present challenges that require resilient performance, regardless of past reliability statistics. It is the cognitive work, coordination across roles, and adaptive capacity of people that resolve anomalies as they threaten to grow into service outages.[4] To be more specific: modern business-critical systems work as well as they do because of the adaptive capabilities of people; and without the cognitive work that people engage in with each other, all software systems eventually fail (some with increasingly catastrophic impact, given the criticality of the services they provide). ...

Human Performance and Software Engineering

Richard Cook connects human performance to software tooling through his insightful "Above the Line/Below the Line" diagram.[5] Cook points out that discussions focused solely on the technology miss what is actually going on in the operations of Internet-facing applications. Figure 1 in Cook's article reveals the cognitive work and joint activity that go on above the line and places the technology and tooling for development and operations below the line. The "line" here is the line of representation. No one can directly inspect or influence the processes running below the line; all understanding and action are mediated through representations.

Below the line are the facilities engineers use to develop, change, update, and operate software that enables valuable services. This includes all the components needed to create the value that businesses provide to customers: the technology stack, code repositories, data sources, and a host of tools for testing, monitoring, deployment, and performance measurement, as well as the various ways of delivering these services.

The above-the-line area in the diagram includes the people who are engaged in keeping the system running and extending its functionality. They are the ones preparing to deploy new code, monitoring system activities, and re-architecting the system. These people ask questions such as: What's it doing now? Why is it doing this? What's it going to do next? This cognitive work—observing, inferring, anticipating, planning, and intervening, among others—is done by interacting, not with the things themselves, but with representations of them. Interestingly, some representations (for example, dashboards) are designed by (and for) software engineers and other stakeholders.

Notice all the above-the-line actors have mental models of what is below the line. These models vary depending on people's roles and experience, as well as on their individual perspectives and knowledge. Notice that the actors' mental models are different. This is because there are general limits on the fidelity of models of complex, highly interconnected systems.[11] This is true of modern software systems and is demonstrated by studies of incident response; a common statement heard during incidents or in the postmortem meetings afterward is, "I didn't know it worked that way."[12] Cook's concept and diagram reframes how Internet-facing systems function and is utilized by the other articles in the set. ... (More at the link)
