Monday, December 02, 2019

Dimensional Analysis of Failure and Fixing

Brought to my attention, from the Facebook Engineering Blog. We also attempted to use neural methods to determine when agents, machine or otherwise, were likely to fail, and afterwards what means were needed to repair the problem. Scale may be the biggest problem. I realize the below is more about machine agents, which we also saw, but could it be broadened to human agent behavior?

Fast dimensional analysis for root cause analysis at scale
Nikolay Pavlovich Laptev, Fred Lin, Keyur Muzumdar, Mihai-Valentin Curelea

What the research is: 
A fast dimensional analysis (FDA) framework that automates root cause analysis on structured logs with improved scalability. When a failure event happens in a large-scale distributed production environment, performing root cause analysis can be challenging. Various hardware, software, and tooling logs are often maintained separately, making it difficult to detect issues across multiple logs. Additionally, at scale, there could easily be millions of entities, each with hundreds of features, making it difficult to debug issues.

Our proposed FDA framework combines structured logs from a number of sources and provides a meaningful combination of features. That information arms engineers with actionable insights and helps them determine where to begin their investigation. And improved Apriori/FP-Growth algorithms sustain analysis at Facebook scale.

In the figure above, our system finds highly correlated features that explain the exception spike (red line).

How it works:
The FDA framework first fetches structured logs from various sources. Log data is deduplicated at query time (deduping significantly improves algorithm performance). Each duplicated row will have a samples column that counts original row frequency. The resulting data is then one-hot encoded (a Boolean 0/1 value, depending on whether a given feature is present in a given row) to transform it into a schema that fits the frequent pattern mining formulation. Frequent pattern mining is applied to identify frequent item-sets. ...."
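
To make the quoted steps concrete, here is a minimal Python sketch of deduplication with a samples count, one-hot encoding, and frequent item-set mining. It is only an illustration under stated assumptions: the log rows, the column names, and the function names are made up, and the mining step is a plain textbook Apriori-style level-wise search, not the improved Apriori/FP-Growth algorithms the Facebook post refers to.

```python
from collections import Counter
from itertools import combinations

# Hypothetical structured log rows (invented for illustration only).
COLUMNS = ("host_type", "kernel", "status")
RAW_ROWS = [
    ("web", "4.11", "fail"),
    ("web", "4.11", "fail"),
    ("web", "4.11", "fail"),
    ("web", "4.16", "ok"),
    ("db", "4.11", "ok"),
    ("db", "4.16", "ok"),
    ("db", "4.16", "ok"),
]


def dedupe(rows):
    """Collapse identical rows; the count plays the role of the samples column."""
    return Counter(rows)


def one_hot(row):
    """Represent a row as the set of column=value items that are present (the 1s)."""
    return frozenset(f"{col}={val}" for col, val in zip(COLUMNS, row))


def frequent_itemsets(weighted_rows, min_support):
    """Apriori-style level-wise search; support is weighted by the samples count."""
    transactions = [(one_hot(row), n) for row, n in weighted_rows.items()]
    total = sum(n for _, n in transactions)
    items = sorted({item for t, _ in transactions for item in t})

    result = {}
    candidates = [frozenset([item]) for item in items]
    size = 1
    while candidates:
        # Weighted support: sum the samples of every row containing the candidate.
        support = {
            c: sum(n for t, n in transactions if c <= t) / total
            for c in candidates
        }
        frequent = {c: s for c, s in support.items() if s >= min_support}
        result.update(frequent)

        # Next level: unions of two frequent sets that are exactly one item larger.
        size += 1
        candidates = list(
            {a | b for a, b in combinations(list(frequent), 2) if len(a | b) == size}
        )
    return result


if __name__ == "__main__":
    deduped = dedupe(RAW_ROWS)
    found = frequent_itemsets(deduped, min_support=0.4)
    for itemset, support in sorted(found.items(), key=lambda kv: -len(kv[0])):
        print(sorted(itemset), round(support, 3))
```

Because support is computed from the samples counts, the result is the same as mining the original seven rows, while the search only scans the four distinct ones. That is presumably the kind of saving the quoted text has in mind when it says deduping improves performance.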
