Sunday, September 20, 2015

On the Mess of Unstructured Analytics

Dealing with the mess. I agree that it should not be this way. But we have more kinds of unstructured data today than we ever had. In fact, we have been dealing with it since the content analytics days. Some very good points are made here. But I warn that organizational changes are harder to make than statistical choices.

"... There’s a reason for this: Because the data is poorly managed to begin with. Businesses are treating analytics as a separate business function from data governance, when it’s actually fundamentally dependent on it. Analysis occurs downstream; so by neglecting initial information governance infrastructure and practices, the enterprise is essentially sampling tiny random buckets of data from a whitewater river of information.

Many firms struggle to manage or even understand what sort of unstructured content they even have, let alone begin to effectively manage it. History is partially to blame, to be sure; most attempts at managing unstructured content were hastily prompted by waves of regulatory and legal reform that demanded immediate action. A reactive response was triggered, and many of those initial “band-aid” information management fixes remain in place today. Simply scratching beneath the surface often reveals a tangled mess of siloed ..."

