Monday, September 30, 2013

Big Data Myths and Misconceptions

I am in the midst of preparing an executive presentation about the use of analytics for companies in our region. Our group consists of experienced practitioners from academia and industry. As part of our preparation we have been gathering examples from decades of applications. I have noticed something interesting: all of the people involved are avoiding the term 'Big Data'. We all prefer to talk first about improving the business, then about the data, big or small, that will make that possible. Analytics.

But you can't avoid the term; it's heavily hyped. And it makes sense. If you are gathering huge quantities of volatile data, it makes sense to mine it to find leverage points for your business. But you have to understand your business process first, or else you are finding solutions for problems you do not have.

Small and medium-sized businesses have a particular dilemma. They don't have the funding to experiment with their data, so they have to be focused before they start. Hiring expensive consultants to cut through the hype is not an option.

I recently heard a 16-minute podcast by Tom Deutsch (@TomDeutsch), program director of big data and analytics at IBM. He also writes for IBM Data Mag. In this podcast he addresses recently emerging myths.

I liked his presentation, which was well done and nontechnical. I like the content of the talk, and like him I am a skeptic of hype-driven claims for Big Data. Rather than state the myths, below are the contrary facts as I would state them.

- Although much Big Data work has been on unstructured data like text and video, it can certainly be used with structured data as well. Consider it a test bed for trying analytics ideas on any data.

- Big Data does not inherently have quality problems. Any data must be verified to determine whether it is of the required quality. Garbage in, garbage out.

- Machine learning, often a goal of Big Data efforts, does not eliminate human bias. Humans still select the data, implement the results of the analytics, etc. So human bias remains.

- Machine learning does not occur in real time. These kinds of analytics need to be iterated on, adjusted and ultimately implemented. It's possible that the results may be implemented in real-time systems, once verified.

It seems all of these hyped claims stem from marketing types who misunderstand the subject. They also stem from the term 'Big Data' itself, which has magical capabilities attached to it. That can sell consulting. I will state it again: if we are completely honest, we should change the term used and call it Analytics. That is, analytics that can now be done more efficiently for larger and more varied kinds of data, driven by improvements in hardware, software and connectivity.

I further like his statement, always good advice, even when talking to big consulting companies:
"Anyone who makes assertions and is unwilling to engage in a discussion or provide evidence for what they say, is probably someone who doesn't really know what they're talking about. Be very skeptical."

This post was written as part of the IBM for Midsize Business program, which provides midsize businesses with the tools, expertise and solutions they need to become engines of a smarter planet. I’ve been compensated to contribute to this program, but the opinions expressed in this post are my own and don't necessarily represent IBM's positions, strategies or opinions.
