An intriguing point about the tool and how it is used, and about documenting data in general, including its origins and measurements (metadata): "... Why is Hadoop so popular? There are many reasons. First of all it is not so much a product as an ecosystem, with many components: MapReduce, HBase, HCatalog, Pig, Hive, Sqoop, Mahout and quite a few more. That makes it versatile, and all these components are open source, so most of them improve with each release cycle. ...
... So if someone in the company wants some external data or even internal data captured for later use, Hadoop can just sit there and drink it up. And that’s fine as long as you don’t lose track of what the data in the lake actually is. But this is where the devil crawls into the detail. You can scale Hadoop out so it becomes just one very large data lake and sits there gulping down all the data it can drink. You can also instantiate multiple instances of Hadoop, each devoted to a specific kind of usage, but we do not often hear about IT sites doing that – after all Hadoop scales out to the edge of the solar system, does it not? ... "
Wednesday, November 27, 2013