Big data, big challenges: Hadoop in the enterprise

Posted on: Jul 2, 2015

As I work with larger enterprise clients, a few Hadoop themes have emerged. A common one is that most companies seem to be trying to avoid the pain they experienced in the heyday of Java EE, SOA, and .NET -- as well as that terrible time when every department had to have its own portal.

To this end, they're trying to centralize Hadoop, much as many companies attempt to centralize their RDBMS or storage infrastructure. Although you wouldn't use Hadoop for the same work you'd give an RDBMS, Hadoop has many advantages over the RDBMS in terms of manageability. The row-store RDBMS paradigm (think Oracle) has inherent scalability limits, so when you attempt to create one big instance or RAC cluster to serve everyone, you end up serving no one. With Hadoop, you have far more latitude to pool compute resources and dish them out as needed.

Unfortunately, Hadoop management and deployment tools are still early-stage at best. As awful as Oracle's reputation may be, I could install it by hand in minutes. Installing a Hadoop cluster that does more than "hello world" will take hours at least, and once you're handling hundreds or thousands of nodes, you'll find the tooling lacking.
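To make that tooling gap concrete, here's a minimal sketch of the sort of glue script ops teams end up writing for themselves: it polls the Ambari REST API for host components that aren't running. It assumes an Ambari-managed cluster; the server address, cluster name, and credentials are hypothetical placeholders.

# Minimal sketch: list Ambari host components that are not STARTED.
# Assumes an Ambari-managed cluster; host, cluster name, and
# credentials below are hypothetical placeholders.
import requests

AMBARI = "http://ambari.example.com:8080"   # hypothetical Ambari server
CLUSTER = "prod"                            # hypothetical cluster name
AUTH = ("admin", "admin")                   # placeholder credentials
HEADERS = {"X-Requested-By": "ambari"}      # Ambari requires this on writes; harmless here

def unhealthy_components():
    """Return (host, component, state) triples whose state isn't STARTED."""
    url = (f"{AMBARI}/api/v1/clusters/{CLUSTER}/host_components"
           "?HostRoles/state!=STARTED&fields=HostRoles/state")
    resp = requests.get(url, auth=AUTH, headers=HEADERS, timeout=30)
    resp.raise_for_status()
    return [(item["HostRoles"]["host_name"],
             item["HostRoles"]["component_name"],
             item["HostRoles"]["state"])
            for item in resp.json().get("items", [])]

if __name__ == "__main__":
    for host, component, state in unhealthy_components():
        print(f"{host}: {component} is {state}")

At hundreds or thousands of nodes, one-off scripts like this multiply fast, which is exactly the gap the devops tooling below is meant to fill.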

Companies are using devops tools like Chef, Puppet, and Salt to create manageable Hadoop solutions. They face many challenges on the way to centralizing Hadoop:

Hadoop isn't a thing: Hadoop is a word we use to mean "that big data stuff" like Spark, MapReduce, Hive, HBase, and so on. There are a lot of pieces.

Diverse workloads: Not only do you potentially need to balance a Hive-on-Tez workload against a Spark workload, but some workloads are more constant and sustained than others (see the queue sketch below).
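One common lever for that balancing act is YARN's Capacity Scheduler. The sketch below reads per-queue utilization from the ResourceManager's REST API so you can see whether, say, a Hive queue and a Spark queue are actually getting their configured shares; the ResourceManager address and queue layout are hypothetical.

# Minimal sketch: print capacity vs. usage for each Capacity Scheduler
# queue, via the YARN ResourceManager REST API. The RM address and the
# queue layout it reports are hypothetical.
import requests

RM = "http://resourcemanager.example.com:8088"  # hypothetical RM address

def walk_queues(queue, depth=0):
    """Recursively print configured capacity vs. current usage per queue."""
    name = queue.get("queueName", "?")
    cap = queue.get("capacity", 0.0)        # configured share (percent)
    used = queue.get("usedCapacity", 0.0)   # current usage (percent of share)
    print(f"{'  ' * depth}{name}: capacity={cap:.1f}% used={used:.1f}%")
    for child in queue.get("queues", {}).get("queue", []):
        walk_queues(child, depth + 1)

resp = requests.get(f"{RM}/ws/v1/cluster/scheduler", timeout=30)
resp.raise_for_status()
walk_queues(resp.json()["scheduler"]["schedulerInfo"])

From there, the usual move is to give each workload its own queue in capacity-scheduler.xml -- a steady guaranteed share for the sustained Hive jobs and a higher max-capacity for the burstier Spark jobs -- so neither can starve the other.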