September 08 to 10 2014, Santa Clara, USA.


Speaker "Kishore Gopalakrishna" Details

Name :
Kishore Gopalakrishna
Company :
LinkedIn
Title :
Staff Software Engineer
Topic :

One Grid to Rule them All: Building a Multi-tenant Data Cloud with YARN

Abstract :

Apache Hadoop YARN brings us a step closer to realizing the vision of Hadoop as a single grid for all data processing applications. The challenges posed by different applications, such as batch computation, interactive queries, stream processing, and iterative computation, vary widely. While simple stateless services require features like dynamic (re)configuration, service registry, and discovery, more complex stateful fault-tolerant systems like HOYA (HBase on YARN) typically require partition management, failure handling, and scale-up/scale-down. Helix makes it easier to write distributed data applications on top of YARN by providing a generic application master. Helix is a cluster management framework that orchestrates the constraint-based assignment of distributed tasks in a cluster. While YARN allows a one-to-one mapping of container to task, Helix enables a many-to-one mapping of tasks to a container. Helix monitors the state of each task, lets one declare the behavior of each task using a state machine, and enforces constraints. A distributed system's life cycle consists of building, provisioning, deploying, configuring, handling failures, and scaling with workload; each stage is affected by application constraints. We will explain how Helix with YARN can be leveraged to tackle the challenges involved in each stage with minimal configuration.
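The abstract's core idea, tasks whose behavior is a declared state machine, packed many-to-one into containers under constraints, can be illustrated with a minimal sketch. This is a hypothetical illustration of the concept, not the real Helix API (which is Java and far richer); the state names, `Task`, and `Container` classes here are assumptions made for the example.

```python
# Conceptual sketch (NOT the Helix API): each task's lifecycle is a
# declared state machine, only declared transitions are applied, and
# many tasks share one container (many-to-one), unlike YARN's default
# one-container-per-task model.

# Declared state machine: the set of legal (from, to) transitions.
LEGAL_TRANSITIONS = {
    ("OFFLINE", "ONLINE"),
    ("ONLINE", "OFFLINE"),
    ("OFFLINE", "DROPPED"),
}

class Task:
    def __init__(self, name):
        self.name = name
        self.state = "OFFLINE"  # initial state of every task

    def transition(self, target):
        # A controller enforcing the state model rejects anything
        # not declared above (e.g. ONLINE -> DROPPED directly).
        if (self.state, target) not in LEGAL_TRANSITIONS:
            raise ValueError(f"illegal transition {self.state} -> {target}")
        self.state = target

class Container:
    """One container hosting many tasks (the many-to-one mapping)."""
    def __init__(self, max_tasks):
        self.max_tasks = max_tasks  # a per-container constraint
        self.tasks = []

    def assign(self, task):
        if len(self.tasks) >= self.max_tasks:
            raise RuntimeError("container full")  # constraint enforced
        self.tasks.append(task)

# Controller-style usage: assign three tasks to one container,
# then drive each through its declared OFFLINE -> ONLINE transition.
container = Container(max_tasks=4)
for i in range(3):
    t = Task(f"partition_{i}")
    container.assign(t)
    t.transition("ONLINE")

print([t.state for t in container.tasks])
```

In the real system the controller, not the task, decides which transitions to fire and when, based on cluster state and constraints; the sketch only shows why declaring the state machine up front lets a generic controller manage very different kinds of tasks.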

Profile :

Kishore Gopalakrishna is a software developer with a great passion for using and building large-scale distributed systems. As part of the Data Infrastructure team at LinkedIn, Kishore has built Espresso, a distributed data store, and Helix, a generic cluster management system. He is currently focused on building Pinot, an analytics platform. Prior to LinkedIn, Kishore spent a large part of his time at Yahoo working on ad systems, mostly involving data analysis using Hadoop and building systems like Apache S4 for near-real-time stream processing.


2nd Annual Global Big Data Conference