
Speaker "Sheetal Dolas" Details


Topic

Design patterns for real-time streaming data analytics

Abstract

As businesses realize the power of Hadoop and big data analytics, many are demanding large-scale real-time streaming data analytics. Apache Storm and Apache Spark are platforms that can process large amounts of data in real time. However, building applications on these platforms that scale, reliably process data without loss, satisfy functional needs, and at the same time meet strict latency requirements takes a lot of work to get right.
After implementing multiple large real-time data processing applications using these technologies across various business domains, we distilled commonly required solutions into generalized design patterns. These patterns are proven in very large production deployments, where they process millions of events per second, tens of billions of events per day, and tens of terabytes of data per day.

Latency-sensitive lossless micro-batching, high-scale data enrichment through external system lookups, dynamic rules and alerts, adaptive self-tuning, and real-time stream joins are a few of the patterns covered.
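To make the first pattern in this list concrete, here is a minimal sketch of latency-sensitive micro-batching; this is illustrative code, not from the talk, and the class and parameter names are hypothetical. Events are buffered and flushed downstream when either a size threshold is reached or the oldest buffered event exceeds a latency deadline, so throughput is batched without any event waiting too long:

```python
import time

class MicroBatcher:
    """Buffers events and flushes when either size or latency bound is hit."""

    def __init__(self, flush_fn, max_size=100, max_latency=0.5):
        self.flush_fn = flush_fn        # downstream sink callback (assumed)
        self.max_size = max_size        # flush once the batch reaches this size
        self.max_latency = max_latency  # ...or once the oldest event is this old (seconds)
        self.batch = []
        self.oldest = None              # arrival time of the oldest buffered event

    def add(self, event):
        if not self.batch:
            self.oldest = time.monotonic()
        self.batch.append(event)
        # Flush on whichever bound trips first: batch size or latency deadline.
        if (len(self.batch) >= self.max_size or
                time.monotonic() - self.oldest >= self.max_latency):
            self.flush()

    def flush(self):
        if self.batch:
            # In a lossless design, upstream events are acknowledged only
            # after this call succeeds.
            self.flush_fn(self.batch)
            self.batch = []

# Usage: collect flushed batches into a list for demonstration.
batches = []
b = MicroBatcher(batches.append, max_size=3, max_latency=1.0)
for e in range(7):
    b.add(e)
b.flush()  # drain the tail on shutdown
# batches now holds [[0, 1, 2], [3, 4, 5], [6]]
```

In a real Storm or Spark Streaming topology, the flush callback would write to the external sink and acknowledge the batched tuples, which is what makes the micro-batching lossless.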

This talk covers these proven design patterns, and for each one presents the problem statement, the pattern's applicability, its design, and sample code demonstrating the implementation.

Attendees can apply these patterns when building their own applications to improve their productivity, the quality of their solutions, and their applications' chances of success.

Profile

Sheetal is a Principal Architect at Hortonworks. He has strong expertise in the Hadoop ecosystem, with rich and diverse field experience across various verticals including telco, hi-tech, retail, and Internet companies. He has served in key positions such as Lead Big Data Architect and SOA Architect in a variety of extremely large and complex enterprise programs. He has extensive knowledge of big data/NoSQL technologies including Hadoop, YARN, Hive, Pig, HBase, Storm, Kafka, and Elasticsearch. He has defined and established data architectures for multi-petabyte warehouses on Hadoop, and has extensive hands-on experience deploying and tuning very large Hadoop clusters and building scalable applications on them.