Speaker "Brian Hess" Details

Topic

Big Data Analytics with Cassandra and Spark

Abstract

Apache Cassandra is the leading distributed database, in use at thousands of sites with the world’s most demanding scalability and availability requirements. Cassandra's bread and butter is serving millions of concurrent transactions (reads, writes, updates) with zero downtime and linear scalability. The world is not just transactional data, however, and there is a need to analyze the data captured and served by this online transactional system. Apache Spark is a distributed computing framework for data analytics that has gained a lot of traction for processing large amounts of data in an efficient and user-friendly manner. It comes with a suite of tools covering bulk analytics, SQL, machine learning, graph analytics, and streaming. All it needs is data to process. Together, Spark and Cassandra pair real-time data collection with analytics on that data for deep insight. After a brief overview of Cassandra and Spark, this class will cover various aspects of their integration.

Profile

Brian Hess has spent more than 15 years in the Big Data space, starting with over 10 years as a Cryptologic Mathematician in the US Department of Defense, where he worked on data science, data mining, and large-scale data research. Brian then joined Netezza as Principal Mathematician and Director of Advanced Analytics, pushing Netezza to address new and advanced use cases. Brian joined DataStax over a year ago as an Analytic Architect, focusing on integrating DataStax and Cassandra with various tools and applications, especially analytics, BI, and ETL tools. He is currently the Senior Product Manager for Analytics at DataStax. Brian has master's degrees in both Mathematics and Computer Science from the Johns Hopkins University.