Speaker "Chris Fregly" Details

 

Topic

Spark, Spark SQL, Batch Processing, Spark Streaming, Real-time Processing, Machine Learning, Textual Analysis, Graph Processing, Lambda Architecture, Sampling, Approximations, Big Data, ETL, Data Ingestion, Tuning, Monitoring, Scaling, Fault Tolerance, High Availability

Abstract

Spark After Dark is a mock dating site that uses the latest Spark libraries, including Spark SQL, BlinkDB, Spark Streaming, MLlib, and GraphX, to generate high-quality dating recommendations for its members and blazing-fast analytics for its operators. We begin with a brief overview of Spark, Spark libraries, and Spark use cases. In addition, we'll discuss the modern-day Lambda Architecture, which combines real-time and batch processing into a single system. Lastly, we present best practices for monitoring and tuning a highly available Spark cluster. There will be many live demos covering everything from basic topics such as ETL and data ingestion to advanced topics such as streaming, sampling, approximations, machine learning, textual analysis, and graph processing.

Profile

Chris Fregly is a Developer Advocate for AI and Machine Learning at Amazon Web Services (AWS) based in San Francisco, California. He is co-author of the O'Reilly book "Data Science on AWS."
 
Chris is also the founder of many global meetups focused on Apache Spark, TensorFlow, and Kubeflow. He regularly speaks at AI and machine learning conferences around the world, including O’Reilly AI & Strata, Open Data Science Conference (ODSC), and GPU Technology Conference (GTC).
 
Previously, Chris was the founder of PipelineAI, where he worked with many AI-first startups and enterprises to continuously deploy ML/AI pipelines using Apache Spark ML, Kubernetes, TensorFlow, Kubeflow, Amazon EKS, and Amazon SageMaker.