Speaker "Adam Breindel" Details
Name
Adam Breindel
Company
Independent
Designation
Principal
Topic
Data Science with Spark: Beyond the Basics
Abstract
This class is aimed at practitioners who are already familiar with the basics of Apache Spark and have tried the machine learning samples in the Spark docs or some of the ML tutorial examples online. We'll start from there and work to advance our knowledge of Spark ML. After briefly reviewing some fundamentals of Spark, DataFrames, and the Spark ML APIs, the class will explore:
- Performing feature preparation/transformation beyond the Spark built-in tools
- "Borrowing" functionality from scikit-learn to help us pre-process features in Spark
- Converting DataFrame data to access legacy (RDD) MLlib features that are not yet exposed in the Spark ML DataFrame API
- Packaging data prep operations as reusable components by implementing new Transformers and Estimators
- Adding a reusable parallel machine learning algorithm to Spark by creating our own Estimator and Model classes
- Sharing our reusable components with our Python data science colleagues by creating Python wrappers like those built into Spark