March 27 to 29, 2017, Santa Clara, USA.

Speakers

Speaker "Adam Breindel" Details

Name :
Adam Breindel
Company :
Title :
Principal
Topic :

Data Science with Spark: Beyond the Basics

Abstract :

This class is aimed at practitioners who are already familiar with the basics of Apache Spark and have tried the machine learning samples in the Spark docs or some of the ML tutorial examples online. We'll start from there and work to advance our knowledge of Spark ML. After briefly reviewing some fundamentals of Spark, DataFrames, and the Spark ML APIs, the class will explore:

- Performing feature preparation/transformation beyond the Spark built-in tools
- "Borrowing" functionality from scikit-learn to help us pre-process features in Spark
- Converting DataFrame data to access legacy (RDD) MLlib features that are not yet exposed in the Spark ML DataFrame API
- Implementing data prep operations as reusable components by implementing new Transformers and Estimators
- Adding a reusable parallel machine learning algorithm to Spark by creating our own Estimator and Model classes
- Sharing our reusable components with our Python data science colleagues by creating Python wrappers like those built into Spark

Profile :
Adam Breindel consults and teaches widely on Apache Spark and other technologies. Adam's experience includes work with banks on neural-net fraud detection, streaming analytics, cluster management code, and web apps, as well as development at a variety of startup and established companies in the travel, productivity, and entertainment industries. He is excited by the way that Spark and other modern big-data tech remove so many old obstacles to system design and make it possible to explore new categories of interesting, fun, hard problems.
