Speakers

Speaker "Alex Perrier" Details

Name :
Alex Perrier
Title :
Data Scientist
Topic :

Large data in Python with Scikit-learn and Dask

Abstract :

Although scikit-learn is optimized for small data, its out-of-core features enable the data scientist to work with large data, i.e. data that does not fit in the computer's memory. I'll present the scikit-learn algorithms compatible with this batch training approach and their respective performance on large datasets. However, data munging remains a time-consuming problem when dealing with large data. This is where Dask, a Python library, comes in. By breaking operations into sequences that can be parallelized, Dask addresses the pre-processing part of the large-data problem.
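
As a rough illustration of the batch training approach mentioned in the abstract (not code from the talk itself), the sketch below feeds mini-batches one at a time to a scikit-learn estimator through partial_fit, so the full dataset never has to sit in memory. The synthetic data, the batch_stream and train_out_of_core helpers, and the choice of SGDClassifier are all assumptions made for this example; in practice the batches would typically come from chunks read off disk.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

# Hypothetical ground-truth weights used only to generate synthetic labels.
TRUE_W = np.random.default_rng(42).normal(size=10)


def batch_stream(n_batches, batch_size, seed):
    """Yield (X, y) mini-batches; stands in for reading chunks from disk."""
    rng = np.random.default_rng(seed)
    for _ in range(n_batches):
        X = rng.normal(size=(batch_size, TRUE_W.size))
        y = (X @ TRUE_W > 0).astype(int)
        yield X, y


def train_out_of_core(batches, classes):
    """Fit incrementally: only one batch is held in memory at a time."""
    model = SGDClassifier(random_state=0)
    for X, y in batches:
        # partial_fit needs the full set of class labels up front,
        # because any single batch might not contain every class.
        model.partial_fit(X, y, classes=classes)
    return model


model = train_out_of_core(batch_stream(50, 200, seed=0), classes=np.array([0, 1]))

# Evaluate on a held-out batch drawn with a different seed.
X_test, y_test = next(batch_stream(1, 1000, seed=1))
accuracy = model.score(X_test, y_test)
```

Only estimators that implement partial_fit can be trained this way, which is why the abstract speaks of the algorithms "compatible with" batch training.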

Profile :

Data Scientist at Berklee Online, contributor @ODSC, PhD in signal processing.
