Speaker "Debraj Guhathakurta" Details

 

Topic

Scalable data science in Spark: Machine learning and end-to-end data analysis processes in Spark using Python and R

Abstract

When it comes to scalable data analysis and ML, many data scientists are hindered by the limited availability of functions for handling large datasets, a lack of appropriate infrastructure, and difficulty producing models that can be consumed easily in production. This talk will address these issues using the Spark framework and languages that are popular among data scientists. We will present examples of end-to-end data analysis processes in Python and R, with an emphasis on distributed machine learning, that attendees can adopt in their own data science practice.

Profile

Debraj GuhaThakurta is a Senior Data Scientist Lead in Microsoft’s AI & Research. His work focuses on the use of different platforms and processes (such as Microsoft’s Azure ML, Cortana Suite, Cognitive Services, ML Server, SQL Server, Spark, and the Team Data Science Process) for creating scalable and operationalized AI solutions. Debraj has extensive industry experience in machine learning applications in the biopharma and forecasting domains. He has a Ph.D. in chemistry and biophysics, and post-doctoral research experience in machine learning applications in genomics. He has published more than 25 peer-reviewed papers, book chapters, and patents.