August 30 to September 1, 2016, Santa Clara, USA.
Speaker Details: Debraj GuhaThakurta
Senior Data Scientist
Scalable data science in Spark: Machine learning and end-to-end data analysis processes in Spark using Python and R
When it comes to scalable data analysis and ML, many data scientists are hindered by the limited availability of functions that can handle large datasets, a lack of appropriate infrastructure, and difficulty producing models that can be easily consumed in production. This talk will address these issues using the Spark framework and the languages most popular among data scientists. We will present examples of end-to-end data analysis processes in Python and R, with an emphasis on distributed machine learning, that attendees can adopt in their own data science practice.
Debraj GuhaThakurta is a Senior Data Scientist in Microsoft’s Azure Machine Learning group. His work focuses on using different platforms and toolkits, such as Microsoft’s Cortana Analytics suite, R Server, SQL Server, Hadoop and Spark clusters, to create scalable, operationalized analytical processes for a variety of business problems. Debraj has extensive industry experience in the biopharma and financial forecasting domains. He holds a Ph.D. in chemistry and biophysics and has post-doctoral research experience in machine learning applications in genomics. He has published more than 25 peer-reviewed papers, book chapters and patents.
Event: 4th Annual Global Big Data Conference