Speaker "Nirmal Sharma" Details
Name
Nirmal Sharma
Company
Walmartlabs
Designation
Big Data Architect
Topic
"How basic concepts of distributed engineering help scale data science algorithms effectively using Spark and DataFrames"
Abstract
Data science algorithms are mostly computation intensive. Computation on a small data set generally works fine, but problems arise when algorithms need to process huge amounts of data. Nowadays, computation time plays a critical role in deciding which algorithms get pushed to production, so every algorithm has to be optimized for computation time. Distributed engineering plays a very important role in running these computation-intensive algorithms faster and in speeding up the whole algorithm iteration life cycle. In this meetup, I will share my experiences of how we scaled algorithms to run fast on massive amounts of data using basic concepts of distributed engineering, including use cases from Netflix, Walmartlabs, Adchemy and other companies where I have worked. Often, people switch to a different technology without fully understanding the technologies they already use, hoping to find a way to run their algorithms faster, but I think that's not the right approach. I will explain how we run algorithms at scale using Spark and MapReduce by applying distributed engineering concepts that people often assume do not work for algorithms at scale.