Speaker "Kapil Surlaker" Details

 

Topic

Building a real-time, self-service data analytics ecosystem at LinkedIn.

Abstract

LinkedIn has a rich ecosystem of data-driven products such as People You May Know, Who Viewed My Profile, and recommendation products, as well as business-facing insights. Building a data product end to end requires many technologies to work together seamlessly, and it requires innovations far beyond traditional data warehousing. A major focus at LinkedIn has been to improve the agility of the engineers and data scientists who create these data products end to end. To that end, we have developed a number of systems in the analytics data ecosystem. These include a platform to manage ingestion of a variety of data sources at scale, a platform to do joins and complex calculations at extremely large scale, a platform for extremely fast OLAP serving including real-time drilldowns, and a platform to enable data lineage analysis and data discovery. These are all the pieces required to have an effective self-service offline data ecosystem. In this talk, we will go into the details of some of these systems and show how they provide a self-service real-time analytics ecosystem.

Profile

Kapil Surlaker leads the Data Analytics Infrastructure team at LinkedIn as a director of engineering. The team builds and enhances core infrastructure platforms such as Hadoop and Spark, other computation frameworks such as Rubix, and Pinot, an OLAP serving store. Previously, Kapil led the development of Databus, a database change capture platform that forms the backbone of LinkedIn's online data ecosystem; Espresso, a distributed document store that powers many applications on the site; and Helix, a generic cluster management framework that manages multiple infrastructure deployments at LinkedIn. Prior to LinkedIn, Kapil held senior technical leadership positions at Kickfire (acquired by Teradata) and Oracle. Kapil holds a B.Tech. (CS) from IIT Bombay and an M.S. from the University of Minnesota.