September 8 to 10, 2014, Santa Clara, USA.


Speaker Details: Sameer Agarwal

Name:
Sameer Agarwal
Company:
Databricks
Title:
Software Engineer
Topic:

BlinkDB: Approximate Queries on Very Large Data

Abstract:

The volume of data being collected and stored is growing exponentially, creating unprecedented demand for processing and analyzing massive datasets. At the same time, analysts and data scientists want fast results to support exploratory data analysis, and a growing number of applications require data processing to happen in near real time.

In this talk, I'll present BlinkDB, which takes a radically different approach: queries are always processed in near real time, regardless of the size of the underlying dataset. BlinkDB achieves this by not scanning all the data, operating instead on statistical samples of the underlying datasets. More precisely, BlinkDB lets the user trade off the accuracy of query results against the time it takes to compute them. The challenge is to ensure that results remain meaningful even though only a subset of the data has been processed. Here we leverage recent advances in statistical machine learning and query processing: using statistical bootstrapping, we resample the data in parallel to compute confidence intervals that quantify the quality of the sampled results. To process the sampled data in parallel, we build on Spark, which can compute tens of thousands of queries per second. BlinkDB is being integrated into Spark SQL and Facebook's Presto, and is also in the process of being deployed at a number of companies.

This talk will give an overview of the BlinkDB architecture and its design philosophy. I will also show how the audience can use this technology to gain insights in real time, drawing on a variety of real-world use cases from our early adopters.
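To make the sampling-plus-bootstrapping idea above concrete, here is a minimal sketch in plain Python. It is not BlinkDB's implementation or API: the function name `bootstrap_ci`, the synthetic "table" of session durations, and the 1% sampling rate are illustrative assumptions. The sketch computes an AVG on a uniform random sample and attaches a percentile-bootstrap confidence interval. BlinkDB runs the resampling in parallel on Spark, whereas this sketch resamples sequentially.

```python
import random
import statistics


def bootstrap_ci(sample, stat=statistics.fmean, num_resamples=1000,
                 confidence=0.95, seed=0):
    """Return (estimate, lower, upper) for `stat` on `sample`.

    Uses the percentile bootstrap: draw `num_resamples` resamples of the same
    size, with replacement, recompute `stat` on each, and read the interval
    bounds off the sorted resampled estimates.
    """
    if not sample:
        raise ValueError("sample must be non-empty")
    rng = random.Random(seed)  # seeded so results are reproducible
    n = len(sample)
    estimates = sorted(stat(rng.choices(sample, k=n))
                       for _ in range(num_resamples))
    alpha = (1.0 - confidence) / 2.0
    lo_idx = int(alpha * num_resamples)
    hi_idx = min(num_resamples - 1, int((1.0 - alpha) * num_resamples))
    return stat(sample), estimates[lo_idx], estimates[hi_idx]


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical "table": 200,000 synthetic session durations (seconds).
    table = [rng.expovariate(1 / 30.0) for _ in range(200_000)]
    # A 1% uniform random sample stands in for a precomputed sample.
    sample = rng.sample(table, k=2_000)
    est, lo, hi = bootstrap_ci(sample)
    print(f"approximate AVG: {est:.2f} (95% CI {lo:.2f} to {hi:.2f})")
    print(f"exact AVG:       {statistics.fmean(table):.2f}")
```

For a mean, the interval width shrinks roughly in proportion to 1/sqrt(n), so quadrupling the sample size roughly halves the error bound. This is the accuracy-versus-response-time trade-off the abstract describes.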

Profile:
Sameer Agarwal is a software engineer at Databricks, working at the intersection of large-scale distributed systems, databases, and statistics. He received his PhD in databases from UC Berkeley, where at the AMPLab he led the research, design, and development of BlinkDB, an open-source, massively parallel approximate query processing framework. He received his B.Tech. in Computer Science and Engineering from the Indian Institute of Technology, where he was awarded the President of India Gold Medal in 2009. He was a Qualcomm Innovation Fellow in 2012-13 and a Facebook Graduate Fellow in 2013-14.

Get the latest updates on the 2nd Annual Global Big Data Conference sent to your inbox.