Speaker: Anirudh Todi

Topic

TSAR (the TimeSeries AggregatoR) - How to Count Tens of Billions of Daily Events in Real Time Using Open Source Technologies

Abstract

Twitter’s 250 million users generate tens of billions of tweet views per day. Aggregating these events in real time, robustly enough to incorporate the results into our products, presents a massive scaling challenge. In this talk I’ll introduce TSAR (the TimeSeries AggregatoR), a robust, flexible, and scalable service for real-time event aggregation designed to solve this problem and a range of similar ones. I’ll discuss how we built TSAR from the ground up in Python and Scala, almost entirely on open-source technologies (Storm, Summingbird, Kafka, Aurora, and others), and describe some of the challenges we faced in scaling it to process tens of billions of events per day.

Profile

At Twitter, Anirudh works on the Data Platform team. Anirudh and his team are charged with processing and understanding the vast body of data generated by the operation of the Twitter platform. Their technologies provide insights both to platform engineering and to other teams throughout the company. They build a range of cutting-edge services that process petabytes of data per month in real time to reveal usage patterns on the Twitter platform, along with high-performance infrastructure that delivers those insights to Twitter and to Twitter’s partners. Anirudh previously worked at Facebook, helping scale its HBase cluster to process the billions of messages sent every day. In college, he built “Politify”, a startup that used financial data to model the US economy and simulate the impacts of political policies on American households. He has also worked at a genomics startup and taught a class on open-source software at UC Berkeley.