September 8 to 10, 2014, Santa Clara, USA.


Speaker "Haoyuan Li" Details

Name :
Haoyuan Li
Company :
Title :
Topic :

Tachyon: A Reliable Memory Centric Storage for Big Data Analytics

Abstract :

Memory is the key to fast big data processing. Many have realized this, and frameworks such as Spark and Shark already leverage memory performance. With these advances, big data storage is becoming a critical bottleneck in many workloads. In this talk, we introduce Tachyon, a memory-centric, fault-tolerant distributed file system that enables reliable file sharing at memory speed across cluster frameworks such as Spark and MapReduce. Tachyon achieves memory speed and fault tolerance by using memory aggressively and leveraging lineage information. Tachyon caches working-set files in memory and enables different jobs, queries, and frameworks to access cached files at memory speed. Thus, Tachyon avoids going to disk to load datasets that are frequently read. Tachyon is Hadoop compatible: existing Spark and MapReduce programs can run on top of it without any code change. The project is open source and is deployed at multiple companies. It has more than 40 contributors from over 15 institutions, including Yahoo, Intel, and Red Hat. The project is also part of the Fedora distribution.
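
The idea of pairing an in-memory cache with lineage information can be illustrated with a toy sketch. This is not Tachyon's API or implementation: the class `LineageStore` and its methods are hypothetical names invented here, and a plain dictionary stands in for durable storage. It only shows the general pattern of serving reads from memory and recomputing a lost file from the recorded function and inputs that produced it.

```python
from typing import Callable, Dict, List, Tuple


class LineageStore:
    """Toy single-process store (hypothetical, not Tachyon's API)."""

    def __init__(self) -> None:
        # Fast tier: files currently held in memory.
        self._memory: Dict[str, bytes] = {}
        # Stand-in for durable storage of original input files.
        self._durable: Dict[str, bytes] = {}
        # Lineage: for each derived file, the function and inputs that built it.
        self._lineage: Dict[str, Tuple[Callable[..., bytes], List[str]]] = {}

    def put_source(self, name: str, data: bytes) -> None:
        """Store an original input both durably and in memory."""
        self._durable[name] = data
        self._memory[name] = data

    def write_derived(self, name: str, fn: Callable[..., bytes],
                      inputs: List[str]) -> None:
        """Compute a derived file, keep it in memory only, record its lineage."""
        self._lineage[name] = (fn, list(inputs))
        self._memory[name] = fn(*(self.read(i) for i in inputs))

    def evict(self, name: str) -> None:
        """Simulate losing the in-memory copy (eviction or node failure)."""
        self._memory.pop(name, None)

    def read(self, name: str) -> bytes:
        """Serve from memory; otherwise reload or recompute, then re-cache."""
        if name in self._memory:
            return self._memory[name]
        if name in self._durable:
            data = self._durable[name]
        elif name in self._lineage:
            fn, inputs = self._lineage[name]
            data = fn(*(self.read(i) for i in inputs))
        else:
            raise KeyError(name)
        self._memory[name] = data
        return data


store = LineageStore()
store.put_source("raw", b"spark mapreduce spark")
store.write_derived("upper", lambda d: d.upper(), ["raw"])
store.evict("upper")            # the in-memory copy is lost
recovered = store.read("upper")  # rebuilt from lineage, not from a replica
```

In this sketch the derived file is never written to durable storage; recovery relies on replaying the recorded function over its inputs, which is the trade-off the abstract alludes to when it mentions lineage.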

Profile :

Haoyuan Li is a Computer Science Ph.D. candidate in the AMPLab at UC Berkeley, where he works with Prof. Scott Shenker and Prof. Ion Stoica on big data and cloud computing. He leads Tachyon, an open source, memory-centric distributed file system that enables reliable file sharing at memory speed across cluster frameworks. He is a founding committer of Apache Spark and a co-creator of Spark Streaming. Before Berkeley, he worked at Conviva and Google, where he co-created the PFPGrowth algorithm, which is included in Apache Mahout. Haoyuan has an M.S. from Cornell University and a B.S. from Peking University, both in Computer Science.


