
Speaker "Chunky Gupta" Details


Name
Chunky Gupta
Company
Mist Systems
Designation
Distributed Systems Engineer
Topic
Live Aggregators: A reliable, scalable and cost-effective way of aggregating billions of messages a day in real time
Abstract
We discuss Mist’s real-time data pipeline, focusing on Live Aggregators (LA), a highly reliable, fault-tolerant, and scalable in-house real-time aggregation system that autoscales in response to sudden changes in load. LA consumes billions of messages a day from Kafka, holds over 4 TB of state in memory, and aggregates over 600 million time series. Because it runs entirely on AWS Spot Instances, it is highly cost-effective. LA writes the aggregated data to a configured sink (Cassandra, S3, SignalFx, or Kafka). It performs over 9 billion writes to Cassandra per day and maintains over 600 million concurrent state machines. LA checkpoints its state to S3 so that, after a failure, it can restore that state and resume from the Kafka message where it left off. This lets LA recover from an hours-long EC2 outage with no data loss.
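The checkpoint-and-resume idea in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Mist's implementation: a Python list stands in for a Kafka partition, a dict stands in for S3, and the names `Checkpoint`, `Aggregator`, `checkpoint_every`, and `crash_at` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Checkpoint:
    """Snapshot of aggregation state plus the next offset to read."""
    next_offset: int = 0
    counts: dict = field(default_factory=dict)


class Aggregator:
    """Toy aggregator: counts messages per key and checkpoints periodically."""

    def __init__(self, store, checkpoint_every=3):
        self.store = store  # dict standing in for a durable store such as S3 (assumption)
        self.checkpoint_every = checkpoint_every
        saved = store.get("checkpoint")
        # Resume from the last checkpoint if one exists, otherwise start fresh.
        if saved is not None:
            self.state = Checkpoint(saved.next_offset, dict(saved.counts))
        else:
            self.state = Checkpoint()

    def run(self, log, crash_at=None):
        """Consume `log` (a list standing in for a Kafka partition) from the saved offset."""
        for offset in range(self.state.next_offset, len(log)):
            if crash_at is not None and offset == crash_at:
                raise RuntimeError("simulated instance loss")
            key = log[offset]
            self.state.counts[key] = self.state.counts.get(key, 0) + 1
            self.state.next_offset = offset + 1
            if self.state.next_offset % self.checkpoint_every == 0:
                self.checkpoint()
        self.checkpoint()
        return self.state.counts

    def checkpoint(self):
        # Store a copy so later in-memory updates cannot alter the snapshot.
        self.store["checkpoint"] = Checkpoint(
            self.state.next_offset, dict(self.state.counts)
        )
```

Because the snapshot stores the aggregated state together with the offset it corresponds to, a replacement worker that loads the snapshot and replays the log from that offset ends up with the same totals as a run that never failed. Work done after the last checkpoint is simply redone.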
Who is this presentation for?
Data Scientists, Infrastructure Engineers, Distributed Systems Engineers, Site Reliability Engineers, and Directors of Engineering
Prerequisite knowledge:
What you'll learn?
- Understand considerations for designing real-time applications that can autoscale for seasonal changes in load and achieve service-wide CPU utilization of over 75%
- Learn how Mist reliably maintains over 4 TB of application state despite frequent server faults by checkpointing to AWS S3, and how it uses multilevel aggregation to solve the problem of aggregating across sharded data
- Discover how Mist identified the key metrics that serve as inputs to its autoscaling engine
- Hear lessons learned from building a highly scalable and reliable real-time aggregation system