Speaker "Flip Kromer" Details
Deathmarch through the internals of Storm + Trident
Storm+Trident is an incredibly exciting open-source framework for streaming analytics. Storm enables scalable, high-throughput processing by programs in your language of choice, on distributed networks of commodity servers. Trident extends Storm to allow exactly-once processing, enabling large-scale aggregations and advanced analytics in real time.

It's easy to get excited about Storm+Trident's performance and its scalability in terms of compute: we've benchmarked it at half a million events per second. But the real revolution comes from its tolerance of slowness and its scalability in terms of developers. Storm+Trident is equally happy hosting highly concurrent tasks that make hundreds of fallible 500+ millisecond requests against an external API or legacy datastore, CPU-heavy tasks that take a handful of seconds to process each batch, or any other tempo your application requires.

When we got serious with Trident and Storm at Infochimps, we found virtually no resources to clarify things like the lifecycle of a record, the relationship between Storm and Trident, production tuning, and so forth. This talk intends to pick up where the internet leaves off and fill that gap. You'll leave knowing enough about the Storm+Trident internals to reason about your programs' performance and confidently bring them to production.

The talk will be highly technical. Mastering the 7th Dan of Dragon-Lightning Form isn't required before attendance, but if you're not yet familiar with Storm you may want to browse one of the following before the talk:

* The official Trident Tutorial, which shows what you can do with Trident. (Trident is an alternative interface to Storm, and I recommend adopting it from the start.) (https://github.com/nathanmarz/storm/wiki/Trident-tutorial)
* "Storm: Realtime Processing" by Michael Vogiatzis is a great introduction to Trident, using a different example than the hoary "distributed word count" that everyone does. (http://www.slideshare.net/MichaelVogiatzis/storm-realtime-processing)
Philip (Flip) Kromer is co-founder of Infochimps where he built scalable architecture that allows app programmers and statisticians to quickly and confidently manipulate data streams at arbitrary scale. He holds a B.S. in Physics and Computer Science from Cornell University and attended graduate school in Physics at The University of Texas at Austin. He authored the O’Reilly book on data science in practice and has spoken at South by Southwest, Hadoop World, Strata, and CloudCon. Email Flip at firstname.lastname@example.org or follow him on Twitter at @mrflip.