Global Big Data Conference

Industry News Details

How Apache Spark Is Transforming Big Data Processing, Development Posted on : Aug 31 - 2015

Apache Spark speeds up big data processing by a factor of 10 to 100 and simplifies app development to such a degree that developers call it a "game changer."

Apache Spark has been called a game changer and perhaps the most significant open source project of the next decade, and it's been taking the big data world by storm since it was open sourced in 2010.

Apache Spark is an open source data processing engine built for speed, ease of use and sophisticated analytics. Spark is designed to perform both batch processing and new workloads like streaming, interactive queries, and machine learning.

"Spark is undoubtedly a force to be reckoned with in the big data ecosystem,” said Beth Smith, general manager of the Analytics Platform for IBM Analytics. IBM has invested heavily in Spark.

Meanwhile, in a talk at the Spark Summit East 2015, Matthew Glickman, a managing director at Goldman Sachs, said he realized Spark was something special when he attended last year’s Strata + Hadoop World conference in New York.

He said he went back to Goldman and “posted on our social media that I’d seen the future and it was Apache Spark. What did I see that was so game-changing? It was sort of to the same extent when you first held an iPhone or when you first see a Tesla. It was completely game-changing.”

Matei Zaharia, co-founder and CTO of Databricks and the creator of Spark, told eWEEK Spark started out in 2009 as a research project at the University of California Berkeley, where he was working with early users of MapReduce and Hadoop, including Facebook and Yahoo.

He said he found some common problems among those users, chief among them being that they all wanted to run more complex algorithms that couldn’t be done with just one MapReduce step. View more

Get the