Speaker "Jon Gray" Details

 

Topic

Hydrator: Open Source, Code-Free Data Pipelines

Abstract

Efficiently creating and managing an enterprise data lake typically requires substantial effort to ingest, process, store, secure, and manage data from a variety of sources. Hydrator is an open source framework and self-service user interface for creating data lakes. It simplifies building and managing production data pipelines on Spark, MapReduce, Spark Streaming and Tigon.

The goal of this talk is to demonstrate how to provide broad, self-service access to Hadoop while maintaining the controls and monitoring necessary within the enterprise. Hydrator provides these abilities to the enterprise and to all of the end users who program, access, and manage enterprise data.

Some of the features that will be demonstrated:

- Support for ingestion, ETL, aggregations and machine learning, in both real time and batch.
- Support for major distros and cloud providers.
- A design that allows enterprises to enable self-service while maintaining enterprise requirements for security and governance.
- An extensive library of plugins in the Hydrator open source ecosystem, enabling batch and real-time ingestion from traditional and modern databases, cloud services and other common data sources.
- Dozens of community plugins for machine learning and analytics, as well as pre-built pipelines for common end-to-end use cases.
- A drag-and-drop user interface for building data ingestion and data processing pipelines from included, community and custom-built plugins, as well as custom MapReduce and Spark jobs.
- Versioning for pipelines and plugins, which are configured with JSON.
- A management interface for operating pipelines, with scheduling and monitoring through the UI or REST APIs.
- Powerful metadata capabilities, including automatic capture of complete audit and lineage information.
- Integration with security and MDM systems.
- The ability to customize and limit access to data sources, sinks and any other plugins, providing simplified and controlled usage for non-technical users.

The talk will include a live end-to-end demo of building and running ingestion and machine learning data pipelines.

Profile

Jonathan Gray, Founder & CEO of Cask, is an entrepreneur and software engineer with a background in startups, open source and all things data. Prior to founding Cask, Jonathan was a software engineer at Facebook, where he drove HBase engineering efforts, including Facebook Messages and several other large-scale projects, from inception to production. An open source evangelist, Jonathan was responsible for helping build the Facebook engineering brand through developer outreach and for refocusing the open source strategy of the company. Prior to Facebook, Jonathan founded Streamy.com, where he became an early adopter of Hadoop and HBase; he is now a core contributor and active committer in the community. Jonathan holds a bachelor’s degree in Electrical and Computer Engineering and Business Administration from Carnegie Mellon University.