Speaker "Jon Gray" Details
-
Name
Jon Gray
-
Company
Cask
-
Designation
CEO
Topic
Hydrator: Open Source, Code-Free Data Pipelines
Abstract
Creating and managing an enterprise data lake efficiently typically requires substantial effort to ingest, process, store, secure, and manage data from a variety of sources. Hydrator is an open source framework and self-service user interface for creating data lakes that simplifies building and managing production data pipelines on Spark, MapReduce, Spark Streaming and Tigon. The goal of this talk is to demonstrate broad, self-service access to Hadoop while maintaining the controls and monitoring necessary within the enterprise. Hydrator provides these abilities to the enterprise and to all of the end users who program, access, and manage enterprise data.

Some of the features that will be demonstrated:

Supports ingestion, ETL, aggregations and machine learning, in both real-time and batch modes. Supports major distros and cloud providers.

Built to let enterprises enable self-service while meeting enterprise requirements for security and governance.

The Hydrator open source ecosystem contains an extensive library of plugins for batch and real-time ingestion from traditional and modern databases, cloud services and other common data sources. There are dozens of community plugins for machine learning and analytics, as well as pre-built pipelines for common end-to-end use cases.

A drag-and-drop user interface for building data ingestion and data processing pipelines from included, community and custom-built plugins, as well as custom MapReduce and Spark jobs.

Pipelines and plugins support versioning and are configured with JSON.

Operate pipelines with a management interface. Schedule and monitor pipelines through the UI or REST APIs.

Powerful metadata capabilities. Automatically captures complete audit and lineage information.

Integrates with security and MDM systems. Customize and limit access to data sources, sinks and any other plugins to provide simplified and controlled usage by non-technical users.
The talk will include a live end-to-end demo of building and running ingestion and machine learning data pipelines.
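The abstract notes that pipelines are built from plugins, are versioned, and are configured with JSON. As a rough illustration of that idea only, the sketch below assembles a small pipeline description in Python and serializes it to JSON. The field names (`stages`, `connections`, `plugin`, `properties`), the plugin names, and the helper functions are all hypothetical, chosen for this example; they are not taken from Hydrator's actual configuration schema.

```python
import json


def make_stage(name, plugin_name, plugin_type, properties=None):
    """Describe one pipeline stage backed by a plugin (hypothetical layout)."""
    return {
        "name": name,
        "plugin": {
            "name": plugin_name,
            "type": plugin_type,
            "properties": dict(properties or {}),
        },
    }


def build_pipeline(name, version, stages, connections):
    """Assemble a versioned pipeline description and check its wiring.

    `connections` is a list of (from_stage, to_stage) name pairs. Every
    name must refer to a declared stage, and stage names must be unique.
    """
    stage_names = [stage["name"] for stage in stages]
    if len(set(stage_names)) != len(stage_names):
        raise ValueError("stage names must be unique")
    for src, dst in connections:
        for endpoint in (src, dst):
            if endpoint not in stage_names:
                raise ValueError(f"connection references unknown stage: {endpoint}")
    return {
        "name": name,
        "version": version,
        "stages": stages,
        "connections": [{"from": src, "to": dst} for src, dst in connections],
    }


# Hypothetical three-stage batch pipeline: source -> transform -> sink.
pipeline = build_pipeline(
    name="event-ingest",
    version="1.0.0",
    stages=[
        make_stage("source", "Database", "batchsource", {"table": "events"}),
        make_stage("clean", "Projection", "transform", {"drop": "debug_info"}),
        make_stage("sink", "Table", "batchsink", {"name": "events_clean"}),
    ],
    connections=[("source", "clean"), ("clean", "sink")],
)

# A JSON document like this could be stored, versioned and diffed like code.
config_json = json.dumps(pipeline, indent=2, sort_keys=True)
print(config_json)
```

Keeping the pipeline as plain data is what makes versioning and REST-based management practical: the same JSON document can be checked into source control, compared between versions, and submitted to a management endpoint.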