AWS big data storage tools power BI efforts

Posted on: Aug 25, 2016

Developers must choose among a wealth of big data storage and processing options. The effort is worth the time spent, as the right tool makes all the difference in effective BI management.

Massive computing and analytical tools do much of the heavy lifting for big data projects, but big data also requires big storage. And engineers have multiple big data storage tools available to them.

Basic object storage is akin to a Swiss army knife; it's suitable for nearly all types of data -- log files, test and research data, images and streaming media, to name a few. When selecting a storage option for a big data project, developers should consider big data storage tools that are inexpensive and highly available, offer high performance and integrate well with other cloud services. Amazon Simple Storage Service (S3) is often considered the de facto storage service for big data environments hosted in AWS.
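
As a rough illustration of storing log data in an object store, the sketch below writes a batch of log lines as a single object. It assumes a client that behaves like the one returned by boto3.client("s3") and uses only its put_object call; the function name, bucket name and key layout are hypothetical and chosen for this example only.

```python
def store_log_lines(s3_client, bucket, key, lines):
    """Store a batch of log lines as one object and return its size in bytes.

    `s3_client` is assumed to behave like boto3.client("s3"); only its
    put_object(Bucket=..., Key=..., Body=...) method is used here.
    """
    # Join the lines into one newline-separated payload and encode it as bytes.
    body = "\n".join(lines).encode("utf-8")
    s3_client.put_object(Bucket=bucket, Key=key, Body=body)
    return len(body)
```

With boto3 installed and AWS credentials configured, a call might look like store_log_lines(boto3.client("s3"), "example-bucket", "logs/2016/08/25/app.log", lines), where the bucket and key are placeholders. Passing the client in as a parameter also lets the function be exercised with a stand-in object, without touching AWS.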

But raw object storage is just one option. Big data can also use varied database services that organize data, speed analytics and support queries against large, often unrelated, data sets. If going this route, a developer will want big data storage tools that provide speed, easy management and administration, as well as scalability.

Amazon Relational Database Service (RDS) supports several database engines, including SQL Server, PostgreSQL, MySQL, MariaDB, Oracle and Amazon Aurora, and its data can be queried with engines such as Presto. Alternatively, enterprises that rely on NoSQL databases can use Amazon DynamoDB for flexible data storage with extremely low latency.

Big data projects are diverse, so no individual data store will fit all needs. For example, a developer dealing with large amounts of unstructured data -- for tasks such as log file analysis or massive machine learning text searches -- may require a distributed, nonrelational database service. That developer can use a tool such as the open source Apache HBase, which runs in concert with Hadoop on the Hadoop Distributed File System and integrates with the Apache Hive data warehouse software. This makes HBase an ideal complement to Hadoop-based fault-tolerant distributed computing clusters. HBase and Hadoop are readily supported within Amazon Elastic MapReduce (EMR).

A data warehouse service, such as Amazon Redshift, is another option among big data storage tools. These services act as central repositories for data collected and integrated from several sources. Data warehouse capacities reach into the petabytes; enterprises using business intelligence (BI) tools can frequently search and analyze data in these warehouses.

Data warehouses can retain data long term -- providing historical context for data analytics. Additionally, data warehouse options typically compress data to reduce storage costs and volumes.
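
To give a feel for how compression can shrink warehouse data, here is a toy sketch of run-length encoding, one common technique for columns with long runs of repeated values. It is an illustration only: the article does not say which compression scheme any particular warehouse applies, and the column contents and function names below are made up.

```python
from itertools import groupby


def run_length_encode(column):
    """Collapse consecutive repeated values into (value, count) pairs."""
    return [(value, sum(1 for _ in group)) for value, group in groupby(column)]


def run_length_decode(pairs):
    """Expand (value, count) pairs back into the original column."""
    return [value for value, count in pairs for _ in range(count)]


# Hypothetical "region" column from a table sorted by region.
region_column = ["east"] * 4 + ["north"] * 2 + ["west"] * 3
encoded = run_length_encode(region_column)
```

In this example, nine stored values collapse to three pairs, and decoding restores the original column exactly. The savings depend on how repetitive and well sorted the column is.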

Turning big data into intelligence and learning

Big data projects become even more valuable when enterprises turn that data into intelligence. Specialized BI management tools can help IT teams analyze, visualize and collaborate using big data.

When choosing a BI tool, look for those with fast responses, simple-to-use interfaces and interoperability with a variety of data sources, including internal data files and external third-party data sources like those from Salesforce. Amazon QuickSight is a BI management tool that can perform rapid calculations and produce visualizations, and it's intuitive enough for nontechnical employees to use. QuickSight integrates with other AWS utilities, such as EMR, S3 and RDS.

Amazon Machine Learning is an analytics tool that uses data to make predictions based on mathematical models. The idea is that the model can change dynamically -- learning from the behaviors and previous results obtained -- to constantly improve the value and accuracy of results. For example, machine learning can help identify attacks in network traffic patterns or personalize website content based on a user's activity.
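
The idea of a model that improves as it sees more results can be sketched in a few lines. The example below is a minimal, single-feature logistic regression updated one observation at a time; it is a generic illustration, not a description of how Amazon Machine Learning is implemented. The class name, the learning rate and the traffic data (a standardized traffic score paired with an attack flag) are all hypothetical.

```python
import math


class OnlineLogisticModel:
    """Single-feature logistic regression updated one example at a time."""

    def __init__(self, learning_rate=0.1):
        self.weight = 0.0
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict(self, x):
        """Return the estimated probability that the label is 1."""
        return 1.0 / (1.0 + math.exp(-(self.weight * x + self.bias)))

    def update(self, x, label):
        """Take one gradient step toward the observed label."""
        error = self.predict(x) - label
        self.weight -= self.learning_rate * error * x
        self.bias -= self.learning_rate * error


# Hypothetical observations: (standardized traffic score, 1 if it was an attack).
observations = [(-2.0, 0), (-1.0, 0), (1.0, 1), (2.0, 1)]

model = OnlineLogisticModel()
for _ in range(200):
    for x, label in observations:
        model.update(x, label)
```

A fresh model starts at a probability of 0.5 for every input. After the updates, it assigns a higher attack probability to high traffic scores and a lower one to low scores, which is the "learning from previous results" behavior described above, in miniature.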

Many machine learning tools rely on complex algorithms and manual tweaking, but technologies are evolving to help automate model creation and optimization. Amazon Machine Learning offers wizards and visualization capabilities that enable IT teams to construct and refine models to find patterns in data.