Five Ways to Get More out of Hadoop

Posted on: Sep 23, 2016

Time has sped up for most organisations. Innovation, responding to customers and bringing products to market all have to be achieved in half the time. Decisions can no longer be based on information that's a week, a day or even hours old. Response times are measured in milliseconds, with real time as the ideal.

Hadoop, the big data processing framework, is now being used extensively to help businesses achieve this ultra-fast insight, so developers are looking for ways to optimise its use and gain a further edge over competitors.

If you’re a developer, here are five ways you can sharpen your use of the framework:

Go Faster

Just by moving data integration jobs from MapReduce to Apache Spark, you can complete them around 2.5 times faster.

Then, once you have converted these jobs, adding Spark-specific components for caching and partitioning can increase performance a further five times.

From there, increasing the amount of RAM on your hardware lets you do more in memory, for as much as a 10 times improvement.

With all of this applied to your traditional bulk and batch data integration jobs, you can improve performance dramatically.
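
To make the caching and partitioning point concrete, here is a minimal hand-coded sketch in Scala (assuming Spark 2.x; the HDFS paths and column names are hypothetical). Reading the source once, repartitioning on a key and caching the result lets several downstream aggregations reuse the in-memory data instead of re-reading it from disk each time.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object BatchIntegrationSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("batch-integration-sketch")
      .getOrCreate()

    // Read once, repartition on a grouping key and cache, so the two
    // aggregations below reuse the same in-memory data.
    val orders = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("hdfs:///data/orders.csv")   // hypothetical input path
      .repartition(col("region"))
      .cache()

    val revenueByRegion  = orders.groupBy("region").agg(sum("amount"))
    val ordersByCustomer = orders.groupBy("customer_id").count()

    revenueByRegion.write.mode("overwrite").parquet("hdfs:///out/revenue_by_region")
    ordersByCustomer.write.mode("overwrite").parquet("hdfs:///out/orders_by_customer")

    spark.stop()
  }
}
```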

Go Real Time

It’s one thing to be able to do things in bulk and batch; it’s completely another to be able to do them in real time. This is not about understanding what your customer did on your website yesterday; it’s about what they are doing right now, and being able to influence their interactions.

The great thing about Spark and Spark Streaming is that you now have one toolset that allows you to operate in bulk and batch and in real time.

Using Talend, you can design integration flows with one toolset across all of this: pulling in historical data from sources such as Oracle and Salesforce, and combining it with real-time streaming data from websites, mobile devices and sensors.

The bulk and batch information may be stored in Hadoop and the real-time information in NoSQL databases, yet you can expose both through a single query interface, Spark SQL, to mobile, analytic and web apps.
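
To illustrate what this single toolset looks like when hand-coded, here is a minimal Scala sketch (assuming Spark 2.x Structured Streaming with the spark-sql-kafka connector; the Kafka topic, broker address, HDFS path and field names are hypothetical). A static DataFrame of historical data and a streaming DataFrame of live events are joined and queried through the same DataFrame/Spark SQL interface.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object RealTimeIntegrationSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("streaming-integration-sketch")
      .getOrCreate()
    import spark.implicits._

    // Historical / bulk data already landed in Hadoop (hypothetical path).
    val customers = spark.read.parquet("hdfs:///warehouse/customers")

    // Real-time events arriving from websites, mobile apps or sensors via Kafka.
    val clicks = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker:9092")
      .option("subscribe", "clickstream")
      .load()
      .select(
        get_json_object($"value".cast("string"), "$.customer_id").as("customer_id"),
        get_json_object($"value".cast("string"), "$.page").as("page"))

    // One query interface over both: enrich live clicks with historical attributes.
    val enriched = clicks.join(customers, Seq("customer_id"), "left")

    enriched.writeStream
      .format("console")   // in practice this would feed a NoSQL store or dashboard
      .outputMode("append")
      .start()
      .awaitTermination()
  }
}
```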

Get Smart

So you can now do it in real time, but how about doing it intelligently in real time?

Another great thing about Spark is the machine learning capability that comes with it. In e-commerce, for example, it allows you to personalise web content and triple the number of page views. It also allows you to deliver targeted offers and, as a result, double conversion rates.

So you are not only creating a better customer experience, but driving more revenue – a win-win situation.

One of our customers, the German retailer Otto, is using Spark to predict, with 90% accuracy, which online customers will abandon their shopping carts, and then to present them with incentive offers. If you are a $12 billion company with the industry-standard cart-abandonment rate of 50–70%, even a small improvement can mean extra revenue in the millions, or even billions.

These simple design tools make it possible for any size of company – not just those like Otto – to do real-time analytics and deliver enhanced customer service.
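
As a rough illustration of the kind of model behind such a prediction (not Otto’s actual implementation), here is a minimal Spark MLlib sketch in Scala; the dataset path, feature columns and label column are hypothetical. A logistic regression is trained on past sessions labelled as abandoned or not, and the predicted probability for a session can then drive whether an incentive offer is shown.

```scala
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.sql.SparkSession

object CartAbandonmentSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("cart-abandonment-sketch")
      .getOrCreate()

    // Hypothetical dataset of past sessions with behavioural features
    // and an "abandoned" label column (0 or 1).
    val sessions = spark.read.parquet("hdfs:///data/sessions")

    // Assemble the behavioural signals into a single feature vector.
    val assembler = new VectorAssembler()
      .setInputCols(Array("pages_viewed", "time_on_site", "cart_value", "items_in_cart"))
      .setOutputCol("features")

    val lr = new LogisticRegression()
      .setLabelCol("abandoned")
      .setFeaturesCol("features")

    val Array(train, test) = sessions.randomSplit(Array(0.8, 0.2), seed = 42)

    val model = new Pipeline().setStages(Array(assembler, lr)).fit(train)

    // Score held-out sessions; the probability column would drive the offer decision.
    model.transform(test)
      .select("customer_id", "probability", "prediction")
      .show(10, truncate = false)

    spark.stop()
  }
}
```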

Stop Hand Coding

Everything I’ve talked about can be programmed in Spark, in Java or in Scala. But there’s a better way. If you are using a visual design interface, you can increase development productivity 10 times or more.

Recently, one of our sales engineers told us how a customer had spent two months building an integration job, and how he was able to help them do it in a single day.

Not only this, but designing jobs with a visual UI makes it much easier to share work with colleagues. People can look at a job and understand what it is doing, which makes collaboration straightforward and re-using development work simple.

Get a Head Start

You can start straight away with a big data sandbox: a virtual machine with Spark pre-loaded and a real-time streaming use case. If you need it, there’s a simple guide that walks you through the process step by step, making it easy to pick things up and get running.