Global Big Data Conference

Machine Learning: Why It Matters

Posted on: Feb 19, 2017

Are you into Machine Learning OR are you “just” a Statistician? Have you been asked this question yet? If you are in a career, or looking to get into one, that has anything to do with deriving insights from data, you probably know what I am talking about.

In 2016 alone, tech giants acquired over three dozen machine learning startups, while several dozen more raised aggregate funding of roughly $4 billion worldwide. Is it a blip or a bubble? Definitely not. At a time when automation is key, it was imperative that we find methods of data analysis and model building that automate data analysis and model building. Sounds tautological? It is. And in a way, that is what machine learning is … err … rather, does. It picks up right where traditional statistical models stop. It’s all about building algorithms that learn iteratively from data: the more relevant data you feed it, the better the results it generally churns out.

Machine learning has been around conceptually for decades (its recent history is usually traced back to World War II and Alan Turing), but the current frenzy around it can be attributed to advances in computing power and its growing affordability. Manually getting these models to improve themselves through numerous iterations may seem tedious, if not impossible. A modern computer running the algorithm, however, can get them to learn, change, and develop by themselves in a matter of seconds … and we are already talking “real-time!” What’s more, they can look for insights without being told exactly where to look, which is what makes them useful for unstructured data (think social media posts and web searches). The algorithm iterates, learns, and adapts, then repeats the whole process, learning from new data every time. It really embodies the adage that practice makes perfect.
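The following is a minimal Python sketch of online learning that makes “learning iteratively from new data” a little more concrete. It shows a one-feature linear model that updates its weights with every new observation using stochastic gradient descent. The function names `online_sgd` and `make_stream`, the learning rate, and the synthetic data stream (y = 3x + 1 plus noise) are illustrative assumptions. None of them come from any particular production system.

```python
import random


def online_sgd(stream, lr=0.05):
    """Update a one-feature linear model y ~ w*x + b one observation at a time.

    Each new (x, y) pair nudges the weights along the negative gradient of the
    squared error, so the model keeps adapting as data arrives instead of
    being fitted once and frozen.
    """
    w, b = 0.0, 0.0
    for x, y in stream:
        error = (w * x + b) - y   # prediction error on the newest point
        w -= lr * error * x       # gradient of 0.5 * error**2 with respect to w
        b -= lr * error           # gradient of 0.5 * error**2 with respect to b
        yield w, b                # current model after this update


# Hypothetical data stream (an assumption): y = 3x + 1 plus Gaussian noise.
rng = random.Random(0)


def make_stream(n):
    for _ in range(n):
        x = rng.uniform(-1, 1)
        yield x, 3 * x + 1 + rng.gauss(0, 0.1)


for step, (w, b) in enumerate(online_sgd(make_stream(2000)), start=1):
    if step in (10, 100, 2000):
        print(f"after {step:>4} points: w={w:.2f}, b={b:.2f}")
```

The checkpoints show the weights drifting toward the values that generated the stream. The learning rate controls the trade-off: larger values adapt faster to new data, and smaller values give more stable estimates.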

Now put this in the context of self-driving cars, or the recommendation engines at Netflix and Amazon, and you can see why algorithms that generate decisions from data in real time, without human intervention, are key to where we are headed in terms of both technology and user experience. It is machine learning that has turned the “hype” around the importance of “big data” into reality. The sheer volume of available data could have raised concerns about its usability for deriving meaningful insights, and it was machine learning that came to the rescue. Let’s just say that compared to traditional statistical methods, which dealt with static models, machine learning is more in tune with the current times and their needs.

The discussion becomes more exciting, and more tangible, when we consider problems where machine learning is a clear improvement over traditional statistical methods. There is a strong caveat here, though: many machine learning techniques are really enhancements or extensions of their “statistical” counterparts.

Let’s start with pattern recognition. It is the essence of solving many business problems that rely on regularities in gathered data to make predictions, and in machine learning parlance it is called “supervised learning.” A traditional classification or regression model can give you a prediction, but it is fitted once (often as a “closed-form” solution, as in ordinary least squares) and is therefore static. A machine learning technique called gradient boosting pursues the same goal iteratively. It adds one simple model at a time (typically a shallow decision tree), each fitted to the errors of the ensemble so far. In effect, it takes gradient-descent steps toward a minimum of the loss function and keeps improving as it learns. Ever had your credit card declined when using it at a gas station that’s not your “usual” one? In most cases, that is a supervised (or semi-supervised) learning model at work for you.
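The following is a minimal Python sketch of how gradient boosting builds a model iteratively. It covers squared-error regression on one feature and uses decision stumps (single-split trees) as the weak learners. The helper names, the number of rounds, the learning rate, and the synthetic sine-shaped data are all illustrative assumptions. Production libraries, for example scikit-learn’s `GradientBoostingRegressor`, add deeper trees, subsampling, and other refinements.

```python
import math
import random


def fit_stump(xs, residuals):
    """Return the single-split predictor (decision stump) that best fits the
    residuals in the least-squares sense."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # skip the max so both sides are non-empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def gradient_boost(xs, ys, n_rounds=50, learning_rate=0.1):
    """Squared-error gradient boosting: each new stump is fitted to the current
    residuals, which are the negative gradient of 0.5 * (y - prediction) ** 2."""
    base = sum(ys) / len(ys)  # start from a constant prediction: the mean
    stumps = []

    def predict(x):
        return base + learning_rate * sum(stump(x) for stump in stumps)

    for _ in range(n_rounds):
        residuals = [y - predict(x) for x, y in zip(xs, ys)]
        stumps.append(fit_stump(xs, residuals))
    return predict


# Illustrative data (an assumption): a noisy sine wave sampled on [0, 10).
rng = random.Random(1)
xs = [i / 10 for i in range(100)]
ys = [math.sin(x) + rng.gauss(0, 0.1) for x in xs]

model = gradient_boost(xs, ys)
mean_y = sum(ys) / len(ys)
baseline_mse = sum((y - mean_y) ** 2 for y in ys) / len(ys)
boosted_mse = sum((y - model(x)) ** 2 for x, y in zip(xs, ys)) / len(ys)
print(f"constant-mean MSE: {baseline_mse:.3f}  boosted MSE: {boosted_mse:.3f}")
```

The learning rate shrinks each stump’s contribution. This makes each round a small, cautious step, which is the usual way boosting trades training speed for better generalization.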

But because supervised learning relies on “historical data” or a “training set,” it has its own limitation: it cannot predict in situations where there are no past, labeled outcomes to learn from. Traditional k-means or hierarchical clustering can in theory be applied here. When you are dealing with a deluge of transactional data, however, you may need more flexible unsupervised learning techniques, such as artificial neural networks (ANNs) or Gaussian mixture models (GMMs), to explore the unlabeled data and find structure within it. So the next time you see “Also recommended for you” while shopping on Amazon.com, know that every new item you searched for, saved, or bought was likely factored into those recommendations by unsupervised machine learning algorithms within a matter of seconds.
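The following is a minimal Python sketch of what a Gaussian mixture model does. It fits a two-component, one-dimensional GMM with the expectation-maximization (EM) algorithm. The synthetic data (two overlapping groups of values centered near 0 and 5), the initialization, and the function names are illustrative assumptions. This is not how any particular retailer’s recommender works. In practice, a library class such as scikit-learn’s `GaussianMixture` would handle multiple dimensions and convergence checks.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def fit_gmm_1d(data, n_iter=100):
    """Fit a two-component 1-D Gaussian mixture with expectation-maximization."""
    means = [min(data), max(data)]  # crude but deterministic starting point
    variances = [1.0, 1.0]
    weights = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: how responsible each component is for each data point.
        resp = []
        for x in data:
            p = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(p)
            resp.append([pk / total for pk in p])
        # M-step: re-estimate weights, means and variances from those responsibilities.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / len(data)
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var_k = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk
            variances[k] = max(var_k, 1e-6)  # guard against a collapsing component
    return weights, means, variances


# Hypothetical unlabeled data (an assumption): two groups mixed together.
rng = random.Random(42)
data = [rng.gauss(0, 1) for _ in range(200)] + [rng.gauss(5, 1) for _ in range(200)]

weights, means, variances = fit_gmm_1d(data)
print("weights:", [round(w, 2) for w in weights])
print("means:", [round(m, 2) for m in means])
```

The algorithm never sees a label. It recovers the two groups purely from the structure of the data, which is the essence of unsupervised learning.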

If you are dealing with discrete outcome variables, logistic regression naturally comes to mind, but it can fall short with complex data sets. Hence the need to look into support vector machines or hierarchical Bayesian models. If you are looking at a large but well-behaved data set, the good ol’ logit will still work great. With the big, bad, ugly ones, you may have to resort to the much-talked-about bagged regression (bootstrap aggregating). It is, by the way, nothing more than many logits (say, a thousand), each estimated on a random sample drawn with replacement from the mother data set. Their predictions are then averaged, which mainly reduces variance.
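The bagging recipe described above can be sketched in a few lines of Python: draw bootstrap samples (random draws with replacement), fit a logistic regression to each, and average the predicted probabilities. This is a minimal, one-feature sketch under stated assumptions. The gradient-descent fitter, the function names, the synthetic data, and the 50-model default (rather than a thousand, to keep it fast) are all illustrative choices. A library route would be something like scikit-learn’s `BaggingClassifier` wrapped around `LogisticRegression`.

```python
import math
import random


def sigmoid(z):
    return 1 / (1 + math.exp(-z))


def fit_logit(xs, ys, lr=0.5, epochs=200):
    """Fit a one-feature logistic regression by batch gradient descent on log-loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(w * x + b) - y  # derivative of log-loss w.r.t. the linear score
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


def bagged_logits(xs, ys, n_models=50, seed=0):
    """Fit one logit per bootstrap sample (n indices drawn with replacement)."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_logit([xs[i] for i in idx], [ys[i] for i in idx]))
    return models


def predict_proba(models, x):
    """Average the predicted probabilities of all bagged models."""
    return sum(sigmoid(w * x + b) for w, b in models) / len(models)


# Hypothetical data (an assumption): P(y = 1 | x) = sigmoid(2x), x uniform on [-3, 3].
data_rng = random.Random(7)
xs = [data_rng.uniform(-3, 3) for _ in range(100)]
ys = [1 if data_rng.random() < sigmoid(2 * x) else 0 for x in xs]

models = bagged_logits(xs, ys)
accuracy = sum((predict_proba(models, x) > 0.5) == (y == 1) for x, y in zip(xs, ys)) / len(xs)
print(f"bagged-logit training accuracy: {accuracy:.2f}")
```

Each logit sees a slightly different resample, so their individual errors partly cancel when averaged. This is why bagging chiefly reduces variance rather than bias.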