Advanced Machine Learning with Basic Excel

Posted on: Apr 27, 2017

1. Excel template for general machine learning

In short, we offer here an Excel template for machine learning and statistical computing, and it is quite powerful for an Excel spreadsheet. The techniques have been used by the author in automated data science frameworks (AI to automate content production, selection and scheduling for digital publishers) but also in the following contexts:

spam detection,

click, website, and keyword scoring (assigning a commercial value to a keyword, group of keywords, or content category),

credit card fraud detection,

botnet detection,

predicting blog popularity.

The technique blends multiple algorithms that at first glance look traditional and math-heavy, such as decision trees, regression (logistic or linear) and confidence intervals. But they are radically different, can fit in a small spreadsheet (though the Python version is more powerful, flexible, and efficient), and do not involve math beyond high-school level. In particular, no matrix algebra is required to understand the methodology.
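To give a feel for the kind of high-school-level arithmetic involved, here is a minimal Python sketch. It is not the template or the author's actual algorithm; it is only an illustration under stated assumptions. It groups observations by a combination of binary features (loosely playing the role of a decision tree node) and computes, for each group, the average response and a normal-approximation 95% confidence interval. The function name `node_confidence_intervals`, the toy data, and the choice of `z = 1.96` are all hypothetical.

```python
import math
from collections import defaultdict


def node_confidence_intervals(rows, z=1.96):
    """Group rows by feature tuple; return (count, mean, lower, upper) per group.

    Assumption: a normal-approximation interval, mean +/- z * sqrt(var / n),
    using the sample variance. This is an illustrative choice only.
    """
    groups = defaultdict(list)
    for features, response in rows:
        groups[tuple(features)].append(response)

    result = {}
    for key, values in groups.items():
        n = len(values)
        mean = sum(values) / n
        if n > 1:
            var = sum((v - mean) ** 2 for v in values) / (n - 1)
            half_width = z * math.sqrt(var / n)
        else:
            # A single observation gives no variance estimate.
            half_width = float("inf")
        result[key] = (n, mean, mean - half_width, mean + half_width)
    return result


# Hypothetical toy data: (binary features, observed response)
rows = [
    ((1, 0), 10.0),
    ((1, 0), 14.0),
    ((1, 0), 12.0),
    ((0, 1), 3.0),
    ((0, 1), 5.0),
]
stats = node_confidence_intervals(rows)
for key, (n, mean, lower, upper) in sorted(stats.items()):
    print(key, n, round(mean, 2), round(lower, 2), round(upper, 2))
```

Nothing here goes beyond averages, square roots, and counting, so the same computation can be laid out in spreadsheet columns. Groups with few observations get wide intervals, which is one simple way to flag estimates that should not be trusted.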

The methodology presented here is the result of 20 years' worth of applied research on various large industrial data sets, where the author tried for years (eventually with success) to build a system that is simple and works. Almost everyone else believed, or made people believe, that only complex systems work, and spent their time complexifying algorithms rather than simplifying them (partly for job security purposes).

Who should use the spreadsheet?

First, the spreadsheet (as well as the Python, R, Perl or Julia version) is free to use and modify, even for commercial purposes, or to make a product out of it and sell it. It is part of my concept of open patent, in which I share all my intellectual property publicly and for free.

The spreadsheet is designed as a tutorial, though it processes the same data set as the one used for the Python version. It is aimed at people who are not professional coders, people who manage data scientists, BI experts, MBA professionals, and people from other fields with an interest in understanding the mechanics of some state-of-the-art machine learning techniques, without having to spend months or years learning mathematics, programming, and computer science. A few hours are enough to understand the details. This spreadsheet can be the first step to help you transition to a new, more analytical career path, or to better understand the data scientists that you manage or interact with. It can also spark a career in data science, or even help teach machine learning concepts to high school students.