Speaker "Vicky He Wen" Details

Topic

Large-Scale Multimodal Automated Document Categorization in eCommerce

Abstract

We present MADCAT, a novel framework for large-scale multimodal automated document categorization in eCommerce. Unlike existing techniques for document categorization, the proposed framework integrates signals from multiple modalities, such as text and images. First, a state-of-the-art classification system is built for each modality. For text classification, we employ two classifiers: one based on the traditional Bag-of-Words (BoW) word representation and one based on the recently proposed word vector embedding (Word2Vec) representation. Both use the product titles and the product breadcrumbs present on eCommerce product pages (documents). For image classification, an 8-layer Convolutional Neural Network (CNN) is trained on the primary thumbnail found on each product page. To combine the results from all classifiers, we propose a classifier fusion strategy based on majority voting. To illustrate the efficacy of the framework, we conduct experiments on a large dataset of eCommerce categories spread across more than 100 merchants, which introduces a high degree of heterogeneity in the style and quality of product titles and images. Our experimental results demonstrate superior performance across a large number of categories, showing that the automated categorization system generalizes well.
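
As an illustration of the fusion step only, majority voting over the three classifier outputs (BoW text, Word2Vec text, image CNN) could be sketched as follows. This is a minimal sketch, not the MADCAT implementation: the function name `majority_vote`, the example category labels, and the tie-breaking rule (prefer the label from the classifier listed first) are assumptions, since the abstract does not say how ties are resolved.

```python
from collections import Counter
from typing import Sequence


def majority_vote(predictions: Sequence[str]) -> str:
    """Fuse per-classifier category labels by majority vote.

    `predictions` holds one label per classifier, ordered by priority.
    ASSUMPTION: when several labels share the highest vote count, the
    label from the earliest (highest-priority) classifier wins. The
    abstract does not describe its tie-breaking rule.
    """
    if not predictions:
        raise ValueError("at least one prediction is required")

    counts = Counter(predictions)
    top_count = max(counts.values())
    tied = {label for label, n in counts.items() if n == top_count}

    # Walk the classifiers in priority order and return the first label
    # that belongs to the set of top-voted labels.
    for label in predictions:
        if label in tied:
            return label
    raise AssertionError("unreachable: tied labels come from predictions")


# Hypothetical labels from the BoW, Word2Vec, and CNN classifiers.
fused = majority_vote(["Shoes", "Shoes", "Bags"])
```

With three classifiers, a tie can only occur when all three disagree, in which case this sketch falls back to the first classifier's label.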

Profile

Vicky He Wen is a senior data scientist at Quad Analytix, where she focuses on large-scale production classification, information extraction, and computer vision. She holds a PhD and MS in Electrical Engineering from Stanford University.