Speaker "Liangjie Hong" Details

Topic

GB-CENT: Gradient Boosted Categorical Embedding and Numerical Trees

Abstract

Latent factor models and decision tree based models are widely used in prediction, ranking and recommendation tasks. Latent factor models have the advantage of interpreting categorical features through a low-dimensional representation, but such an interpretation does not naturally fit numerical features. In contrast, decision tree based models capture nonlinear interactions among numerical features well, but their ability to handle categorical features is limited by the cardinality of those features. Since real-world applications usually have both abundant numerical features and categorical features with large cardinality (e.g. geolocations, IDs, tags, etc.), we design a new model, called GB-CENT, which combines latent factor embedding and tree components to achieve the merits of both while avoiding their demerits. On two real-world data sets, we demonstrate that GB-CENT is effective (i.e. both fast and accurate), achieving better accuracy than state-of-the-art matrix factorization models, decision tree based models and their ensemble.
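
The general idea in the abstract, embeddings for categorical features plus boosted trees for numerical features, can be illustrated with a small sketch. This is not the GB-CENT implementation from the talk: it assumes squared loss, exactly two categorical fields (a hypothetical user ID and item ID), depth-1 trees (stumps), and a simple two-stage fit in which the trees are boosted on the residuals of the embedding part. All function names, hyperparameters and the toy data are made up for illustration.

```python
import random

def fit_embedding_part(cats, y, dim=2, epochs=300, lr=0.05, reg=0.01, seed=0):
    """Global mean + per-category biases + dot product of two embeddings, fit by SGD."""
    rng = random.Random(seed)
    mu = sum(y) / len(y)
    bias, vec = {}, {}
    for row in cats:
        for field, value in enumerate(row):
            key = (field, value)
            bias.setdefault(key, 0.0)
            vec.setdefault(key, [rng.gauss(0, 0.1) for _ in range(dim)])
    zero = [0.0] * dim

    def predict(row):
        u, i = (0, row[0]), (1, row[1])
        dot = sum(a * b for a, b in zip(vec.get(u, zero), vec.get(i, zero)))
        return mu + bias.get(u, 0.0) + bias.get(i, 0.0) + dot

    for _ in range(epochs):
        for row, target in zip(cats, y):
            u, i = (0, row[0]), (1, row[1])
            err = target - predict(row)
            bias[u] += lr * (err - reg * bias[u])
            bias[i] += lr * (err - reg * bias[i])
            for k in range(dim):
                pu, qi = vec[u][k], vec[i][k]
                vec[u][k] += lr * (err * qi - reg * pu)
                vec[i][k] += lr * (err * pu - reg * qi)
    return predict

def fit_stump(nums, residuals):
    """Depth-1 regression tree: the single numerical split that minimizes squared error."""
    best = None
    for f in range(len(nums[0])):
        for t in sorted({row[f] for row in nums}):
            left = [r for row, r in zip(nums, residuals) if row[f] <= t]
            right = [r for row, r in zip(nums, residuals) if row[f] > t]
            if not left or not right:
                continue
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if best is None or sse < best[0]:
                best = (sse, f, t, lm, rm)
    if best is None:  # no valid split: fall back to a constant leaf
        mean = sum(residuals) / len(residuals)
        return lambda row: mean
    _, f, t, lm, rm = best
    return lambda row: lm if row[f] <= t else rm

def fit_sketch(cats, nums, y, rounds=20, shrinkage=0.5):
    """Stage 1: embeddings on categorical fields. Stage 2: boost stumps on the residuals."""
    embed = fit_embedding_part(cats, y)
    preds = [embed(c) for c in cats]
    stumps = []
    for _ in range(rounds):
        # For squared loss, the residuals are the negative gradients.
        residuals = [t - p for t, p in zip(y, preds)]
        stump = fit_stump(nums, residuals)
        stumps.append(stump)
        preds = [p + shrinkage * stump(x) for p, x in zip(preds, nums)]

    def predict(c, x):
        return embed(c) + shrinkage * sum(s(x) for s in stumps)
    return embed, predict

def mse(preds, y):
    return sum((p - t) ** 2 for p, t in zip(preds, y)) / len(y)

# Toy data (made up): the target depends on both IDs and on a step in a numerical "price".
cats, nums, y = [], [], []
for u_idx, user in enumerate(["u1", "u2", "u3"]):
    for i_idx, item in enumerate(["a", "b"]):
        for price in (1.0, 2.0, 3.0, 4.0):
            cats.append((user, item))
            nums.append([price])
            y.append(u_idx + 0.5 * i_idx + (2.0 if price > 2.5 else 0.0))

embed, model = fit_sketch(cats, nums, y)
mse_embed = mse([embed(c) for c in cats], y)
mse_full = mse([model(c, x) for c, x in zip(cats, nums)], y)
```

In this toy setup the embedding part never sees the price, so it can only absorb the per-ID effects; the stumps then pick up the step in the numerical feature, and the training error of the combined model should fall below that of the embedding part alone. How GB-CENT itself couples and trains the two components is not specified in the abstract and may differ from this sketch.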

Profile

I am Head of Data Science at Etsy Inc., managing a group of data scientists to deliver cutting-edge scientific solutions for:
* Search and Discovery
* Personalization and Recommendation
* Computational Advertising
* Deep Learning and Image Understanding

Previously, I was Senior Manager of Research at Yahoo Research from 2013 to 2016, leading science efforts for Personalization and Search Sciences. Our team helped drive science solutions for Yahoo Homepage News Streams, Yahoo Aviate App Recommendation, Yahoo Tumblr Blog Recommendation, Yahoo Video Recommendation, Yahoo Assistant/Bot Platform and Yahoo Mobile Search.

Over the past several years, I have published papers in all major international conferences in data mining, machine learning and information retrieval, such as SIGIR, WWW, KDD, CIKM, AAAI, WSDM, RecSys and ICML, with more than 1,700 citations (H-index: 17). Honors include the WWW 2011 Best Poster Paper Award, a WSDM 2013 Best Paper nomination and the RecSys 2014 Best Paper Award. I have served as a program committee member for KDD, WWW, SIGIR, WSDM, AAAI, EMNLP, ICWSM, ACL, CIKM, IJCAI and several workshops, and I have reviewed articles for top journals. I co-founded the User Engagement Optimization Workshop, which was held in conjunction with CIKM 2013 and KDD 2014.