Learning Unhealthy Beverage Demand from Grocery Transaction Data - COMP 396 Undergraduate Research Project

Project title: Using Digital Purchasing Data to Generate Public Health Evidence: Learning Unhealthy Beverage Demand from Grocery Transaction Data

Project description (50-100 words suggested): Unhealthy diet is the most important preventable cause of mortality and morbidity due to chronic diseases. Taxation of unhealthy food has been proposed to improve population-level dietary patterns, and its effectiveness can be estimated by predicting the change in unhealthy food purchasing when food prices increase. The recent availability of grocery transaction data from scanner technologies enables accurate prediction of food sales as a function of own-product attributes. However, the very large number of competing product attributes in these data, typically from a few thousand products, prohibits the application of conventional statistical learning algorithms such as Ordinary Least Squares (OLS). In this study, we explored the predictive performance of learning algorithms adapted for high-dimensional data, namely the Least Absolute Shrinkage and Selection Operator (LASSO) and Decision Tree Regressor with Adaptive Boosting (DTR-AdaBoost), in comparison with conventional statistical learning based on OLS. LASSO demonstrated superior predictive accuracy to OLS, possibly due to its ability to reduce overfitting and mitigate collinearity among the predictive features of food sales. DTR-AdaBoost showed the best predictive accuracy, suggesting the presence of extensive non-linearity between the predictive features in the transaction data and sales.
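
To illustrate why LASSO can cope with more features than observations, here is a minimal sketch in plain Python. It is not the project's code or data. It fits LASSO by cyclic coordinate descent with soft-thresholding on synthetic data. The data sizes, the "true" coefficients, the penalty value alpha, and the function names (soft_threshold, lasso_coordinate_descent) are all assumptions made for illustration.

```python
import random


def soft_threshold(z, t):
    """Shrink z toward zero by t; values within [-t, t] become exactly 0."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_coordinate_descent(X, y, alpha, n_iter=100):
    """Minimise (1/(2n)) * ||y - Xw||^2 + alpha * ||w||_1.

    X is a list of rows. No intercept is fitted, so the data are
    assumed to be centred (an assumption of this sketch).
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    residual = list(y)  # equals y - Xw while w is all zeros
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of feature j with the residual that excludes
            # feature j's own current contribution.
            rho = sum(X[i][j] * (residual[i] + X[i][j] * w[j])
                      for i in range(n)) / n
            new_wj = soft_threshold(rho, alpha) / col_sq[j]
            delta = new_wj - w[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                w[j] = new_wj
    return w


# Synthetic stand-in for transaction data (hypothetical): 40 weekly
# observations and 60 product-attribute features, of which only the
# first three actually influence sales.
rng = random.Random(0)
n_weeks, n_features = 40, 60
X = [[rng.gauss(0, 1) for _ in range(n_features)] for _ in range(n_weeks)]
true_w = [-2.0, 1.0, 0.5] + [0.0] * (n_features - 3)
y = [sum(x * w for x, w in zip(row, true_w)) + rng.gauss(0, 0.1) for row in X]

w_hat = lasso_coordinate_descent(X, y, alpha=0.1)
selected = [j for j, w in enumerate(w_hat) if w != 0.0]
print("non-zero coefficients:", len(selected), "of", n_features)
```

With more features than observations, OLS has no unique solution, whereas the L1 penalty sets many coefficients exactly to zero and so yields a sparse model. In practice alpha would be chosen by cross-validation rather than fixed as it is here.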

