Event

Combining Multiple Imputation and the Knockoff Filter for Variable Selection, with an Application to Large-Scale Assessment Data

Wednesday, August 27, 2025, 15:30 to 16:30

Leonardo Grilli, PhD

Professor - University of Florence

 

**This talk concerns research financed by the Next Generation EU Project Age-It (Ageing Well in an Ageing Society)**

Note: Meet & Greet with Prof. Grilli from 3:00 to 3:30 p.m. in Room 1140, prior to the seminar (3:30 to 4:30 p.m.)

WHEN: Wednesday, August 27, 2025, from 3:30 to 4:30 p.m.
WHERE: Hybrid | 2001 McGill College Avenue, Rm 1140; Zoom
NOTE: Leonardo Grilli will be presenting in person

Abstract

Large-scale assessment data, such as those collected in Italy by Invalsi, typically include several student background variables, which can be exploited as predictors in modelling student achievement. Unfortunately, the student background variables are usually affected by missing values, which poses serious challenges for model selection procedures. As a further complication, many of the predictors are variables with unordered categories. This paper proposes combining multiple imputation and variable selection methods in a setting with categorical predictors. For imputation, we implement multiple imputation by chained equations (MICE). For variable selection, we exploit a recently proposed method based on the knockoff filter, where the knockoff copies are generated using a sequential procedure that properly handles both continuous and categorical predictors. A simulation study shows that the proposed approach performs well, including in comparison with other knockoff-based approaches and the classical lasso. In the application to the Invalsi test data, once the student background variables have been selected, we fit a random intercept model to analyse the determinants of the math score at grade 5. The proposed approach is computationally feasible and highly flexible.
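For readers unfamiliar with the knockoff filter, the following is a minimal sketch of its final selection step in the standard "knockoff+" form, written in plain Python. It is not the procedure presented in the talk: it does not generate knockoff copies, does not perform imputation, and does not address how results are combined across imputed datasets, none of which the abstract specifies. The function names, the statistics in `w`, and the target level `q` are hypothetical and chosen only for illustration. Each `w[j]` is assumed to be a statistic that is large and positive when predictor `j` looks more important than its knockoff copy.

```python
def knockoff_plus_threshold(w, q):
    """Return the knockoff+ threshold for statistics w at target FDR level q.

    The threshold is the smallest t among the nonzero |w[j]| such that
    (1 + #{j: w[j] <= -t}) / max(1, #{j: w[j] >= t}) <= q.
    """
    candidates = sorted({abs(x) for x in w if x != 0})
    for t in candidates:
        negatives = sum(1 for x in w if x <= -t)
        positives = sum(1 for x in w if x >= t)
        if (1 + negatives) / max(1, positives) <= q:
            return t
    # No candidate meets the target level: nothing is selected.
    return float("inf")


def knockoff_select(w, q):
    """Return the indices of the predictors selected at target FDR level q."""
    t = knockoff_plus_threshold(w, q)
    return [j for j, x in enumerate(w) if x >= t]


# Hypothetical statistics for ten predictors (illustrative values only).
w = [5.2, 4.1, -0.3, 3.7, 0.8, -1.1, 2.9, 0.0, 6.0, 4.4]

selected = knockoff_select(w, q=0.25)
print(selected)
```

With these made-up values, a stricter level such as `q=0.1` selects nothing, because the "+1" in the numerator means at least `1/q` predictors must clear the threshold before any selection is made.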


Speaker Bio

Leonardo Grilli is a Full Professor of Statistics at the University of Florence. He earned a PhD in Applied Statistics from the University of Florence in 2000. He has served as Director of the Master's program in Statistics and Data Science. Currently, he is a member of the board of the PhD Program in Development Economics and Local Systems and an elected member of the steering committee of the Italian Statistical Society. His teaching focuses on introductory statistics and statistical modelling, including generalized linear models and multilevel models. His research mainly concerns random effects models for multilevel analysis, with methodological advances in the specification and estimation of models in complex frameworks such as multivariate responses, informative sampling designs, and sample selection bias. He has also made contributions to causal inference, IRT models, latent growth curve models, mixture models, and quantile regression. His methodological work is driven by applications in the social sciences and medicine.
