Event

Bhramar Mukherjee, PhD, University of Michigan

Tuesday, October 24, 2017, 15:30 to 16:30
Purvis Hall Room 24, 1020 avenue des Pins Ouest, Montreal, QC, H3A 1A2, CA

How Small Data Can Leverage Big Data

Bhramar Mukherjee is the John D. Kalbfleisch Collegiate Professor and Associate Chair of the Department of Biostatistics, Professor of Epidemiology, and Professor of Global Public Health at the University of Michigan School of Public Health. She is also a Research Professor at the Michigan Institute for Data Science and the Associate Director for Cancer Control and Population Sciences at the University of Michigan Comprehensive Cancer Center. Her research interests include statistical methods for studies of gene–environment interaction, case-control studies and outcome-dependent sampling, Bayesian methods, shrinkage estimation, and optimal designs, with applications in cancer, cardiovascular disease, exposure science, and environmental epidemiology. She has co-authored more than 180 peer-reviewed papers. She is a Fellow of the American Statistical Association and has received numerous awards, including the Gertrude Cox Award from the Washington Statistical Society (2016), the John D. Kalbfleisch Collegiate Professorship from the University of Michigan (2015), the University of Michigan Mid-Career Faculty Recognition Award (2015), and the Outstanding Young Researcher Award (applications category) from the International Indian Statistical Association (2014).
We are living at a time when the “Big Data” movement is raging across the world, revolutionizing and stretching our computational imagination, and when being a data scientist is perhaps more attractive than being a statistician to the new generation of quantitative scientists. This lecture will illustrate how classical statistical principles can be used to incorporate external auxiliary information available from large data sources to improve inference based on a current dataset of modest size. We will consider three examples from the biomedical sciences:

(1) A new assaying technology is replacing current practice: we have a large dataset measured on the old platform and a small sub-sample measured on the new one. Can the old platform help boost prediction of patient outcomes?

(2) A new biomarker/predictor is proposed for addition to an existing prediction model: while we have abundant published data on the established model, the new biomarker is available only on a smaller sample. Can the existing information be used in a principled way to improve prediction under the new model?

(3) We have a convenience sample of patients in a health system, with access to their complete electronic medical records and genome-wide scans. Can we use knowledge from large population-based genome-wide association studies to learn and discover in this biased sample?

Through these three examples, I will try to identify a connecting theme advocating for timeless statistical principles and study designs to be applied to cutting-edge problems in the biomedical sciences.
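To give a rough flavor of the kind of idea raised in example (2), the sketch below (not the speaker's actual method) folds an established risk model into a small dataset that also measures a new biomarker, by treating the published model's linear predictor as a fixed offset in a logistic regression. All variable names, coefficients, and simulated data here are hypothetical and purely illustrative.

# Minimal sketch (not the speaker's method): use an established risk model's
# linear predictor as an offset when estimating the effect of a new biomarker
# on a small current sample. All names, coefficients, and data are hypothetical.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)

# Published coefficients of the established model (intercept, age, smoking),
# assumed known from the large external study.
beta_established = np.array([-2.0, 0.03, 0.8])

# Small current sample that also measures the new biomarker.
n = 200
age = rng.normal(55, 10, n)
smoking = rng.binomial(1, 0.3, n)
biomarker = rng.normal(0, 1, n)

X_old = np.column_stack([np.ones(n), age, smoking])
true_logit = X_old @ beta_established + 0.5 * biomarker
y = rng.binomial(1, 1 / (1 + np.exp(-true_logit)))

# Fix the established predictors' contribution at their published values via an
# offset, and spend the scarce new data only on the biomarker's incremental effect.
offset = X_old @ beta_established
fit = sm.GLM(y, sm.add_constant(biomarker),
             family=sm.families.Binomial(), offset=offset).fit()
print(fit.summary())

The offset constrains the established predictors to their published coefficients, so only one additional parameter is estimated from the small sample; richer approaches (constrained maximum likelihood, shrinkage toward the external model) build on the same principle of borrowing strength from external information.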