Challenges and opportunities for biomarker discovery when applying machine learning techniques to large RNA-Seq cohorts
Sebastien Lemieux (University of Montreal)
Tuesday, September 29, 12-1pm
Zoom Link: https://mcgill.zoom.us/j/91589192037
Abstract: Over two decades of statistical developments have allowed transcriptomics, from microarrays to RNA-Seq, to become an indispensable tool for characterizing changes in expression profiles. Computationally, a central step is the application of specialized parametric statistical methods such as DESeq2, edgeR or voom to single out differentially expressed genes. In parallel, several new tricks have been developed within the machine learning framework to facilitate the training of high-dimensional classifiers that can take advantage of whole-transcriptome characterizations. Unfortunately, identifying biomarkers from these trained classifiers has proven more difficult than expected. These computational advances, coupled with reduced costs and protocol stabilization for RNA-Seq, have led to the emergence of large cohorts of hundreds of high-quality RNA-Seq expression profiles. I will show, using large datasets developed to refine sub-typing of acute myeloid leukemia (AML), that standard statistical tools prove inadequate for the identification of biomarkers. I will then present a novel approach to biomarker identification, based on machine learning principles, that scales well with large datasets.