If you are a student or researcher who analyzes genetic or genomic data, or a methodologist developing analysis methods for such data, you can download the R functions and packages developed by our group.
(Clicking the link will direct you to an external site)
Rare variant analyses. These methods were developed for the analysis of DNA sequencing data. Specifically, they test for association between quantitative phenotypes and small regions of the genome that contain multiple genetic variants.
- RVTests | Citation: Xu et al. 2012, PLoS One
- GWsignif | Citation: Xu et al. 2014, Genetic Epidemiology
- ASKAT (package under development) | Citation: Oualkacha et al. 2013, Genetic Epidemiology
- RVPedigree (package under development) | Citation: Lakhal-Chaieb et al. 2015, Statistics in Medicine
- Burkett & Greenwood 2013. A sequence of methodological changes due to sequencing. Curr Opin Allergy Clin Immunol, 13(5):470-477.
- E Zeggini & A Morris, eds. 2015. Assessing rare variation in complex traits: Design and analysis of genetic studies. Springer.
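To illustrate the kind of region-based test described under rare variant analyses above, here is a minimal Python sketch of a simple burden test: the variants in a region are collapsed into one score per individual, and the quantitative phenotype is regressed on that score. This is a generic textbook approach, not the method or API of any package listed above. The function names (`burden_scores`, `slope_t_statistic`), the simulated data, and all parameter values are hypothetical.

```python
import math
import random


def burden_scores(genotypes, weights=None):
    """Collapse each individual's variants in a region into one weighted sum."""
    n_variants = len(genotypes[0])
    if weights is None:
        weights = [1.0] * n_variants  # unweighted burden: count minor alleles
    return [sum(w * g for w, g in zip(weights, row)) for row in genotypes]


def slope_t_statistic(x, y):
    """Simple linear regression of y on x; returns (slope, t statistic)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    std_err = math.sqrt(rss / (n - 2) / sxx)
    return slope, slope / std_err


# Simulated data (all values are assumptions for illustration):
# 500 individuals, 10 rare variants, minor allele frequency 1%.
rng = random.Random(1)
n, m, maf, effect = 500, 10, 0.01, 0.8
genotypes = [
    [sum(rng.random() < maf for _ in range(2)) for _ in range(m)]
    for _ in range(n)
]
scores = burden_scores(genotypes)
phenotype = [effect * s + rng.gauss(0, 1) for s in scores]

slope, t_stat = slope_t_statistic(scores, phenotype)
print(f"slope = {slope:.3f}, t = {t_stat:.2f}")
```

A burden test of this kind assumes that the variants in the region act in the same direction. Variance-component and pedigree-aware approaches relax that assumption or account for relatedness, and they need more machinery than this sketch shows.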
Methylation data. The first step in analysis of DNA methylation data from the Illumina 450K array is appropriate normalization.
- funNorm | Citation: Fortin et al. 2014, Genome Biology
- funtooNorm | Citation: Klein et al. 2015, Bioinformatics
- DMCHMM | Citation: Shokoohi et al., submitted 2017
- SMSC | Citation: Lakhal-Chaieb et al. 2017, Statistical Applications in Genetics and Molecular Biology
- KD Siegmund 2011. Statistical approaches for the analysis of DNA methylation microarray data. Hum Genet, 129:585-595.
- KB Michels et al. 2013. Recommendations for the design and analysis of epigenome-wide association studies. Nature Methods, 10(10):949-955.
Associations in high-dimensional data.
The ASGSCA (Association Study using GSCA) package provides tools to model and test the associations between multiple genotypes and multiple traits, taking into account prior biological knowledge. Functional genomic regions, e.g., genes and clinical pathways, are incorporated in the model as latent variables that are not directly observed. See Romdhani et al. 2015 for details. The method is based on Generalized Structured Component Analysis (GSCA) (Hwang & Takane 2004). GSCA is an approach to structural equation modeling (SEM) and thus comprises two sub-models: a measurement model and a structural model. The measurement model specifies the relationships between observed variables (here genotypes and traits) and latent variables (here genes, or more generally genomic regions, and clinical pathways), whereas the structural model expresses the relationships among the latent variables.
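As a sketch of the two sub-models, the general GSCA formulation of Hwang & Takane (2004) can be written as follows. The symbols are chosen here for illustration: $Z$ is the matrix of observed variables (one row per individual), $\Gamma$ the matrix of latent variables, $W$ the component weights, $C$ the loadings, and $B$ the path coefficients. The exact specification used in ASGSCA may differ; see Romdhani et al. 2015.

```latex
\begin{align}
\Gamma &= Z W && \text{(latent variables as weighted composites of observed variables)} \\
Z &= \Gamma C + E_1 && \text{(measurement model)} \\
\Gamma &= \Gamma B + E_2 && \text{(structural model)}
\end{align}
% Stacking the two sub-models with V = [I, W] and A = [C, B] gives
% Z V = Z W A + E, and the parameters are chosen to minimize
\begin{equation}
\operatorname{SS}(Z V - Z W A) = \operatorname{tr}\!\left[(Z V - Z W A)^\top (Z V - Z W A)\right].
\end{equation}
```

In this formulation the criterion is minimized over $W$ and $A$ by alternating least squares, updating one set of parameters while holding the other fixed.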
PCEV (Principal Component of Explained Variance) is a dimension-reduction technique, similar to Principal Component Analysis (PCA), that seeks the linear combination of a multivariate response that maximizes the proportion of its variance explained by a set of covariates. The R package implements two estimation methods: the classical approach and a block approach, proposed by Turgeon et al. 2016, which is suitable for high-dimensional response vectors. The package also performs inference using both analytic and permutation tests.
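The quantity being maximized can be sketched as follows, assuming a linear model for the response vector $Y$ given covariates $X$ with residuals $E$ uncorrelated with $X$. The notation ($B$, $V_M$, $V_R$, $w$) is chosen here for illustration and may not match the package documentation.

```latex
\begin{align}
Y &= B X + E, \qquad \operatorname{Var}(Y) = V_M + V_R, \\
V_M &= B \operatorname{Var}(X) B^\top, \qquad V_R = \operatorname{Var}(E), \\
w_{\mathrm{PCEV}} &= \arg\max_{w} \; \frac{w^\top V_M w}{w^\top (V_M + V_R)\, w}.
\end{align}
% When V_R is invertible, the maximizer is the leading solution of the
% generalized eigenvalue problem
\begin{equation}
V_M w = \lambda V_R w ,
\end{equation}
% and the maximized proportion of explained variance equals
% \lambda / (1 + \lambda).
```

When the response has more dimensions than there are observations, the estimate of $V_R$ is singular and this eigenproblem cannot be solved directly, which is one reason a different estimation strategy is needed for high-dimensional responses.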
- H Hwang & Y Takane 2004. Generalized structured component analysis. Psychometrika, 69:81-99.
- J de Leeuw et al. 1976. Additive structure in qualitative data: An alternating least squares method with optimal scaling features. Psychometrika, 41:471-503.
- Klei et al. 2008. Pleiotropy and principal components of heritability combine to increase power for association analysis. Genet Epidemiol, 32(1):9-19.
Tutorials on various useful tools in analysis and research.
- Introduction to Genomic Ranges R objects | Presentation by Greg Voisin
- Introduction to Genomic Ranges R objects | Vignette by Greg Voisin
- Reproducible Research: An Introduction to knitr | Presentations by Sahir Bhatnagar
- Reproducible Research: An Introduction to knitr | Examples by Sahir Bhatnagar
For more information, visit the R project website at: http://www.r-project.org