If you are a student or researcher who analyzes genetic and genomic data, or a methodologist developing methods of analysis for such data, please download the software developed by our group. Most methods are implemented as R packages.
(Clicking the link will direct you to an external site)
Associations in high dimensional data.
- hidetify - Functions and algorithms to identify influential observations in high-dimensional regression settings.
- BDcocolasso - R package for high-dimensional error-in-variables regression. It implements the CoCoLasso algorithm for settings with additive error or missing data in the covariates, as well as a variant called Block-Descent CoCoLasso (BD-CoCoLasso) for settings where only a small percentage of the features are corrupted (by additive error or missing data).
- CIVMR - Construction of a new instrumental variable that minimizes horizontal pleiotropy in the context of Mendelian randomization | Citation: Jiang et al., Genetic Epidemiology 2019.
- ggmix - A mixed model in which the fixed effects can be high dimensional and penalized (L1), and the random-effects covariance may be constructed from some of the features that are also included among the fixed effects. For example, it allows simultaneous estimation of SNP fixed effects while adjusting for family relationships through a kinship matrix constructed from overlapping SNPs. See sahirbhatnagar.com
- sail - Sparse additive interaction learning. Efficient penalized model for interactions between one key covariate and a high dimensional feature space. Sail enforces a strict hierarchy on the interaction terms. See sahirbhatnagar.com
- pcev – Principal components of heritability, a method for dimension reduction of a high dimensional feature space, while maximizing the variance explained by covariates | Citation: Turgeon et al. Statistical Methods in Medical Research 2018
- rootWishart – Finding p-values from a double Wishart problem | Citation: Turgeon et al., arXiv
- ASGSCA - Tools to model and test the association between multiple genotypes and multiple traits, taking prior biological knowledge into account. The method is based on Generalized Structured Component Analysis (GSCA) | Citation: Romdhani et al. 2015, Genetic Epidemiology
- KSPM - R package for kernel semi-parametric models. Manuscript in preparation.
Methods of analysis for DNA Methylation data.
- SOMNiBUS - Estimating smooth covariate effects on targeted bisulfite sequencing measures of DNA methylation. Manuscript submitted for publication.
- DMCHMM - Hidden Markov model for estimating methylation levels and for testing for differentially methylated CpG sites | Citation: Shokoohi et al., 2018 Biometrics
- SMSC - A smoothing method for whole genome bisulfite sequencing data that allows for sequencing errors | Citation: Lakhal-Chaieb et al. 2017, Statistical Applications in Genetics and Molecular Biology
- funtooNorm – Normalization of Illumina beadchip-derived DNA methylation data when data are from multiple tissues or cell types | Citation: Oros Klein et al. Bioinformatics 2016
- funNorm - Functional normalization of 450k methylation array data improves replication in large cancer studies | Citation: Fortin et al. 2014, Genome Biology
Analysis methods for rare genetic variants.
- GWsignif – A method for estimating genome-wide significance thresholds for extremely dense genetic information, such as obtained from sequencing studies | Citation: Xu et al. 2014, Genetic Epidemiology
- MURAT – Multivariate tests of association between rare genetic variants and two or more phenotypes | Citation: Sun et al. European Journal of Human Genetics 2016
- RVPedigree – A suite of tools for rare variant analysis that accommodates non-normal phenotypes and family structure | Citation: Oualkacha et al. 2016, International Journal of Epidemiology
- ASKAT – Now integrated into RVPedigree | Citation: Oualkacha et al. 2013, Genetic Epidemiology
- RVTests – Tests for association with rare genetic variants | Citation: Xu et al. 2012, PLoS One
Scripts:
- imputePrepSanger – Scripts that prepare PLINK genotype files for imputation with the Sanger imputation service.
- methylation450KPipeline - Functions to run a 450K pipeline analysis.
- clusterProfiler – Statistical analysis and visualization of functional profiles for genes and gene clusters.
- CellTypeAdjustment – Scripts for performing cell type mixture adjustments in DNA methylation data | Citation: McGregor et al. 2016, Genome Biology
- pcev_pipelineCBRAIN - A pipeline to run a pcev analysis from the R package on CBRAIN.
Microbiome Data:
- MDiNE - Estimates microbiome OTU co-occurrence networks within two separate groups, where the networks are defined through precision matrices. The difference between the two precision matrices is also estimated, along with corresponding interval estimates. Manuscript submitted for publication.
Tutorials on various useful tools in analysis and research.
- Introduction to Genomic Ranges R objects: presentation by Greg Voisin
- Introduction to Genomic Ranges R objects: vignette by Greg Voisin
- Reproducible Research: An Introduction to knitr: presentations by Sahir Bhatnagar
- Reproducible Research: An Introduction to knitr: examples by Sahir Bhatnagar
For more information, visit the R project website at http://www.r-project.org and the GreenwoodLab GitHub page at https://github.com/GreenwoodLab