R Software Packages

If you are a student or researcher who analyzes genetic and genomic data, or a methodologist developing methods of analysis for such data, please download the software developed by our group. Most methods are implemented as R packages.

(Clicking a link will take you to an external site.)


Associations in high dimensional data.

  • hidetify - Provides functions and algorithms to identify influential observations in high-dimensional regression settings.
  • BDcocolasso - R software package for high-dimensional error-in-variables regression. It implements the CoCoLasso algorithm for settings with additive error or missing data in the covariates. It also implements a variation of CoCoLasso called Block-Descent CoCoLasso (BD-CoCoLasso), which targets settings where only a small percentage of the features are corrupted (by additive error or missing data).
  • CIVMR - Construction of a new instrumental variable that minimizes horizontal pleiotropy in the context of Mendelian randomization | Citation: Jiang et al., Genetic Epidemiology 2019.
  • ggmix – A mixed model in which the fixed effects can be high-dimensional and penalized (L1), and the random-effects covariance may be constructed from some of the same features that are included among the fixed effects. For example, it can simultaneously estimate SNP fixed effects while adjusting for family relationships through a kinship matrix constructed from overlapping SNPs. See sahirbhatnagar.com
  • sail - Sparse additive interaction learning. An efficient penalized model for interactions between one key covariate and a high-dimensional feature space. sail enforces a strict hierarchy on the interaction terms. See sahirbhatnagar.com
  • pcev – Principal components of heritability, a method for dimension reduction of a high dimensional feature space, while maximizing the variance explained by covariates | Citation: Turgeon et al. Statistical Methods in Medical Research 2018
  • rootWishart – Finding p-values from a double Wishart problem | Citation: Turgeon et al., arXiv
  • ASGSCA - Provides tools to model and test the association between multiple genotypes and multiple traits, taking prior biological knowledge into account. The method is based on Generalized Structured Component Analysis (GSCA) | Citation: Romdhani et al. 2015, Genetic Epidemiology
  • KSPM - R package for kernel semi-parametric models. Manuscript in preparation.


Methods of analysis for DNA Methylation data.


Analysis methods for rare genetic variants.




Microbiome Data:

  • MDiNE - Estimates microbiome OTU co-occurrence networks within two separate groups, where the networks are defined through precision matrices. The difference between the two precision matrices is also estimated, along with corresponding interval estimates. Manuscript submitted for publication.


Tutorials on various useful tools in analysis and research.

  1. Introduction to Genomic Ranges R objects: presentation by Greg Voisin
  2. Introduction to Genomic Ranges R objects: vignette by Greg Voisin
  3. Reproducible Research: An Introduction to knitr: presentations by Sahir Bhatnagar
  4. Reproducible Research: An Introduction to knitr: examples by Sahir Bhatnagar


For more information, visit the R project website and the GreenwoodLab GitHub page:

http://www.r-project.org and https://github.com/GreenwoodLab

