
R Software Packages

If you are a student or researcher who analyzes genetic and genomic data, or a methodologist developing methods of analysis for such data, please download the software developed by our group. Most methods are implemented as R packages.

Associations in high dimensional data

hidetify - This package provides functions and algorithms to identify influential observations in high-dimensional regression settings.
BDcocolasso - R software package to implement high-dimensional errors-in-variables regression. This package implements the CoCoLasso algorithm in settings with additive error or missing data in the covariates. It also implements a variation of the CoCoLasso algorithm called Block-Descent CoCoLasso (or BD-CoCoLasso), which focuses on settings where only a small percentage of the features are corrupted (with additive error or missing data).
CIVMR - Construction of a new instrumental variable that minimizes horizontal pleiotropy in the context of Mendelian randomization | Citation: Jiang et al., Genetic Epidemiology 2019.
ggmix – A mixed model in which the fixed effects can be high dimensional and penalized (L1), and the random effects covariance may be constructed using some of the features that are also included among the fixed effects. For example, it allows simultaneous estimation of SNP fixed effects while adjusting for family relationships using a kinship matrix constructed from overlapping SNPs. See sahirbhatnagar.com
sail - Sparse additive interaction learning. Efficient penalized model for interactions between one key covariate and a high dimensional feature space. Sail enforces a strict hierarchy on the interaction terms. See sahirbhatnagar.com
pcev – Principal components of heritability, a method for dimension reduction of a high dimensional feature space, while maximizing the variance explained by covariates | Citation: Turgeon et al. Statistical Methods in Medical Research 2018
rootWishart – Finding p-values from a double Wishart problem | Citation: Turgeon et al., arxiv
ASGSCA - Provides tools to model and test the association between multiple genotypes and multiple traits, taking into account the prior biological knowledge. The method is based on Generalized Structured Component Analysis (GSCA) | Citation: Romdhani et al. 2015, Genetic Epidemiology
KSPM - R package for kernel semi-parametric models. Manuscript in preparation.

Methods of analysis for DNA methylation data

SOMNiBUS - Estimating smooth covariate effects on targeted bisulfite sequencing measures of DNA methylation. Manuscript submitted for publication.
DMCHMM - Hidden Markov model for estimating methylation levels and for testing for differentially methylated CpG sites | Citation: Shokoohi et al., 2018 Biometrics
SMSC - A smoothing method for whole genome bisulfite sequencing data that allows for sequencing errors | Citation: Lakhal-Chaieb et al. 2017, Statistical Applications in Genetics and Molecular Biology
funtooNorm – Normalization of Illumina beadchip-derived DNA methylation data when data are from multiple tissues or cell types | Citation: Oros Klein et al. Bioinformatics 2016
funNorm - Functional normalization of 450k methylation array data improves replication in large cancer studies | Citation: Fortin et al. 2014, Genome Biology

Analysis methods for rare genetic variants

GWsignif – A method for estimating genome-wide significance thresholds for extremely dense genetic information, such as obtained from sequencing studies | Citation: Xu et al. 2014, Genetic Epidemiology
MURAT – Multivariate tests of association between rare genetic variants and two or more phenotypes | Citation: Sun et al. European Journal of Human Genetics 2016
RVPedigree – A suite of tools for rare variant analysis that accommodates non-normal phenotypes and family structures | Citation: Oualkacha et al. 2016, International Journal of Epidemiology
ASKAT – Now integrated into RVPedigree | Citation: Oualkacha et al. 2013, Genetic Epidemiology
RVTests – Tests for association with rare genetic variants | Citation: Xu et al. 2012, PLoS One

Scripts

imputePrepSanger – Scripts that prepare PLINK genotype files for imputation using the Sanger imputation service.
methylation450KPipeline - Functions to run a 450K pipeline analysis.
clusterProfiler – Statistical analysis and visualization of functional profiles for genes and gene clusters.
CellTypeAdjustment – Scripts for performing cell type mixture adjustments in DNA methylation data | Citation: McGregor et al. 2016, Genome Biology
pcev_pipelineCBRAIN - A pipeline to run a pcev analysis from the R package on CBRAIN.

Microbiome Data

MDiNE - Allows the estimation of microbiome OTU co-occurrence networks within two separate groups, where the networks are defined through precision matrices. The difference between the two precision matrices is also estimated, along with corresponding interval estimates. Manuscript submitted for publication.

Tutorials on various useful tools in analysis and research

Introduction to Genomic Ranges R objects: presentation by Greg Voisin
Introduction to Genomic Ranges R objects: vignette by Greg Voisin
Reproducible Research: An Introduction to knitr: presentations by Sahir Bhatnagar
Reproducible Research: An Introduction to knitr: examples by Sahir Bhatnagar

For more information, visit the R project website and the GreenwoodLab GitHub page at:

http://www.r-project.org and https://github.com/GreenwoodLab
