Design and analysis of pooled high throughput screening data


Duff Medical Building 3775 rue University, Montreal, QC, H3A 2B4, CA
Dr. S. Stanley Young, Asst. Director for Bioinformatics, National Institute of Statistical Sciences, Research Triangle Park, NC.

Discovery of biologically active compounds is often accomplished by screening large collections of compounds. Very few compounds are expected to have the activity of interest, so testing compounds individually generates large amounts of mostly uninformative data. Dorfman (1943) proposed testing pools of individuals when looking for rare events. There is relatively little literature on using pools for drug discovery; informal conversations indicate that testing of pools has a checkered history. There is a level of paranoia that active compounds will be missed. There are also the somewhat complicated logistics of the decoding process.

So what are the statistical issues with pooling? There is some guiding theory for determining the number of compounds in a pool: k = SQRT(1/p), where p is the probability that a compound will be active. We know a lot about the structural features of our compounds. Can we use that knowledge to select compounds to pool together? What about the expensive decoding process, which requires retesting? Can we use the chemical structures of compounds in a pool to "put a statistical finger" on the active compound(s) in an active pool?

This presentation addresses two questions: How can chemical structural information guide the construction of pools? How can we build a mathematical model for the prediction of biological activity when the tested objects are mixtures of compounds? It is "a work in progress," so I will comment on future research directions. This is very much "team research." I'm working with Prof. Jackie Hughes-Oliver and graduate students at NCSU. Some of this work is funded by the NIH Roadmap project.
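As a rough illustration of the pool-size rule k = SQRT(1/p) quoted in the abstract, the sketch below computes the expected number of tests per compound under simple two-stage pooling in the style of Dorfman (1943): test each pool once, then retest every member of an active pool individually. It is not taken from the talk. It assumes that compounds are active independently with probability p and that the assay is error-free, and the function names are hypothetical.

```python
import math


def expected_tests_per_compound(k: int, p: float) -> float:
    """Expected tests per compound for two-stage pooling with pool size k.

    Assumptions (illustrative only): each compound is active independently
    with probability p, and the assay makes no errors.
    """
    if k < 2:
        # A "pool" of one compound is just individual testing.
        return 1.0
    # One pooled test shared by k compounds, plus k individual retests
    # whenever the pool contains at least one active compound.
    return 1.0 / k + 1.0 - (1.0 - p) ** k


def rule_of_thumb_pool_size(p: float) -> int:
    """Pool size from the approximation k = sqrt(1/p), rounded to an integer."""
    return max(2, round(math.sqrt(1.0 / p)))


def best_pool_size(p: float, k_max: int = 200) -> int:
    """Pool size in 1..k_max that minimizes the expected tests per compound."""
    return min(range(1, k_max + 1),
               key=lambda k: expected_tests_per_compound(k, p))


if __name__ == "__main__":
    p = 0.01  # hypothetical hit rate: 1 active compound in 100
    k = rule_of_thumb_pool_size(p)
    print(k, expected_tests_per_compound(k, p), best_pool_size(p))
```

With a hypothetical hit rate of p = 0.01, the rule of thumb gives k = 10, which needs roughly 0.2 tests per compound instead of 1. Under these assumptions the exact minimizer is k = 11, so the approximation loses very little.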

Contact Information

Mathieu Blanchette
blanchem [at]
Office Phone: