A patient develops a rare condition and needs answers, so their clinician searches urgently for other patients with similar rare symptoms and similar possible causes. To understand the mechanisms of a debilitating disease, a medical researcher tries to separate the “signal” of that disease’s causes from the “noise” of natural biological variation in human lives and conditions.
Getting the answers those patients and researchers need requires the ability to analyze or query health and genomic data from an enormous number of patients, each of whom has their own needs and deserves to have their data kept at the highest levels of security and privacy.
Today, African, Canadian, and EU researchers announced the CINECA (Common Infrastructure for National Cohorts in Europe, Canada, and Africa) project, an unprecedented multi-continental collaboration to build the infrastructure (data standards, technical protocols, and software) needed to run queries and analyses over the distributed data sets made available by each partner, while leaving those partners in complete control of the patient data they have been entrusted with.
Canada’s health data system has always been federated by necessity, and the experience of the Canadian Distributed Infrastructure for Genomics (CanDIG) in building federated queries and analyses over locally controlled private health data is essential to the project. CanDIG member institutions SickKids and McGill University are directly involved with CINECA, and CanDIG as a whole will bring its experience to bear by leading the work of building standard methods for federating queries, and by actively participating in building compatible and interoperable systems for login, access control, and running complex distributed analyses.
“CanDIG is already connecting several important Canadian health data sets in cancer research”, said Guillaume Bourque, Director of the Centre for Computational Genomics at McGill and Co-PI of CanDIG. “As part of this project, we are proposing to connect additional Canadian data sets, and then connect those to an even larger number of data sets internationally. Those new connections between data sets are going to allow Canadian researchers much deeper insight into even that data that they already had access to.”
“The technical goals we have set for ourselves are ambitious”, said Mike Brudno, PI of the CanDIG project and Senior Scientist at SickKids Hospital in Toronto. “But CanDIG has extensive experience working with CINECA partner projects EGA (European Genome-phenome Archive) and ELIXIR (a European network of life sciences and bioinformatics resources) through their participation as peer Driver Projects for the Global Alliance for Genomics and Health (GA4GH). Building on what our projects have already done alone and together, we’re confident that we can not only meet those goals, but build open-source standards-based solutions for the entire community.”
“Key to this project’s success is trusted, reliable, federated data querying and analysis”, said Steve Jones, Head of Bioinformatics and Co-Director, Michael Smith Genome Sciences Centre, and Co-PI of CanDIG. “We’ve shown how this can be done in support of real science and insight, while retaining control over the data we have been entrusted with; and we’re excited to bring our expertise in data federation to the international community.”
The CINECA project is funded by both the EU, through the Horizon 2020 Research and Innovation Programme, and the Canadian Government, through the Canadian Institutes of Health Research.
CanDIG is a Canadian national health and genomics platform that allows authorized queries and analyses over locally controlled private data sets. For more information, see https://www.distributedgenomics.ca/
The Canadian Centre for Computational Genomics provides bioinformatics analysis and HPC services for the life science research community. For more information, see http://www.computationalgenomics.ca/