Calling the world's wealth of health data a formidable "engine of discovery," the National Institutes of Health on Thursday awarded $32 million in grants in a bid to make huge biomedical data sets accessible to researchers the world over.
Among the challenges to be worked out under the initiative is how researchers can share data gleaned from electronic medical records without compromising the privacy of patients.
Project descriptions show that researchers at UC Santa Cruz will develop novel ways to comb through mountains of genomic data to discover cancer-causing genes and drugs that will effectively target them. Stanford University's Center for Expanded Data Annotation and Retrieval will attempt to render a huge repository of infectious disease data into digital forms that will allow researchers to compare dissimilar studies and detect patterns in their findings.
Under the umbrella of a project called the ENIGMA Center for Medicine, Imaging and Genomics, USC neuroscientist Paul Thompson will gather 307 scientists in 33 countries, along with their biomedical data sets. The group will comb through epidemiological and genomic data and brain scans of all sorts to hunt for the causes of -- and prospective treatments for -- a range of brain diseases, including autism, depression, multiple sclerosis, Parkinson's and schizophrenia.
Studies that use such data generate volumes of information that are simply too big to be transferred electronically, Thompson said. To review the whole-genome sequencing recently performed on 815 subjects with Alzheimer's disease, the researchers said they watched delivery trucks disgorge boxes and boxes of disk drives for weeks at a time.
"That's not a scalable solution," Thompson said. If the enigmas of mental illness and degenerative disease are to be cracked, Toga added, "there have to be solutions" that allow scientists across the globe to collaborate on such research.
Similar challenges hamper efforts to glean additional insight from large disease-oriented research projects such as the Cancer Genome Atlas, which examines the genomic underpinnings of more than 30 types of cancer, and the ENCODE Project, which seeks to identify all functional elements in the human genome.
"The potential of these data, when used effectively, is quite astounding," NIH Director Francis Collins said. As the NIH's Big Data to Knowledge (BD2K) initiative generates tools to allow the widespread dissemination and analysis of such data, "the whole will be greater than the sum of its parts," he said.