Big data, meet big money: NIH funds centers to crunch health data


Calling the world’s wealth of health data a formidable “engine of discovery,” the National Institutes of Health on Thursday awarded $32 million in grants in a bid to make huge biomedical data sets accessible to researchers the world over.

NIH Director Dr. Francis Collins said the Big Data to Knowledge, or BD2K, initiative expects to invest $656 million over the next seven years to collect, analyze, catalog and disseminate research findings, genomic analyses, imaging scans and electronic health records. Made available broadly, that mass of data would allow researchers to glean new insights to improve health, Collins said.

Among the challenges to be worked out under the initiative is how researchers can share data gleaned from electronic medical records without compromising the privacy of patients.


Four California institutions -- UCLA, USC, UC Santa Cruz and Stanford University -- will play a major role among the 12 “centers of excellence” to be established under the BD2K initiative. Collectively, the four universities are to be awarded $7 million in 2014 and are slated to receive close to $38 million over the next four years.

Project descriptions show that researchers at UC Santa Cruz will develop novel ways to comb through mountains of genomic data to discover cancer-causing genes and drugs that will effectively target them. Stanford University’s Center for Expanded Data Annotation and Retrieval will attempt to render a huge repository of infectious disease data into digital forms that will allow researchers to compare dissimilar studies and detect patterns in their findings.

At UCLA, researchers at the Geffen School of Medicine and the Samueli School of Engineering and Applied Sciences will work with five other institutes -- including Scripps Research Institute -- under an initial $1.5-million grant. The researchers will start by mining the cloud to generate data on protein markers for cardiovascular disease. That will be the test bed for a broader effort -- worth $11 million over the next four years -- to develop tools for accessing, standardizing and sharing biomedical data.

The BD2K initiative will establish two centers of excellence at USC, expected to bring in a collective $23 million over the next four years and $1.75 million in the fiscal year that began Oct. 1. Both centers will leverage the university’s unrivaled collection of brain imagery, samples and other data to make advances in neuroscience as well as bioinformatics.

Under the umbrella of a project called the ENIGMA Center for Medicine, Imaging and Genomics, USC neuroscientist Paul Thompson will gather 307 scientists in 33 countries, along with their biomedical data sets. The group will comb through epidemiological and genomic data and brain scans of all sorts to hunt for the causes of -- and prospective treatments for -- a range of brain diseases, including autism, depression, multiple sclerosis, Parkinson’s and schizophrenia.

Using cell and brain samples collected from about 30,000 patients across the globe, a team led by USC neuroscientist Arthur Toga will develop data management strategies, computational methodologies and software tools to store, analyze and visualize such a vast trove of information.

In an interview, Thompson and Toga underscored that the challenge of storing and making sense of the data accumulated by neuroscientists has become an obstacle to advancements in the field.

Studies that use such data generate volumes of information that are simply too big to be transferred electronically, Thompson said. To review whole-genome sequencing recently performed on 815 subjects with Alzheimer’s disease, the researchers said, they watched delivery trucks disgorge boxes and boxes of disc drives for weeks at a time.

“That’s not a scalable solution,” Thompson said. If the enigmas of mental illness and degenerative disease are to be cracked, Toga added, “there have to be solutions” that allow scientists across the globe to collaborate on such research.

Similar challenges hamper efforts to glean additional insight from large disease-oriented research projects such as the Cancer Genome Atlas, which examines the genomic underpinnings of more than 30 types of cancer, and the ENCODE Project, which seeks to identify all functional elements in the human genome.

“The potential of these data, when used effectively, is quite astounding,” Collins said. As the BD2K initiative generates tools to allow the widespread dissemination and analysis of such data, “the whole will be greater than the sum of its parts,” he said.
