Leveraging Big Data in Academic Research

This blog and others have celebrated Big Data for the big insights it can provide – and has already brought – in applications as diverse as finance, marketing, medicine, and urban planning. Perhaps the slowest, or most cautious, adopter of big data has been academia, specifically the academic research institutions whose work contributes to pharma, biotech, drug discovery, genomics, and more.

Some of this lag is attributed to the cost of acquiring software and systems to analyze big data, as well as the complexity of integrating and keeping up with multiple, evolving solutions. One survey found that access to data sets was the No. 1 problem for academic institutions interested in applying big data to research.

Open Access to Big Data?

Late last year, four leading science organizations called for a global accord on a set of guiding principles on open access to big data, warning that limitations would risk progress in a number of areas, from advanced health research to environmental protection and the development of smart cities. The “Open Data in a Big Data World” accord seeks to leverage the data revolution and its scientific potential, advocating that open data would maximize creativity, maintain rigor, and ensure that “knowledge is a global public good rather than just a private good.”

Collaboration around Precision Medicine

Big data is making a bigger impact as genomic centers, hospitals, and other organizations – particularly cancer centers – collaborate around precision medicine. These efforts use genomic data to make treatment decisions even more personalized. Precision medicine both generates and depends on Big Data: it is a source of much of the volume, variety, and velocity of data, and that data requires analysis to turn it into useful insights for drug discovery, translational research, and therapeutic treatments.

The Need for Searchable Databases

While the federal investment of $215 million in the Precision Medicine Initiative is certainly welcome, scientists argue that additional funds and collaborations are needed. One such collaboration, the New York City Clinical Data Research Network (NYC-CDRN), has created “a robust infrastructure where data from more than six million patients is aggregated.” Dr. Mark Rubin of Weill Cornell Medical College argues that searchable databases like the NYC-CDRN are needed so that researchers and clinicians can “better design studies and clinical trials, create personalized treatment plans and inform medical decisions” for optimal, data-informed care.

It appears the marketplace is listening.

Yahoo! announced it was releasing “a massive machine learning dataset” comprising the search habits of 20 million anonymous users. The dataset is available only to academic institutions and researchers, for work on context-aware learning, large-scale learning algorithms, user-behavior modeling, and content enrichment. Facebook and Google have similarly opened access to artificial intelligence server designs and machine learning libraries, respectively, for academic as well as industry use.

In response to another federal initiative, the Cancer Moonshot, the Oncology Precision Network has decided to share aggregated cancer genomics data as it aims to “find breakthroughs in cancer care by leveraging previously untapped real-world cancer genomics data while preserving privacy, security, and data rights.” The Cancer Moonshot itself integrates academia, research, pharmaceutical, insurance, physician, technology, and government organizations to find new immunotherapy-based solutions for cancer care by 2020.

These and other efforts like them let academic researchers cover new ground now that they have access to previously unavailable datasets. In addition, nonprofit and academic institutions are analyzing data from previously conducted research, linking databases, and leveraging machine learning to further accelerate precision medicine development.

A Big Data Guinness World Record

A Guinness World Record ought to be proof of such an impact. The title for “fastest genetic diagnosis” went to Dr. Stephen Kingsmore of Rady Children’s Hospital-San Diego. He used big data and new technologies to successfully diagnose critically ill newborns in 26 hours, nearly halving the previous record of 50 hours. 

This is the kind of breakthrough that researchers – both in industry and academia – are hoping big data can deliver more of. 

Academic Tech Lags Behind Industry, but Big Data Solutions Could Help Close the Gap

Delivering more such breakthroughs means overcoming the challenges that keep big data technologies out of the hands of academics. Post-docs and grad students want their universities to offer the technologies they will also use in their careers. Academic institutions want to keep costs low while protecting intellectual property. They also want to retain access to that IP after students graduate, so that future students can benefit from work that has already been done.

Researchers want to analyze data in context, with support from big data findings. Systems and solutions that let them combine efforts with industry will go a long way toward increasing the discovery and development of highly accurate clinical diagnostic tests and treatments.

What is holding your academic research institution back from deploying big data technologies?