Integrating experimental and clinical data from proprietary,
private and public databases to accelerate research discoveries should be
commonplace. During his 2016 State of the Union Address, President Obama took a
crucial first step in making this a reality in the battle against cancer, calling on Vice President Biden to lead a
national Moonshot initiative to accelerate research in this field.
In June, the vice president took action, announcing a first-of-its-kind,
open-access cancer database. Dubbed the Genomic Data Commons (GDC), the new database is intended to provide the cancer
research community with a unified repository to share information across cancer
genomic studies in support of precision medicine.
What is the GDC?
The GDC contains the raw genomic and clinical data
for 12,000 patients. In the future, Vice President Biden envisions that the open
database will include detailed analyses from the research community of the
molecular makeup of cancers, information on which treatments were used for
specific types of cancer and how patients respond to those treatments. In
addition, the GDC will consolidate the National Cancer Institute’s diverse
datasets of genomic sequences and analyses of tumors. The main objective is to
consolidate the data for open consumption, speeding research and eliminating
the time researchers spend repeating projects that others have already conducted.
Historically, public data sets like The Cancer Genome Atlas (TCGA) and the 1000 Genomes Project have helped the scientific community
advance its understanding of complex diseases like cancer. In our view, the
GDC is the latest step in advancing the scientific community’s knowledge in
this area, and a positive one.
That said, there can be challenges with systems
like the GDC. These include:
Availability: A shared database depends on the
willingness and ability of researchers to share their data. Some companies may
not want to share proprietary information about their genomic trials.
Consent and legal issues: In many cases,
patient consent may not allow for the publication of this data.
Scope: While genomics is an important piece of
translational medicine, there are many other profiling technologies not
supported by the GDC.
Access controls: The GDC is designed to be an open
platform and has little focus on restrictions. While that makes sense for sharing
public data, access controls are an important and difficult part of any
commercial solution dealing with clinical data.
We also see a future where additional collaboration
could occur with complementary systems offered by private companies to overcome
the challenges listed above. For example, our
cloud-based data management,
aggregation and analysis platform for pharmaceutical researchers, Signals for
Translational, helps integrate experimental and clinical
research data from many sources and assay platforms. This includes not only genomic
data but also proteomics, metabolomics and imaging. A platform like Signals can
combine proprietary in-house data, which researchers may not be willing to
share, with the public data available from the GDC.
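To make the idea concrete, here is a minimal sketch in Python of joining public and in-house records on a shared case identifier. It does not describe how Signals for Translational or the GDC works internally. The function name `integrate_records`, the field names (`case_id`, `tumor_type`, `marker_level`) and the sample values are all hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def integrate_records(public_records, in_house_records, key="case_id"):
    """Join in-house records onto public records that share the same key.

    Every public record is kept. Public records with no in-house match are
    tagged "public_only"; matched records carry fields from both sources.
    All field names here are hypothetical.
    """
    # Index the in-house records by the shared identifier for fast lookup.
    in_house_by_key = defaultdict(list)
    for rec in in_house_records:
        in_house_by_key[rec[key]].append(rec)

    merged = []
    for pub in public_records:
        matches = in_house_by_key.get(pub[key], [])
        if not matches:
            merged.append({**pub, "source": "public_only"})
        for priv in matches:
            # In-house fields are added alongside the public ones.
            merged.append({**pub, **priv, "source": "public+in_house"})
    return merged


# Illustrative, made-up records (not real GDC or proprietary data).
public = [
    {"case_id": "C1", "tumor_type": "lung"},
    {"case_id": "C2", "tumor_type": "breast"},
]
in_house = [{"case_id": "C1", "marker_level": 2.4}]

for row in integrate_records(public, in_house):
    print(row)
```

In this sketch the proprietary records stay in the researcher's own environment. Only the public side is shared, and the combined view exists solely where the join is run.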
Seamlessly integrating data visualization, exploration and analytics capabilities
with data management enables a highly interactive hypothesis-driven analysis
workflow. With these capabilities, researchers can complete their analyses orders of
magnitude more efficiently than with the traditional workflow. We must be able to integrate experimental and
clinical data from existing proprietary, private and public databases
in order to support greater collaboration and help scientists
increase the speed and efficiency of developing targeted drugs, not only for
cancer but across all therapeutic areas of translational research.