Collaborative Data Storage & Security: Critical Needs of the Precision Medicine Data Life Cycle – Part 2

As we discussed in the previous post on precision medicine, the quantity of data being generated in the life sciences is reaching staggering proportions - especially in the field of genomics. This post is the second in our Critical Needs of the Precision Medicine Data Life Cycle series.

Growing Biological Data Analysis Costs 

Although generating raw DNA sequence data has become progressively less expensive, the associated costs of data analysis have continued to grow. Because computational resources are often in short supply, cloud computing has become increasingly important in the development and execution of large-scale biological data analysis.

Scalability and collaboration are often cited by both commercial and academic scientists as primary motivations for cloud computing. The utility and scalability of the cloud make it an attractive option not only for multi-site collaborative research projects, but also for smaller labs lacking adequate computational infrastructure to meet current and future needs.

Cloud Computing Defined

So, what exactly is Cloud Computing? Gartner Group - a world-class IT consulting organization - describes it as “a style of computing in which massively scalable IT-related capabilities are provided ‘as a service’ using Internet technologies to multiple external customers.”

The group also predicts that “By 2020, a corporate ‘no-cloud’ policy will be as rare as a ‘no-internet’ policy is today.” To put it simply, scientific or business users who want to run complex applications or store very large datasets no longer have to rely on in-house computing infrastructure but can simply rent the services of a cloud services vendor, do their work, get their results, and then release the resources back to the cloud.

50 Years of Cloud Computing

Cloud computing is by no means a new phenomenon. In fact, it traces its roots back over 50 years to the computer clusters of the 1960s in which groups of computers were networked together to function as a single computing entity. 

Computer clusters eventually grew and developed into the internet and led to the rise of grid computing – a form of distributed computing. One key cloud development – MapReduce – was implemented by Google to regenerate its entire index of the web. MapReduce, along with open-source adaptations such as Hadoop, allows large datasets to be broken into smaller pieces that can be spread among different computers and processed in parallel, with the partial results then combined into a single answer - a key element in today’s life sciences cloud computing.
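To make the MapReduce idea more concrete, here is a minimal sketch in Python. It is an illustration only, not how Google or Hadoop implement it: everything runs on one machine, and the function names (split_into_chunks, map_count_bases, reduce_counts, count_bases) and the sample reads are made up for this example. It counts nucleotides across a handful of short DNA reads by splitting them into chunks, counting each chunk independently, and then merging the partial counts.

```python
from collections import Counter
from functools import reduce


def split_into_chunks(reads, chunk_size):
    """Break the list of reads into smaller pieces (one per hypothetical worker)."""
    return [reads[i:i + chunk_size] for i in range(0, len(reads), chunk_size)]


def map_count_bases(chunk):
    """Map step: count nucleotides in one chunk, independently of the others."""
    counts = Counter()
    for read in chunk:
        counts.update(read)  # adds one count per character in the read
    return counts


def reduce_counts(left, right):
    """Reduce step: merge two partial counts into one."""
    return left + right


def count_bases(reads, chunk_size=2):
    chunks = split_into_chunks(reads, chunk_size)
    # Here the chunks are processed one after another on a single machine;
    # on a real cluster each chunk could be sent to a different computer.
    partial_counts = map(map_count_bases, chunks)
    return reduce(reduce_counts, partial_counts, Counter())


# Made-up sample data for illustration
reads = ["ACGT", "AACC", "GGTT", "ACAC", "TTTT"]
totals = count_bases(reads)
print(dict(totals))
```

Because each chunk is counted without reference to any other chunk, the map step can be handed out to as many machines as are available, which is what makes the approach attractive for very large sequencing datasets.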

Amazon Web Services & Life Sciences R&D

Today, Amazon Web Services (AWS) is the market leader for cloud computing, both in general and in the life science R&D sector. AWS is currently used to create scalable and highly available IT infrastructures to store, compute and share data. It is also the technology platform used to deliver cloud-scale architecture to PerkinElmer Signals for Translational.

Private, Public & Hybrid Cloud Computing

Cloud computing services can be offered as either public or private clouds, or as a hybrid model combining elements of the two. Public clouds allow users to ‘rent’ the hardware and software needed to process or store their data, and to release those resources back to the cloud when they are no longer needed.

Private clouds are generally preferred by large organizations that cite data security as a primary concern. Private clouds offer the advantages of the cloud model while keeping the infrastructure itself contained behind the organization’s own firewall.

A third model - the hybrid cloud - allows companies to keep key data within their firewall while extending selective activities out to public clouds. 

Cloud Computing Security

Because healthcare data is subject to certain regulations that other industry sectors might not face, both commercial and academic sectors have in the past voiced concerns over data security and integrity in cloud services.

However, cloud security has surpassed the security measures at most private data centers - and cloud solutions are well positioned to be turned into data-security aggregators. Given the protections afforded to its mission-critical data and intellectual property, the life sciences industry as a whole is beginning to embrace cloud technology.

Pfizer’s adoption of the Amazon Virtual Private Cloud - which permits a company to extend its firewall and other security measures to the cloud (though at some cost in operating efficiency) - is one example of this shift. Hybrid cloud solutions are also very popular because they can be deployed to provide the information required for research while keeping personal or confidential information on a separate system.

The Promise of Cloud Computing

The most promising aspect of the cloud technology “is realizing the pairing of the cloud with big data, analytical tools and mobile devices, especially in healthcare where it can provide around-the-clock monitoring at a fraction of the cost of traditional on-premises tools.” 

Cloud solutions can be leveraged to allow data mashups between public and private data sets. This, in turn, enhances the quality and accessibility of life sciences and clinical trial data.

At PerkinElmer Informatics, this is precisely what we are trying to achieve: leveraging best-in-class cloud technology and data analytics to help disseminate information faster and more efficiently, in order to provide deep insights into Translational Medicine data.

For further information, check out this webinar and stay tuned to find out about other Critical Needs of the Precision Medicine Data Life Cycle.