Machine Learning


Are You Ready for Machine Learning?

Enthusiasm for Machine Learning (ML) is running high: Gartner added it to its Hype Cycle for Emerging Technologies for the first time in 2016, citing its potential to revolutionize manufacturing and related industries. Forrester Research says enterprises are “enthralled” by the potential of ML “to uncover actionable new knowledge, predict customers’ wants and behaviors, and make smarter business decisions.”

It’s not just for manufacturing or retail, however. Computers that can “learn” have applications in the life sciences as well. Predictive modeling can, for example, assist with diagnostics - whether from clinical data, molecular data, or images. A study at Stanford University showed that ML predicted outcomes for lung cancer patients more accurately than pathologists did.

What is Machine Learning? 

Gartner defines it as “a technical discipline that provides computers with the ability to learn from data (observations) without being explicitly programmed. It facilitates the extraction of knowledge from data. Machine learning excels in solving complex, data-rich business problems where traditional approaches, such as human judgment and software engineering, increasingly fail.”

According to Tom Mitchell, author of Machine Learning (which in 1997 was one of the first textbooks on the subject), ML can be defined as any computer program that improves its performance (P) at some task (T) through experience (E). 

It works, in part, by finding patterns in data. It has typically been the domain of proficient data scientists, software engineers, IT professionals, and others who understand deep learning, neural networks and similar specialized subjects. In short, ML has felt off-limits to many subject matter experts – think of those pathologists in the Stanford study – who could benefit from ML’s algorithms.

“The analytics revolution becomes real when the average person becomes more comfortable with advanced analytics,” says Dr. Jamie Powers, DrPH, Director of Real World Evidence and Data Science at PerkinElmer Informatics. “Not that it’s simple or easy, but the barrier to entry isn’t as high as one might think.”

And getting on board is important, says Dr. Powers, if organizations don’t want to be left behind.

“The insights that can be gained from statistical analysis and machine learning are far greater than what a pair of human eyes can possibly offer,” he says. “What machine learning can do is direct where the humans should focus.”

“Starting Small” with Machine Learning

The beauty of machine learning, compared with routine statistical prediction, is that it invites experimentation. Even if you’re “not there yet” because you still struggle with data and basic analytics, you can start small and explore with ML.

For example, consider an analysis we ran using a publicly available dataset (used in previous data-mining competitions) from a Wisconsin breast cancer study. It’s a small dataset of 699 patients, and we ran several algorithms to test diagnostic accuracy (benign or malignant) based on nine tumor characteristics. The goal was to find an ML algorithm with greater than 95% predictive accuracy.

What tool did we use? The TIBCO Enterprise Runtime for R (TERR) engine embedded within TIBCO Spotfire. Beyond its visualization and dashboarding capabilities, TIBCO Spotfire offers advanced analytics functionality for machine learning applications. TERR lets you work in RStudio to develop, test, break, and iterate on algorithms – and it runs R code 10 to 100 times faster than open-source R.

We tested seven different algorithms, and six of the seven yielded 95+% accuracy. While this is a small dataset, it’s enough to demonstrate that sometimes a simple model with high predictive accuracy is good enough.
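
For readers who want to see what a “start small” experiment looks like in code, here is a minimal sketch in plain R - not the TERR/Spotfire workflow described above, just an illustration. It assumes the UCI “breast-cancer-wisconsin.data” file has been downloaded locally, and the column labels follow the UCI documentation for that file:

  # Load the UCI Wisconsin breast cancer file (699 rows; "?" marks missing values)
  cols <- c("id", "clump_thickness", "cell_size", "cell_shape", "adhesion",
            "epithelial_size", "bare_nuclei", "chromatin", "nucleoli",
            "mitoses", "class")
  bc <- read.csv("breast-cancer-wisconsin.data", header = FALSE,
                 col.names = cols, na.strings = "?")
  bc <- na.omit(bc)[, -1]                    # drop incomplete rows and the id column
  bc$malignant <- as.integer(bc$class == 4)  # class: 2 = benign, 4 = malignant
  bc$class <- NULL

  set.seed(42)
  test_idx <- sample(nrow(bc), round(0.2 * nrow(bc)))  # hold out 20% for later
  train <- bc[-test_idx, ]
  test  <- bc[test_idx, ]

  # One simple model - logistic regression on the nine tumor characteristics
  fit <- glm(malignant ~ ., data = train, family = binomial)
  acc <- mean((predict(fit, test, type = "response") > 0.5) == (test$malignant == 1))
  acc  # proportion of correct benign/malignant calls on the hold-out sample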

If your next steps include trying ML on your TIBCO Spotfire and TERR installation, here are three tips and one caveat to get you started:

  1. Follow a process. Machine learning is a disciplined process for deciding which algorithm to use, saving you from the countless decisions a more manual evaluation of the available data would require.
  2. Use k-fold cross-validation, a best practice for training algorithms, and maintain an 80/20 data split so you have a hold-out sample for testing the final algorithm (a continuation of the sketch above, after this list, makes this concrete).
  3. Try ensembles – combinations of algorithms – to see whether they deliver the accuracy and insights you’re after.
  4. The caveat: machine learning (or statistical prediction) is NOT a substitute for proper experimental design or sample selection.
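
To make tips 2 and 3 concrete, the sketch below continues the earlier one: it cross-validates the model on the 80% training split, then checks a toy two-model ensemble against the untouched 20% hold-out. The fold count, seed, and choice of a second model (linear discriminant analysis from the MASS package shipped with R) are illustrative assumptions, not recommendations:

  # Tip 2: 5-fold cross-validation on the 80% training split
  k <- 5
  folds <- sample(rep(1:k, length.out = nrow(train)))
  cv_acc <- sapply(1:k, function(i) {
    f <- glm(malignant ~ ., data = train[folds != i, ], family = binomial)
    p <- predict(f, train[folds == i, ], type = "response")
    mean((p > 0.5) == (train$malignant[folds == i] == 1))
  })
  mean(cv_acc)  # cross-validated accuracy estimate

  # Tip 3, in miniature: average the predicted probabilities of two different models
  library(MASS)  # provides lda(); shipped with R
  glm_fit <- glm(malignant ~ ., data = train, family = binomial)
  lda_fit <- lda(malignant ~ ., data = train)
  ens <- (predict(glm_fit, test, type = "response") +
          predict(lda_fit, test)$posterior[, "1"]) / 2
  mean((ens > 0.5) == (test$malignant == 1))  # final check on the untouched hold-out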

From science fiction to real science, Machine Learning is finding new applications and new devotees who want to use its power to solve problems and answer seemingly unanswerable questions. 

“If you have Spotfire and R, you can start with small projects,” Dr. Powers says. “The technology exists, the information is there. Machine learning is more accessible than it’s ever been. So, are you ready for machine learning? I would say anyone who’s willing to learn about it is ready.”

Watch Dr. Powers’ talk at Advanced Pharma Analytics 2016 to see how you can begin incorporating machine learning to benefit your organization.


The Benefits of Life Sciences R&D Externalization – “To Go Far, Travel Together”


Externalization has been a buzzword in life sciences R&D, particularly in biopharmaceuticals, for several years now. It’s estimated that 58 percent of total R&D spending will be on external sources in 2017 - up from 33 percent in 2011.

Atrium Research defines externalization as “the pursuit of a fully or partially virtualized R&D model integrating partner companies with specific skills or capabilities.” This includes acquisitions, licensing, jointly owned IP agreements, and more. It’s being driven by desires to lower costs, share risk, ignite innovation, increase agility, and - in general - put more (and fresher) eyes on problems.

External IP’s Outsized Impact on Late-Stage Pipeline Value

Research shows it bears fruit. Accenture reported that, over the past decade, 60 percent of innovator small molecules and 82 percent of innovator biologics were rooted outside of big pharma. 

Likewise, Deloitte discovered that external sources of innovation continue to have a significant impact on late-stage pipeline value (though not quite as much as seen from its analysis the previous year). Over half of the 12 companies studied were generating the majority of forecast late-stage pipeline revenues from external IP.

There are challenges to externalization and virtual R&D models, however. From an IT perspective, the practical challenges involve integrating the informatics ecosystem and identifying standards to make it easier and more efficient for organizations to work collaboratively.

Atrium Research outlined externalization’s impact on IT:

  • Multiple workflows are dependent on a growing number of partners
  • There are wide differences in vocabularies that often change
  • IP management practices differ among partners
  • There are different platforms and software to support
  • Materials must be distributed globally
  • There is low visibility into all potential system users

The report concludes that “R&D virtualization is the biggest challenge facing biopharmaceutical organizations over the next 10 years.”

Standardization: Tackling the Problem of R&D Virtualization

To that end, a variety of foundations, alliances, and organizations have emerged to solve some pre-competitive or non-discretionary challenges that plague the industry as a whole. They are working to create standards and methods of working, through improved collaboration within the industry. Examples include:

  • tranSMART Foundation - building an open-source, open-data platform for data sharing and analytics in translational biomedical research
  • Allotrope Foundation - building a “Laboratory Framework” to improve efficiency in data acquisition, archiving, and management
  • TransCelerate BioPharma - seeking to improve health by simplifying and accelerating the R&D of innovative new therapies
  • PRISME Forum - bringing together information systems management executives to enhance the efficiency, effectiveness, and impact of IT on biotech and pharma R&D
  • Pistoia Alliance - bringing life science companies, vendors, publishers, and academics together to lower barriers to innovation in life sciences R&D

John Wise, executive director of the Pistoia Alliance, acknowledges “we are in the age of externalization.” He has seen pharmaceutical executives shift from internal paradigms to collaborative ones, though he acknowledges some still have not. “I’m sure they are concerned about whether or not they should be following, and if they’re not, do they have good reasons not to,” he said.

“There is this ambition that externalization will create more efficient R&D, will encourage innovation, and will drive real value to the biopharmaceutical industry in the release of new products to meet our medical needs more quickly,” he said.

Many of the obstacles to innovation are shared obstacles, what Wise describes as “the cost of doing business.” These include a lack of information standards, or regulatory hurdles that affect everyone. 

One example is a unified Hierarchical Editing Language for Macromolecules (HELM). Working with Pfizer, a Pistoia project committee released its HELM solution as open source, recognizing that there was more to be gained from sharing the technology than from a proliferation of disparate, incompatible standards.

“No one is interested in spending a lot of money creating bespoke solutions to pre-competitive, non-discretionary problems,” Wise said. “What they would like is to address those types of problems collectively and effectively, so that they can focus on their science.”

Additional common challenges that the Pistoia Alliance is focused on include building a chemical safety library, ontology mapping, and controlled-substance compliance.

Biopharma Vendor Participation

Wise believes technology vendors who participate in such efforts as the Pistoia Alliance gain “a wide view of what is going on in the biopharmaceutical industry and are able to exploit that wide view for the advantage of their customers.”

He said such vendors, which include PerkinElmer, “are not parochial and are open to wider thinking about how problems in life science R&D can be addressed through technology. That is a good thing.”

Wise typically ends his presentations about the Pistoia Alliance and the benefits of participation with a slide showing a camel train crossing the desert. The message he leaves his audience with is the one in this post’s title: to go far, travel together.

PerkinElmer is a participating member of the Pistoia Alliance, as well as the Allotrope and tranSMART Foundations. When it comes to standards that benefit the industry, PerkinElmer wants to go far.

Michael Swartz, Vice President of Informatics Strategy & Business Development at PerkinElmer, commented: “We want to help our customers overcome informatics and other challenges and find it easy to do so using our solutions. Participating in these organizations keeps us abreast of the collaborative efforts and lets us contribute to the broader solutions.”

Is your vendor future-focused and working collaboratively to innovate and solve industry challenges? 


Ad-Hoc Analysis - Driving High-Speed Self-Service Queries


Can Data Producers and Data Consumers Find a Better Solution?

Would a business or scientific end user, a data analyst or data scientist, an IT or purchasing professional, and an executive all agree on what makes for the ideal informatics solution? Probably not - which is why we often end up with divergent IT stacks that support conflicting requirements.

End users want speed and agility to perform their analytics. Now isn’t soon enough, and all too often the data raises more questions than it answers, leading to new queries. Even if everyone agrees on the question, one of the biggest challenges is determining what data – including where it’s located and whether it’s been scrubbed – can answer it. For some queries, imperfect data now is better than clean data in weeks or months. 

The Rise of Frustration-Driven Spreadmarts

Frustrated, users start exporting data to flat files from reporting tools like Business Objects, then creating spreadmarts in Excel to combine data from other sources. This is a difficult-to-repeat and error-prone process, as is building analyses in Excel or loading data from Excel back into analytics tools. The effort must be repeated any time the analysis needs to be updated.

End users like the idea of self-service because it is too frustrating to endure multiple cycles of IT requests.

Data analysts and data scientists spend too much time manually finding and assembling data for end users. They and IT want to give end users access to the wealth of data generated both inside and outside the organization’s firewalls. 

Avoiding Stealth or Shadow IT

But their preference is for governed self-service, or IT-supported rapid data-request prototyping. This helps prevent stealth or shadow IT efforts that can create havoc within organizations. When individuals find their own ways to provision data for analysis, it becomes ungovernable…not to mention non-repeatable across the enterprise, and open to error or security risks. 

C-suite and purchasing executives want to control data costs without hamstringing operations. And, with data management and data analytics becoming central to business strategy, executives see data as the holy grail of insights - insights that lead to better decisions that drive profits, innovation, and business advantage.

While IT doesn’t need total control, they do need visibility into how data analysis is being performed and what tools are being used. How, then, to get to single source of truth (SSOT) systems that everyone can rely on? 

A Platform for Ad-Hoc Analytics: Bringing it All Together

One way to support the needs of various data producers and data consumers in an organization is through an IT-supported platform for ad-hoc analytics. This provides governed self-service access to the wide array of available data while also allowing IT to rapidly operationalize new data requests into the governed IT stack.

Note: self-service can mean two things. It can mean that end users have direct access to the data, or that IT has a better understanding of data across the organization and can quickly prototype new data requests. Realistically, it’s a blend of the two.

With an IT-managed platform that lets users provision their own data, the demand on IT from circular ad-hoc queries is greatly reduced. Yet, the platform still gives IT visibility into the queries and searches that are being created, which they and data scientists can then use to rapidly prototype new applications to integrate into the governed SSOT system (often a data warehouse). 

This approach meets the needs of data producers and consumers alike, adding speed and agility to the process while protecting organizational data and the system overall with a single version of the truth. 


Protecting Data While Simultaneously Enabling Self-Service: Can It Be Done?

To build such a platform requires the creation of a semantic layer to catalog all information within an organization – as well as public data sources relevant to its users. This helps users understand the data available to them, from which they can seamlessly identify and unify the relevant data for their analysis. 

Self-Service Data Discovery: The ‘eCommerce’ Approach

The ideal solution becomes a self-service discovery portal that delivers an ecommerce-like shopping experience, where users search for data of interest (both known and unknown to them). The solution would also assist in identifying and unifying the best, most relevant data sources, regardless of structure, for the analysis at hand.
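
As a thought experiment, the “shop for data” idea can be reduced to something as small as a searchable table of source metadata. The sketch below, in R for consistency with the rest of this blog, is purely hypothetical - the catalog entries, tags, and function name are invented for illustration and are not part of any PerkinElmer or Spotfire product:

  # A toy data catalog: one row per source, with ontology-style tags
  catalog <- data.frame(
    dataset = c("adverse_events_2016", "assay_results_q3", "ehr_extract_site12"),
    source  = c("safety warehouse", "ELN export", "partner hospital"),
    tags    = c("oncology; MedDRA; safety",
                "biochemistry; IC50; compounds",
                "oncology; ICD-10; outcomes"),
    stringsAsFactors = FALSE
  )

  # An "ecommerce-like" keyword search over names and tags
  search_catalog <- function(keyword) {
    hits <- grepl(keyword, paste(catalog$dataset, catalog$tags), ignore.case = TRUE)
    catalog[hits, ]
  }

  search_catalog("oncology")  # returns the two oncology-tagged sources

A real semantic layer adds much more - access controls, lineage, and structured ontologies - but the search-then-provision pattern is the same.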

Finding data quickly is more like visiting an online library, some have proposed, where a data librarian provides the structure for users to find the materials they need - when they need it - for their research. Instead of being perceived as the gatekeeper, IT, data analysts, and data scientists - using a well-managed platform - transform into beacons, shining a light on the pathways to information and insight.  

Holistic Opportunities

There are opportunities for more holistic data initiatives. The best solutions provide rapid, self-service or guided access to semi-governed data. They move ad-hoc queries to an IT-owned platform for traceability and repeatability. And they enable rapid prototyping of new requests and operationalize them in the governed IT stack.

To deploy the right platform to support your ad-hoc data analytics, make sure yours:

  • supports an ecommerce-like experience for accessing data
  • recommends data relevant to the search
  • correlates all structured, semi-structured, and unstructured data
  • enriches the data with scientifically relevant dictionaries and ontologies
  • simplifies provisioning data to business intelligence tools, like PerkinElmer TIBCO® Spotfire

Are you meeting the needs of both your data producers and data consumers?

Learn more at PerkinElmer.


Risk-Based Monitoring: Increasing Efficiency in Cancer Clinical Trials

RBM Contributions to Oncology Research

It’s Breast Cancer Awareness Month and - in the U.S. - everything from landmark buildings to NFL players are wrapped in pink, to raise both awareness and funding for research.

With the American Cancer Society estimating 1.68 million new cases of all cancers this year in the U.S. alone, it’s no wonder the federal government initiated the Cancer Moonshot 2020. That $1 billion effort seeks to “win the war on cancer” by unleashing the potential of combination immunotherapies to treat cancer patients.

The National Cancer Institute reports U.S. investment of $5.21 billion for fiscal year 2016, which supports investigator-initiated research, clinical trials, and other initiatives. But, NCI notes its funding has plateaued over the past decade, as research costs have increased and inflation eats away at its buying power.

Despite best efforts to raise sufficient funds to fight cancer, researchers know they must continually find ways to work more efficiently. That is one reason why, for several years now, the FDA has encouraged the use of risk-based monitoring (RBM) in clinical trials.

Risk-Based Monitoring: Determining What Works Best

RBM has been described as a methodology using risk algorithms to determine the right level of monitoring. The FDA defines monitoring in general as “a quality control tool for determining whether study activities are being carried out as planned, so that deficiencies can be identified and corrected.” It covers a wide range of activities, but according to Applied Clinical Trials, RBM is meant to reduce the time-consuming, costly practice of onsite 100 percent source data verification (SDV), while refocusing efforts on improving data quality.   

“There is a growing consensus that risk-based approaches to monitoring, focused on risks to the most critical data elements and processes necessary to achieve study objectives, are more likely than routine visits to all clinical sites and 100% data verification, to ensure subject protection and overall study quality,” the FDA asserts.

The FDA is not alone. ICH, EMA, the UK’s Medicines and Healthcare Products Regulatory Agency, the Clinical Trials Transformation Initiative and others have endorsed RBM as a means to improve clinical trial monitoring.

And, clinical trial monitoring needs improving: Studies have shown that trials are taking longer (up to 16 months more, on average), and that it’s harder to recruit and retain patients. 

In oncology trials in particular, access to patients is a major challenge. There is significant competition among academic and industry needs, and trial design is more complex since trials do not involve healthy volunteers. Only oncology patients on standard-of-care treatments may participate. 

Is RBM the solution?

Does the promise of RBM hold up when practitioners evaluate their efforts? A recent analysis of the literature on RBM finds it is working: “Reduced SDV combined with a centralized, risk-based approach may be the ideal solution to reduce monitoring costs while improving essential data quality.”

In addition, when Cancer Research UK (CRUK) piloted RBM at its Center for Drug Development, it saw 20 percent efficiency savings in monitoring of early phase oncology trials, and other measures indicated support for RBM from monitoring site staff. 

More organizations are moving to implement RBM. In particular, it’s being used in more oncology trials because of reduced source data verification, support from regulatory agencies, and improved integration with CRO activities.

One survey reported that nearly 50 percent of respondents were using RBM across programs or in pilots, up from a benchmark of 33 percent. Another 10-30 percent indicated they would start RBM strategies within the coming 12 months. The growing use of electronic solutions and statistical assessments was also cited.

RBM helps clinical trials in significant ways:

  • Site selection: by tracking site feasibility and performance over time, in a specific therapeutic area such as oncology.
  • Subject recruitment: by tracking issues based on a holistic view of data - for example, site readiness/training or geographic challenges, and bringing enrollment back on target.
  • Trial oversight and subject safety: integrating in-flight clinical outcomes data with operational data makes it possible to track site performance. This helps increase the efficiency of trials while protecting the safety of participants. Trends and outliers in safety data are identified faster, preventing costly and dangerous delays in the analysis of potential safety concerns.
  • Integration with real-world evidence (RWE): addresses the trend toward personalized medicine in oncology research. This targets therapies to specific cohorts of subjects and identifies these subjects for research using data beyond that from controlled clinical trials, such as electronic health record and social media data.

FDA: Technology to Reduce Need for Onsite Monitoring

While FDA says it expects some amount of onsite monitoring to continue, “evolving monitoring methods and technological capabilities” will lead to its decrease.

Technology solutions designed for RBM can reduce the effort spent on complete SDV, while providing better data analysis and protection of trial subjects. This shifts clinical monitoring from frequency-based to risk-driven monitoring that is proactive and comprehensive.

Some of the advantages of using an RBM technology solution include:

  • Speed to Actionable Insight – ensures clinical development users can act more quickly to answer risk monitoring questions.
  • Visibility into the Unknown – reveals trends and patterns that present risk (or opportunity) through visualizations, dashboards, and applications. Disparate clinical, operational, and safety data are connected.
  • Self-Service Discovery – allows users to drastically reduce time-consuming reliance on IT for data preparation, report building, and spreadsheet version control.
  • Universal Adaptability – empowers the broadest spectrum of clinical development users.

Trial monitors interested in RBM solutions that achieve the above-stated benefits should look for the following:

  • Closed-loop, actionable RBM that lets you follow up on recommended actions that have been triggered by risk signals directly within the RBM system
  • Adaptive, flexible risk model that lets you assess the efficacy of your RBM strategy within the RBM system and adjust the risk model accordingly - for example, its thresholds, weightings, algorithms, or recommended actions (a toy illustration of such a weighted model follows this list).
  • Holistic data view that provides insights from data from multiple sources, all at once, even if the data is not standardized.
  • Persona-driven workflow that lets you get to the data needed quickly - based on your persona in the system - and act on it. This eliminates the need to hunt for the data you require.
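
To illustrate what an “adaptive, flexible risk model” means in practice, here is a toy sketch in R. The key risk indicators, weights, thresholds, and recommended actions are hypothetical placeholders, not the algorithm used by any particular RBM platform:

  # Hypothetical site-level key risk indicators (KRIs)
  sites <- data.frame(
    site          = c("S01", "S02", "S03"),
    query_rate    = c(0.02, 0.09, 0.04),  # open queries per data point
    ae_under_rate = c(0.00, 0.03, 0.01),  # suspected adverse-event under-reporting
    visit_delay   = c(1, 12, 3)           # mean days monitoring visits are overdue
  )

  # Normalize each KRI to 0-1 and combine with adjustable weights
  weights <- c(query_rate = 0.5, ae_under_rate = 0.3, visit_delay = 0.2)
  norm    <- function(x) x / max(x)
  sites$risk <- weights["query_rate"]    * norm(sites$query_rate) +
                weights["ae_under_rate"] * norm(sites$ae_under_rate) +
                weights["visit_delay"]   * norm(sites$visit_delay)

  # Thresholds turn scores into recommended actions that can be followed up
  sites$action <- ifelse(sites$risk > 0.6, "trigger on-site visit",
                  ifelse(sites$risk > 0.3, "remote data review", "no action"))
  sites

Adjusting the weights or thresholds and re-scoring is exactly the kind of “assess and adjust” loop described above; a production system would run it against live clinical, operational, and safety data rather than a hand-typed data frame.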

Are you using RBM to improve efficiency of your oncology clinical trials? Find out how PerkinElmer Informatics can deploy RBM and Trial Operations solutions to make the most of your clinical trials.

Risk-Based Monitoring: Separating the Risk from the Noise

Clinical development professionals are tasked with making sure every trial site runs efficiently, follows protocol, and generates the highest-quality data. With clinical trials growing longer and costlier, adoption of Risk-Based Monitoring (RBM) is growing rapidly among sponsors and CROs.

In fact, regulatory agencies now strongly recommend a risk-based approach to monitoring - encouraging sponsors and CROs to focus resources on the sites that need the most monitoring. The FDA has issued guidance (Guidance for Industry: Oversight of Clinical Investigations — A Risk-Based Approach to Monitoring) with the stated goal:

“…to enhance human subject protection and the quality of clinical trial data by focusing sponsor oversight on the most important aspects of study conduct and reporting… The guidance describes strategies for monitoring activities that reflect a modern, risk-based approach that focuses on critical study parameters and relies on a combination of monitoring activities to oversee a study effectively. For example, the guidance specifically encourages greater use of centralized monitoring methods where appropriate.”




In this video, we share how PerkinElmer Informatics helps companies implement a Risk-Based Monitoring approach to clinical trial development. 

Historically, clinical trial monitoring has depended on 100% source document verification (SDV) and other on-site monitoring functions to ensure patient safety and data integrity – consuming up to 30% of the $2.6 billion it takes to bring a new drug to market.

This costly practice has been found to have almost no impact on data quality or patient safety – spurring regulatory agencies to encourage solutions such as RBM platforms that integrate data from different trials, at different locations, and in different formats. In fact, RBM proponents believe the approach will reduce overall clinical trial costs by up to 20% – making it equally attractive to both sponsors and clinical site managers.

Watch the video to learn more about clinical trial management and implementing an RBM approach. 

Why the Cloud is the Clear Choice

There’s nothing cloudy about it – the Cloud offers a number of advantages over on-premises computing solutions.

As life sciences researchers work with higher volumes and greater variety and complexity of data - and as the organizations they work for become more sensitive to the costs of dedicated IT resources - cloud computing emerges as a solution to a multitude of challenges.

Not surprisingly, Amazon Web Services, a provider of cloud computing services (which - full disclosure - PerkinElmer uses), identifies six main benefits of cloud over traditional computing:

  • Trade capital expenses for variable ones: rather than invest in on-premise data centers and servers, cloud computing lets you “pay as you go,” more like a utility. You only pay for what you use, avoiding steep upfront capital costs.
  • Gain massive economies of scale: sharing services with hundreds of thousands of cloud users lowers your variable cost.
  • Eliminate guessing at capacity: on-premise requires precise forecasting, but too often you end up with either idle servers or overloaded ones. Cloud enables perfectly scalable, just-what’s-needed service - without the guessing.
  • Increase speed and agility: the cloud lets you release new IT resources and updates in a click, for a “dramatic increase in agility.”
  • Don’t pay to run and maintain data centers: spend instead on growing your business (or expanding your research), not “racking, stacking, and powering servers.”
  • Go global in minutes: deploy applications in regional clouds worldwide to lower latency and enhance customer experience.

These are key advantages for any life sciences organization, and Amazon is not alone in its analysis. The Yankee Group found that, on average, the total cost of cloud-based Software-as-a-Service (SaaS) offerings is 77 percent lower than that of on-premises systems.

Comparing the Cloud With Physical Infrastructure 

Sure, you’ll pay more in subscription fees for cloud services than in licensing fees for software, but add the costs of customizing and implementing licensed software, the hardware to run it, the IT personnel to manage and maintain it, and the time and expense of training, and the total far exceeds the cost of cloud computing.

Our own analysis, comparing PerkinElmer Signals for Translational to a popular open-source system based on a traditional software model, shows the total cost of the cloud model to be nearly half that of the on-premises one.


 Advantages of the Cloud: More than Cost-Savings 

On-Premises Model                  | Cloud-Based Model
Requires new HW, SW, & IT support  | No new HW, SW, or IT involvement
Implement in months/years          | Implement in days/weeks
Time-consuming, costly upgrades    | Automatic, disruption-free upgrades
No innovation velocity             | Constant improvement, enhancement
Designed for techie users          | Designed for business users
Time-consuming to roll out         | Automatically accessible worldwide
High upfront cost and TCO          | Low annual subscriptions and TCO
High risk: big upfront purchase    | Low risk: try first, cancel at any time

This chart shows several additional advantages of the cloud over the on-premise model. With the cloud, plan on scaling up more quickly and deploying complex algorithms more frequently. Accelerated and automated updates are less disruptive to your users, keeping them focused on answering the questions their research continually poses.

The cloud has also enabled a new way of working – distributed research and development. Whereas the old centralized mainframes both consolidated and isolated data, the cloud fosters collaboration with external partners and opens access to emerging markets and public data.  

Cloud-Based Computing is Working

For proof the cloud is delivering on its promises, look no further than its use. Cisco, in its Global Cloud Index: Forecast and Methodology, 2014-2019 report, predicts that by 2019, 86 percent of workloads will be processed by cloud data centers - versus only 14 percent by traditional data centers. Cloud data centers handled a workload density (workloads per physical server) of 5.1 in 2014, a figure expected to grow to 8.4 by 2019 – compared to just 3.2 for traditional data centers.

There is one caveat which we’ve addressed in this blog before – the issue of security in the cloud. While it is our opinion that there are solid technology solutions for security concerns, some in the industry are addressing this for now by using a hybrid of both private cloud and on-premise computing for mission-critical workloads. 

Due to the “strengthening of public cloud security,” however, Cisco predicts that public cloud will grow faster (44% CAGR from 2014 to 2019) than private cloud (16% CAGR over the same period). It estimates that by 2018 more workloads will run in the public cloud (56 percent) than in private clouds (44 percent).


Leveraging Big Data in Academic Research


This blog and others have celebrated Big Data for the big insights it can provide – and has already delivered – in applications as diverse as finance, marketing, medicine, and urban planning. Perhaps the slowest – or most cautious – adopters have been in academia, specifically the academic research institutions whose work contributes to pharma, biotech, drug discovery, genomics, and more.

Some of this is attributed to the cost for academic institutions to acquire software and systems to analyze big data, as well as the complexity of integrating and keeping up with multiple, evolving solutions. One survey found that access to data sets was the No. 1 problem for academic institutions interested in applying big data to research. 

Open Access to Big Data?

Late last year, four leading science organizations called for global accord on a set of guiding principles on open access to big data, warning that limitations would risk progress in a number of areas - from advanced health research to environmental protection and the development of smart cities. The “Open Data in a Big Data World” accord seeks to leverage the data revolution and its scientific potential, advocating that open data would maximize creativity, maintain rigor, and ensure that “knowledge is a global public good rather than just a private good.”

Collaboration around Precision Medicine

Big data is making a bigger impact as genomic centers, hospitals, and other organizations – particularly cancer centers – collaborate around precision medicine. These efforts use genomic data to make treatment decisions even more personalized. Precision medicine both generates and depends on Big Data – it is a source for much of the volume, variety, and velocity of data, and requires analysis to turn it into useful insights for drug discovery, translational research, and therapeutic treatments. 

The Need for Searchable Databases

While the federal investment of $215 million in the Precision Medicine Initiative is certainly welcome, scientists argue that additional funds and collaborations are needed. One such collaboration, the New York City Clinical Data Research Network (NYC-CDRN), has created “a robust infrastructure where data from more than six million patients is aggregated.” Dr. Mark Rubin of Weill Cornell Medical College argues that searchable databases like the NYC-CDRN are needed so that researchers and clinicians can “better design studies and clinical trials, create personalized treatment plans and inform medical decisions” for optimal, data-informed care.

It appears the marketplace is listening.

Yahoo! announced it was releasing “a massive machine learning dataset” comprising the search habits of 20 million anonymous users. The dataset is available to academic institutions and researchers only, for context-aware learning, large-scale learning algorithms, user-behavior modeling, and content enrichment. Facebook and Google have similarly opened access to their artificial intelligence server designs and machine learning libraries, respectively, for academic as well as industry use.

In response to another federal initiative, the Cancer Moonshot, the Oncology Precision Network has decided to share aggregated cancer genomics data as it aims to “find breakthroughs in cancer care by leveraging previously untapped real-world cancer genomics data while preserving privacy, security, and data rights.” The Cancer Moonshot itself brings together academia, research, pharmaceutical, insurance, physician, technology, and government organizations to find new immunotherapy-based solutions for cancer care by 2020.

These and other efforts like them let academic researchers cover new ground now that they have access to previously unavailable datasets. In addition, nonprofit and academic institutions are analyzing data from previously conducted research, linking databases, and leveraging machine learning to further accelerate precision medicine development.

A Big Data Guinness World Record

A Guinness World Record ought to be proof of such an impact. The title for “fastest genetic diagnosis” went to Dr. Stephen Kingsmore of Rady Children’s Hospital-San Diego. He used big data and new technologies to successfully diagnose critically ill newborns in 26 hours, nearly halving the previous record of 50 hours. 

This is the kind of breakthrough that researchers – both in industry and academia – are hoping big data can deliver more of. 

Academic Tech Lags Behind Industry, but Big Data Solutions Could Help Close the Gap

But it means overcoming challenges that keep big data technologies out of the hands of academics. Post-docs and grad students want to find at their universities the technologies they will also use in their careers. Academic institutions want to keep costs low while protecting intellectual property. They want access to that IP, even as students graduate, so that future students can benefit from work that has already been done. 

Researchers want to analyze data within context and supported by big data findings. Systems and solutions that let them combine efforts with industry will go a long way toward increasing the discovery and development of highly accurate clinical diagnostic tests and treatments.

What is holding your academic research institution back from deploying big data technologies? Do you need a plan for implementing big data in your organization? Download our free trials.


Improving the Odds

What do Las Vegas and Drug Discovery & Development Have in Common?

It’s been said that “the odds of getting a drug to market aren’t much better than winning in Las Vegas. Only one in 5,000 to 10,000 compounds discovered in the lab gains FDA approval.” 

The story has been that - while pharma R&D activity and spending were increasing - the number of approvals for new molecular entities (NMEs) was decreasing. After approving 86 NMEs in 1999-2001, the FDA approved only 77 over a comparable three-year period less than a decade later. Over the same span, global R&D spending by the top 500 pharma companies jumped from $59 billion to $131.7 billion.

This phenomenon earned its own name – Eroom’s Law. That’s Moore’s Law spelled backwards because - unlike computer processing power doubling every two years - pharma R&D productivity (measured by FDA new drug approvals per inflation-adjusted billion dollars spent) has halved roughly every nine years since 1950. 
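
A quick back-of-envelope calculation shows what that halving rule implies (sketched in R, for consistency with the rest of this blog; the sampled years are arbitrary):

  # Fold-decline in approvals per inflation-adjusted billion dollars, relative
  # to 1950, if productivity halves every nine years
  years <- c(1950, 1968, 1986, 2004, 2010)
  round(2 ^ ((years - 1950) / 9))  # 1, 4, 16, 64, ~100-fold lower by 2010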

Causes of Declining Pharma Productivity

There were four main causes leading to the decrease in drug research & development productivity:

  1. “Better than the Beatles” Problem: we compete against our greatest hits and any new drug needs to be better than the blockbuster (especially if the blockbuster is now available as a low-cost generic).
  2. “Cautious Regulator” Problem: a progressive lowering of risk tolerance raises the bar on safety for new drugs.
  3. “Throw Money at It” Tendency: we hope something sticks, which often leads to waste.
  4. “Basic Research vs. Brute Force” Bias: we overestimate the probability that newer “brute force” efforts (think large-scale screening processes) will show a molecule safe and effective in clinical trials.

With $2.56 billion being the latest estimated cost for developing a single prescription drug (inclusive of failures and capital), everyone is looking to overcome Eroom’s Law – without feeling like it’s just a roll of the dice.

Is Pharma R&D on a Winning Streak?

The good news is the FDA has reported a bit of a winning streak, with 41 and 45 approvals in 2014 and 2015, respectively, for NMEs and BLAs (Biologics License Applications) combined, compared with an average of 25 approvals over the preceding eight years.

Lest we fall prey to gambler’s conceit, it’s worth taking a look at what might be changing - and making sure we’re placing the right bets.


The Value of Big Data

Importantly, there is recognition that Big Data plays a role in accelerating successful drug discovery and development. Alongside those “brute force” efforts to do more are Big Data efforts to extract more meaning and insights. 

Consider the growing volume, velocity, and variety of life science data: 

  • In genomics, there are 3 billion bases per human genome; 25,000 genes and millions of variants; and up to 1 terabyte of data per sample.
  • We’re imaging millions of compounds using functional screening, with thousands of cells per well and billions of measurements per run.
  • Outcomes are measured from 200,000 registered clinical trials; from millions of patients, doctor visits, and samples; with both structured and unstructured data. We’re shifting towards complete genomic analysis of patients.

Increasingly, the data being analyzed is not exclusively proprietary or new. Therefore, those who are fastest to glean valuable insights are better positioned to win in the marketplace. And while it may be difficult to create another “Sgt. Pepper’s Lonely Hearts Club Band” or control what regulators do, there is opportunity to use Big Data to science’s advantage – for personalized medicine, translational research, and, in general, faster insight to action. Data and analytics are at the heart of envisioned improvements in healthcare.

Realizing these improvements has required new tools and solutions to turn Big Data into Big Insights. Particularly in the life sciences, these solutions must address the informatics challenges of data variety, complexity, volume, and the need for more collaboration, more flexible data infrastructure, and less data isolation.


Betting on Pharma R&D

The surest bet for fixing what ails pharma R&D - and for keeping NME and BLA approvals rising while lowering total cost - is better scientific informatics: for collaboration, data analysis and visualization, data integration, and scientific smarts.

When data is unified, visualized, contextualized, and operationalized, it unlocks critical insight.

What are the odds your scientists are empowered by informatics solutions to make better decisions from data? Don’t just roll the dice on turning Big Data into Big Insights. Find out how PerkinElmer Informatics is helping to reverse Eroom’s Law.


100-days of #Spotfire®


Since 1986, PerkinElmer Informatics has been supporting researchers across industry and academia with market-leading, intelligently designed software solutions. Today we are thrilled to announce the start of the 100-days of Spotfire® - celebrating one of the most powerful tools in the PerkinElmer Informatics catalog, TIBCO™ Spotfire®.

Big Data is only Getting Bigger…

As a scientist, you know what it’s like to struggle with vast amounts of complex data from a wide array of sources. As data outputs increase, so does the pressure to find the needle in the “data” stack that leads to your next big discovery. The need for data-driven insights and agile decision-making has never been greater.

Spotfire® provides dynamic analysis and visualization tools that tackle the most cumbersome datasets with just a few clicks of the mouse. What’s more, Spotfire® not only finds answers to the questions you have - it also uncovers answers to questions you didn’t realize needed asking.

Start the 100-day revolution

Over the next 100 days, we challenge you to join the revolution: take back your data and unlock insights that will dramatically change the way you look at your research outputs. Each day we will showcase posts covering topics such as:

  • How to use Spotfire® to easily analyze data from multiple sources - including biological assay data, chemical structures and properties, cellular images, and genomics and proteomics data
  • Simple steps for creating and customizing visualizations and dashboards
  • Stories of how labs like yours apply Spotfire® to their research
  • Time-saving tips, shortcuts, and feature spotlights
  • Application stories around Lead Discovery® and OmicsOffice® that help you get more insights out of your research
  • Ways to easily share your Spotfire® analytics and dashboards with your colleagues

Plus, we’ll be hosting a variety of in-person and online events and offering flash promotions and discounts throughout the next 100 days. Stay tuned by following us here on LinkedIn, across our LinkedIn Groups, Facebook, and Twitter.


CROs and Big Data Analytics

Contract Pharma Using Informatics to Gain Competitive Edge

It’s no secret that clinical trials have grown more complex, and that pharmaceutical companies are increasingly turning to contract research organizations (CROs) for support. In fact, by 2020, CROs are projected to handle nearly three quarters of all clinical trials. Research and Markets’ New Trends in Global Clinical Development Outsourcing forecasts 9 percent compound annual growth, from $38.4 billion in 2015 to $64 billion by the end of this decade.

A 2015 Nice Insight survey also clearly demonstrates that spending with CROs is escalating: the share of sponsor companies spending $10-$50 million on outsourcing reached 62 percent in 2014-2015 - up 24 percent over the previous year, and double that of 2011-2012.

Among the market drivers for seeking outside support is the need to apply big data analytics to the numerous - and growing - forms of multi-structured data being incorporated into clinical trials. These include:

  • Electronic health/medical records
  • Clinical data repositories
  • Prescription information
  • Real-world outcomes
  • PK/PD data
  • Wearable device data
  • Publicly available data repositories

Pharma and life sciences companies are struggling to keep up with data volumes, data integration, long-running analyses, and data visualization.

Leveraging various data sources – to yield a fuller, clearer picture for decision making – requires advanced informatics solutions  that can integrate multivariate data and serve up informative visualizations and analysis. Such technological capabilities deliver the promise of big data to sustain drug pipelines.

Forward-looking CROs

Whether large, global entities or smaller, niche players, the best-in-breed CROs are focused on building out their infrastructures to effectively capture, consolidate, standardize, and visualize both operational and clinical data from multiple sources. Such real-time predictive analytics generates the business value that sponsors are looking for in their CRO partners to drive better clinical decisions.

Both contract manufacturing organizations (CMOs) and CROs continue to expand their service offerings and are becoming technology-platform providers. By offering visual, interactive dashboards that allow for rapid decision-making and deeper examination of the data, these big data and analytics technologies become a value-added service for CROs.

McKinsey confirms this added value. It estimates that applying big data strategies can generate up to $100 billion in value annually across the U.S. healthcare system by optimizing innovation and improving the efficiency of research and clinical trials.

Informatics a Key Tool for Pharma-CRO Collaboration

Outsourcing creates a partnership between the pharmaceutical sponsor and CRO, and success requires tools that help the two collaborate effectively. Comprehensive informatics solutions can bridge sponsor and CRO, breaking down barriers and identifying the top data sources (EDC, CTMS, PV, labs, etc.) that are essential to analysis. CROs, therefore, must invest in best-of-breed platforms and technologies in order to deliver optimum data-driven decision-making capabilities to their pharma sponsors.

This includes clinical informatics to improve patient enrollment and retention, clinical data and operations review, and risk-based monitoring. Better integration of data across the development lifecycle delivers a complete pipeline picture – in real time.

While some studies indicate CROs in emerging countries are stealing market share, those in developed countries still command nearly 67 percent of the global market. This means CROs need the technology to deliver results around the globe in near-real time, without latency issues.

In a high-growth, hyper-competitive CRO market, informatics solutions can distinguish the leaders from everyone else.

Are you working with the best informatics platform to take advantage of big data?

Learn more about how PerkinElmer clinical informatics, powered by TIBCO Spotfire®, the leading data analytics and visualization platform, can help you leverage everything big data has to offer.