Faster insights and better science in the search for desperately needed new therapies

In a previous post we noted the increasing importance of biologics as therapeutic agents, with 37% of the drugs approved by the FDA in 2017 being biologic entities. A recent article in Chemical & Engineering News (June 4, 2018, pp 28-33) focused on activities in immuno-oncology, where biologic checkpoint inhibitors are being tested in combination with other immunotherapies: approximately 250 small molecule- and antibody-based immunotherapies are currently in clinical studies, and more than 1,100 clinical trials in 2017 combined a checkpoint inhibitor with another treatment.

With this increasingly urgent drive to discover and develop novel biotherapeutics in areas such as oncology, it is crucial that researchers are equipped with the best possible tools to capture, manage, and exploit all the available data. We commented that “In the area of SAR and bioSAR, underlying chemical structural and bio-sequence intelligence are key requirements for meaningful exploration and analysis, and these are often only available in separate and distinct applications with different user interfaces, when ideally they should be accessible through a unified chemistry/bio-sequence search and display application, supported by a full range of substructure and sequence analysis and display tools.”

In this post we drill into these requirements in more detail and discuss how an ideal bioSAR tool should support faster insights and better science in the search for desperately needed new therapies. 

As we noted previously, researchers are struggling with a data deluge, and need effective tools to locate, extract, sift, and filter relevant data for further detailed visualization and analysis. With biologics, these applications will need to understand and manage bio-sequences, and an immediate requirement will be to allow sequence searching – using a standard tool such as BLAST to search across internal and external sequence collections – to collect and import the appropriate hits in a standard format, and to link them to other pertinent properties (bioactivity, toxicity, physicochemical, DMPK, production, etc.).
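To illustrate the kind of downstream filtering involved, here is a minimal sketch (the IDs and thresholds are hypothetical) that parses standard BLAST+ tabular output (-outfmt 6) and keeps only high-confidence hits, which could then be linked to assay or property data:

```python
# Hypothetical sketch: filter BLAST+ tabular output (-outfmt 6) to collect
# high-confidence hits before linking them to other property data.
# Column layout follows the standard 12-column BLAST+ tabular format.

def parse_blast_hits(tabular_text, max_evalue=1e-5, min_identity=90.0):
    """Return (query, subject, identity, evalue) tuples passing the filters."""
    hits = []
    for line in tabular_text.strip().splitlines():
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        identity = float(fields[2])    # percent identity
        evalue = float(fields[10])     # expectation value
        if evalue <= max_evalue and identity >= min_identity:
            hits.append((query, subject, identity, evalue))
    return hits

# One illustrative result line (IDs are made up for this sketch).
example = "\t".join(["mAb_07", "UniProt_P01857", "98.5", "110", "2", "0",
                     "1", "110", "1", "110", "3e-60", "210.0"])
print(parse_blast_hits(example))  # keeps the single high-confidence hit
```

A real pipeline would of course run BLAST itself and join the surviving subject IDs against bioactivity or DMPK tables; the sketch only shows the filtering step.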

With a tractable data set on hand, researchers will want to explore sequences to try to discern particular motifs or sequence differences that are correlated with bioactivity or desired physicochemical or DMPK profiles, and thus potentially amenable to further manipulation and enhancement. 

The sequences must first be aligned, for example with Clustal Omega, and visualizations should present them so that sequence differences are immediately highlighted and monomer substitutions can be explored for potential links to biotherapeutic activity. Sequence logo plots to investigate the distribution of monomers in a set of sequences, and annotations to highlight and share areas of interest, will also help researchers get to insights more quickly.
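To make the idea concrete, here is a minimal sketch (all names illustrative) of how a viewer might flag positions where aligned sequences differ from a chosen reference:

```python
# Minimal sketch: report positions where aligned sequences differ from a
# reference, as a bioSAR viewer might highlight them. Names are illustrative.

def sequence_differences(reference, aligned_seqs):
    """Map each sequence id to a list of (position, ref_monomer, monomer)."""
    diffs = {}
    for seq_id, seq in aligned_seqs.items():
        diffs[seq_id] = [
            (i + 1, r, m)                        # 1-based alignment position
            for i, (r, m) in enumerate(zip(reference, seq))
            if m != r and m != "-" and r != "-"  # skip gap columns
        ]
    return diffs

ref = "MKTAYIAKQR"
variants = {"clone_A": "MKTAYIAKQR", "clone_B": "MKTSYIAKQR"}
print(sequence_differences(ref, variants))
# clone_A matches the reference; clone_B differs at position 4 (A -> S)
```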

If scientists want a deeper dive into the underlying structure of the sequence or a region, immediate access to a detailed and interactive 3D rendering of the biomolecule’s structure can provide a different lens through which to understand how different monomer substitutions may impact protein folding or active-site binding, and thus activity.

There may also be cases where required specialized analysis or visualization capabilities are only available in a separate in-house developed, third party, or open-source application, and the provision of an extensible Web Services framework will enable these to be quickly linked in to an enhanced analysis pipeline that can then be shared with colleagues and collaborators.

A bioSAR system providing the capabilities discussed above, equipped with an intuitive and unified user interface that caters to novice and power users alike, will enable researchers to derive incisive insights faster and make better-informed scientific decisions in the search for novel biotherapeutic agents targeting some of the world’s most pressing unmet clinical needs.

Accelerate analysis of sequence differences relative to a reference sequence.


SAR Trek

In a previous blog post we highlighted the day-to-day informatics problems facing IT/IS staff and researchers in biopharma companies as they struggle to discover and develop better drugs faster and more cheaply. Key among these was the challenge of dealing with the data deluge – more complex data at greater volumes, in multiple formats, and often stored in disparate internal and external data silos. As Jerry Karabelas, former head of R&D at Novartis, quipped in an updated and repurposed phrase from Coleridge: “Data, data everywhere, and not a drug I think.” And that was at the turn of the 21st century; things have only gotten worse since then, with the term “big data” now getting over four million hits on Google.

Typical therapeutic research projects continually generate and amass data – chemical structures; sequences; formulations; primary, secondary, and high-content assay results; observed and predicted physicochemical properties; DMPK study results; instrument data; sample genealogy and provenance; progress reports, etc. – and researchers are then charged with the responsibility of making sense of all the data, to advance and explore hypotheses, deduce insights, and decide which compounds and entities to pursue, which formulations or growth conditions to optimize, and which to drop or shelve. 

A usual first step will be to collect together all the relevant data and get it into a form that is amenable to further searching and refinement: but this poses a potentially challenging set of questions – what data exists, where is it, what format is it in, and how much of it is there? Answering these questions may then be complicated if the data resides in different, possibly disconnected, potentially legacy systems: e.g. chemical structures in an aging corporate registry, sequences in a newer system, assay results in another database, DMPK values buried inside an electronic lab notebook, and instrument data in an unconnected LIMS or LES. 

So the researcher is faced with knowing:

(a) Which systems exist, where they are located, and what they contain, 

(b) How to search each of them to find the required data, 

(c) How to extract the desired information from each source in the correct usable format, and 

(d) How to meld or mash-up these various disparate data sets to generate a project corpus for further analysis and refinement. 

Even then, they are likely to be frustrated along the way by different query input paradigms (e.g. pre-designed and inflexible search forms, or the need to write SQL queries), slow search response times, and either too many or too few results to yield a tractable data set. If they opt to start with an overlarge hit list, they can try to whittle it down by tightening their search parameters, or by locating and subtracting items with undesirable properties, but in most cases they will be left with a slew of somewhat different hit files that must then be sensibly merged through a sequence of cumbersome list logic operations (e.g. intersect the pyrrolidine substructure search compounds with the bioassay IC50 < 0.5 nanomolar hits, and then see whether any of those match the required physicochemical property and DMPK profiles in the third and fourth files). This trial-and-error approach is inefficient, unpredictable, potentially unreliable, and time-consuming.
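The list logic described above boils down to set operations. A toy sketch with hypothetical compound IDs and hit lists:

```python
# Sketch of the list-logic merging described above, with hypothetical hit IDs:
# intersect a substructure hit list with potency, property, and DMPK hit lists.

substructure_hits = {"CPD-001", "CPD-002", "CPD-003", "CPD-007"}
potency_hits     = {"CPD-002", "CPD-003", "CPD-009"}   # e.g. IC50 < 0.5 nM
physchem_hits    = {"CPD-002", "CPD-003", "CPD-007"}
dmpk_hits        = {"CPD-003", "CPD-011"}

# Compounds satisfying every constraint at once:
final_list = substructure_hits & potency_hits & physchem_hits & dmpk_hits
print(sorted(final_list))  # ['CPD-003']
```

The operations themselves are trivial; the pain points the post describes come from having to export, align, and re-import these lists by hand across disconnected systems.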

Fortunately, modern systems such as PerkinElmer Signals™ Lead Discovery are now available to overcome these challenges and to equip scientists with efficient tools to rapidly locate and assemble accurate, comprehensive, and workable data sets for detailed and scientifically intelligent refinement and analysis. Prerequisites include a future-proof, flexible, and extensible underlying informatics infrastructure and platform that can intelligently handle and stage all types of R&D data, now and in the future: data or text, structured or unstructured, internal or external. Establishing such an informatics platform and making all the relevant data instantly accessible to researchers removes the data wrangling challenges (a)-(d) discussed above and delivers immediate productivity and outcomes benefits, as researchers are free to focus on science rather than software.

Rather than struggling to remember where data is located, and how to search it, scientists can now be intelligently guided. Signals Lead Discovery lists and clearly presents the relevant available data (including internal and external sources) and offers simple and consistent yet flexible searching paradigms to query the underlying content. Modern indexing techniques (including blazing fast, patent-pending, no-SQL chemical searching, and a full range of biological sequence searching tools) ensure rapid response times to searches with immediate feedback to see whether a query is delivering the required number of hits. Intuitive views of the data in tables and forms with advanced display capabilities built on Spotfire also give immediate visual feedback about the evolving content of a hit set as it is refined, and data drill down is always available to get a more granular view of the underlying data. 

Once the researcher has adequately shaped and refined a data set to contain all the required relevant data, it is immediately available for further detailed analysis and visualization, using Signals Lead Discovery’s powerful set of built-in workflows and tools, or via RESTful APIs with external and third-party tools. This downstream analysis and visualization will be the subject of future blog posts in this series. This video shows how guided search and analytics can power your SAR analysis quickly and effectively.

Light at the end of the lead discovery tunnel?

Drug discovery is hard (nine out of ten drug candidates fail), time-consuming (typically 10-15 years), and expensive (Tufts’ 2016 estimate: $2.87Bn). But things are getting better, right? In 2017, although the EMA approved only 35 new active substances, FDA drug approvals hit a 21-year high, with 46 new molecular entities approved – the highest number since 1996. This was a mix of 29 small molecules and, demonstrating their increasing therapeutic importance, 17 biologics (nine antibodies, five peptides, two enzymes, and an antibody-drug conjugate). But the FDA counted only 33% of the 46 approvals as new classes of compound, so the others would have come from older classes, which probably entered the R&D pipeline 15-20 years ago.

Is this bumper crop of 2017 new approvals some reflection of major advances in drug discovery techniques and technology that primed the R&D pipeline at the turn of the century? Or is it just an artifact of the FDA approval process and timeline? Hard to say either way, but in the long game of drug development, scientists and researchers will be keen to jump on any improvements that can be made now. 

What contributes to the threefold challenges that make drug discovery and development hard, time-consuming, and expensive? Surely the plethora of “latest things” – personalized and translational medicine, biomarkers, the cloud, AI, NLP, CRISPR, data lakes, etc. – will lead to better drugs sooner and more cheaply? At the highest level, probably. But down in the trenches, researchers and their IT and data scientist colleagues are engaged in an ever-increasing daily struggle: to develop and run more complex assays; to capture and manage larger volumes of variable and disparate data; to handle a mix of small molecule and biologic entities; to make sense of this data deluge and draw conclusions and insights; and often to do all this with inflexible and hard-to-maintain home-grown or legacy systems that can no longer keep pace.

Let’s look at some of these challenges in more detail.

The Sneakernet

Informatics systems built on traditional RDBMSs require expensive database operators just to keep them functioning, and much time and budget have to be devoted to fixing issues and keeping up with software and system upgrades. This leaves little or no time to make enhancements, or to adjust the system to incorporate a new assay or to manage and index a novel data type. It delays IT staff in making even the simplest requested change, and may spur researchers to go rogue and revert to using spreadsheets and sneakernet to capture and share data.

The Data Scientist’s inbox

Organizing and indexing the variety and volume of data and datatypes generated in modern drug discovery research is an ongoing challenge. Scientists want timely and complete access to the data, with reasonable response times to searches, and easy-to-use display forms and tables. 

Older legacy informatics systems did a reasonable job of capturing, indexing, linking, and presenting basic chemistry, physical property, and bioassay structured data, but at the cost of devising, setting up, and maintaining an unwieldy array of underlying files and forms. Extending a bioassay to capture additional data, reading in a completely new instrument data file, or linking two previously disconnected data elements all require modifications to the underlying data schema and forms, and add to the growing backlog of unaddressed enhancement tasks in the data scientist’s inbox.

In addition to managing well-structured data, scientists increasingly want combined access to unstructured data such as text contained in comments or written reports. Legacy systems have very limited capabilities to incorporate and index such material in a usable way, so potentially valuable information is ignored when making decisions or drawing insights.

Lack of tools for meaningful exploration

Faced with the research data deluge, scientists want to get to just the right data in the right format, with the right tools on hand for visualization and analysis. But the challenge is to know what data exists, where, and in what format. Legacy systems often provide data catalogs to help find what is available, and offer simple, brute-force search tools, but response times are often inadequate, and hit lists contain far too few or too many results to be useful. Iterative searches may help to focus a hit set on a lead series or assay type of interest, but often the searcher is left trying to make sense of a series of slightly different hit lists, using cumbersome list logic operations to arrive at the correct intersection list that satisfies all the specified substructure/dose response/physical property range parameters.

Once a tractable hit set is available, the researcher is then challenged to locate and use the appropriate tools to explore structure-activity relationships (SARs), develop and test hypotheses, and identify promising candidates for more detailed evaluation. Such tools are often hard to find, and each may come with its own idiosyncratic user interface and a steep learning curve. Time is also spent designing and tweaking display forms to present the data in the best way, and every change slows down decision making. Knowing which tools and forms to use, in what order, and on which sets of data can be frustrating, and can lead to incomplete or misleading analyses or conclusions.

In the area of SAR and bioSAR, underlying chemical structural and biosequence intelligence are key requirements for meaningful exploration and analysis, and these are often only available in separate and distinct applications with different user interfaces, when ideally they should be accessible through a unified chemistry/biosequence search and display application, supported by a full range of substructure and sequence analysis and display tools.

R&D Management

Lab, section, and therapeutic area managers are all challenged to help discover, develop, and deliver better drugs faster and more cheaply. They want their R&D teams to be working at peak efficiency, with the best tools available to meet current and future demands. This first requires the foundation of a future-proof, flexible, and extensible platform. Next, any system built on the platform must be able to intelligently and flexibly handle all types of R&D data, now and in the future, structured or unstructured. Research scientists can then exploit this well-managed data with tools that guide them through effective and timely search and retrieval; analysis workflows; and advanced SAR visual analytics. This will lead to better science and faster insights to action. 

Follow us on social media to be notified of the next blog in this series 


When Platforms Launch Discovery

Platform software development is trending these days, and for good reason. Platform development lets developers get to a robust foundation that addresses broad essentials, like security and reporting, but then enables dedicated teams to build innovative applications on top of that foundation to serve specific users. 

Forbes, quoting blogger Jonathan Clarks, described platforms as structures on which multiple products can be built. By trending toward platforms, the logic functions of applications can be separated out, “so that an IT structure can be built for change,” Clarks argued. He said companies “invest in platforms in the hope that future products can be developed faster and cheaper than if they built them stand-alone.” 

We’ve been convinced that a platform approach is most effective, both for our customers and for us. One of our main motivators is finding better ways to help the scientists, researchers, business analysts and others we serve quickly make sense of the overabundant data they encounter daily. The PerkinElmer Signals™ platform is enabling us to achieve this goal. 

The PerkinElmer Signals Platform

With Signals, we’re taking data integration and analysis to the next level, beyond data visualization. It consists of applications that enable deeper data insights based on context from available research data, search queries, workflows and more.

The PerkinElmer Signals portfolio includes cloud-scale products as well as on-premises scalable offerings that can grow with demand. Signals also leverages TIBCO Spotfire®-enabled scientific workflows and data visualizations. From a basis that empowers data-driven scientific and business decisions, Signals has branched out to offer self-guided data discovery, self-guided data analysis, and visual analytics in the fields of translational, screening, medical review, and lead discovery, with more to come in 2018. We’ve even got Signals Notebook, a web-based electronic notebook for scientific research data management, and have built the Signals vision into existing products, such as E-Notebook.

This approach lets us help customers corral the explosion of data across their enterprises, while addressing its complexity with specific solutions. The foundation uses tools for big data storage, search, semantics, and analytics to help users get a much clearer and broader view, much more quickly, from an array of disparate data. Each application builds on scientific knowledge to create scientifically accurate and relevant workflows that deliver deeper insights from the data. You get a truly modern, intuitive tool to collaborate, search, wrangle, and manage data in familiar yet flexible workflows.

PerkinElmer Signals Lead Discovery

Take Signals Lead Discovery: the application creates greater data awareness for users by helping them find data they might not even know exists. Using agile guided search and query, Signals anticipates needs and provides flexibility for on-the-fly exploration of compounds and their activity against a target.

It also brings the benefit of index-based search to lead discovery. A patent-pending algorithm lets users search chemical structures within the Apache Lucene-based indexing system, making it possible, in a single query, to capture chemical-structure search constraints along with other search attributes.

It is also suitable for both chemical and biological activity data. You can shape and annotate biological activity data into a hyper-scalable structure to perform precise quantitative searches that are seamlessly integrated with structure search. You can rapidly execute queries like “Retrieve all assay results for compounds containing a certain substructure where the activity in one of the assays is less than 15 nM.”
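Conceptually, such a combined query applies a structure-match constraint and a quantitative threshold in a single pass. A deliberately simplified sketch (the record fields are hypothetical, and a real index-based system would evaluate the substructure match inside the index rather than via a precomputed flag):

```python
# Illustrative sketch only: one pass over records combining a (precomputed)
# substructure-match flag with an activity threshold, mimicking the kind of
# combined query described above. All ids and field names are hypothetical.

records = [
    {"id": "CPD-101", "has_substructure": True,  "ic50_nM": 8.0},
    {"id": "CPD-102", "has_substructure": True,  "ic50_nM": 40.0},
    {"id": "CPD-103", "has_substructure": False, "ic50_nM": 2.0},
]

hits = [r["id"] for r in records
        if r["has_substructure"] and r["ic50_nM"] < 15.0]
print(hits)  # ['CPD-101']
```

The point of indexing both constraint types together is that no intermediate hit lists need to be exported and intersected by hand.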

The lead discovery workflow, for example, enables scientists to:

discover data of interest 

immediately confirm the meaningful intersection of compounds of interest with assay results they require 

seamlessly deposit that data into a fit-for-purpose SAR analysis template 

Using search features, scientists can find their project, navigate assay hierarchies, discover how much data is available, merge substructure search results with their selected project, and begin filtering for assays of interest. Alternatively, Signals Lead Discovery can show results for compounds in range. This data is then ready for SAR analysis.

No Programming

Importantly, no new programming skills are needed for working with Signals. Its composable design lets users interchange components to configure the workflows they need, resulting in rapid, agile application development. Because programmers don’t have to learn an entirely new query syntax or anticipate all joining operations in advance, the system sidesteps the up-front schema rework that indexing systems are designed to avoid.

With a platform built on best-in-class technologies like TIBCO Spotfire®, PerkinElmer Signals frees organizations from being overly dependent on IT and programmers. Self-service discovery and analytics, driven by powerful visualizations and easy configuration, keep scientists focused on their science. When tools do a better job of managing and presenting the data, scientists, researchers, and business analysts gain more time for critical thinking and analysis, essential for discovery.

Are you ready for Signals? We’d love to show you around the Signals platform and the applications best suited for you. 

How Analytics Centers of Excellence Improve Service & Save Costs

Centers of Excellence: Centralizing Expertise
The “Center of Excellence” as a business model has an assortment of definitions and uses. In general, such “centers” are established to reduce time to value, often by spreading multidisciplinary knowledge, expertise, best business practices and solution delivery methods more broadly across organizations.

They have been identified as “an organizing mechanism to align People, Process, Technology, and Culture” or - for business intelligence applications - as “execution models to enable the corporate or strategic vision to create an enterprise that uses data and analytics for business value.” Still others define these centers as “a premier organization providing an exceptional product or service in an assigned sphere of expertise and within a specified field of technology, business or government…”

Using a CoE to Improve Business Intelligence
In approaching how the Center of Excellence (CoE) concept might improve business intelligence (BI), analytics, and the use of data in science-based organizations, PerkinElmer Informatics has developed an Analytics Center of Excellence to deliver service for our customers.

As a framework, the CoE offers ongoing service coverage by experts from a variety of domains, including IT & architecture, statistics and advanced analytics, data integration & ETL, visualization engineering and scientific workflows. In many cases an expert is located at your facility and then leverages a wider range of remote staff, to provide support, reduce costs, and eliminate red tape and paperwork.

There are four pillars to our Analytics CoE for your organization: 

Architecture Services
Mainly for IT, this covers architecture strategy, sizing and capacity planning, security and authentication, connectivity and integration planning, and library management

Governance Services
Centralized planning, execution, and monitoring of projects; a Program Management approach to managing multiple work streams; Steering Committee participation; SOPs and best practices; and change management

Value Sustainment Services
Expertise for subject matter consulting, support, hypercare, roadmap and future planning, and analytics core competency

Training & Enablement Services
Training needs assessment, training plans, courseware development, training delivery and mentoring

Cost Savings with Standardized BI Solutions
PerkinElmer’s Analytics CoE leverages TIBCO Spotfire® to help our customers get the most out of this technology as quickly as possible - from the experts. Very often - especially at mid-size to large enterprises - the question is asked, “Why aren’t we standardized on a single BI solution?”

It’s a good question.

Rather than investing time, effort, and money in evaluating, implementing, and maintaining and updating several BI solutions, not to mention training staff to use them, considerable cost savings can be gained from deploying a standard business intelligence solution across the enterprise. And the savings can be further supplemented because the Analytics CoE covers both foreseen and unforeseen needs.

Under an Analytics CoE implementation, cost savings are derived from:

  • Economy of scale from a suite of informatics services
  • Reduced administration efforts for both customer and vendor
  • “Just-in-time” project delivery that engages the right resources at the right time

Reducing the Pharma Services Budget
After converting to the Analytics CoE model, a top-25 pharmaceutical company saved 50% on its services budget for TIBCO Spotfire®. This was possible because the services were bid out once - not for every service engagement. Purchasing service engagements was significantly less fragmented, and the high costs of supporting multiple tools and platforms and responding to RFPs were greatly reduced.

Standardizing on an Ongoing Service Model
Centralizing around a formal service model focuses management of the vendor relationship on a single partner – who truly becomes a partner as they manage projects across multiple domains and departments. 

The Analytics CoE model (such centers are also called competency centers or capability centers) oversees deployments, consolidation of services, dashboard setup, and platform upgrades - all without the additional burden of new RFPs, vetting of new vendors, and establishing new relationships.

The benefits of standardizing on an ongoing service model, centered on a standard BI platform, include:

  • Holistic approach to deploying analytics solutions across the organization
  • Cost savings from reducing the number of tools used 
  • IT organization isn’t spread too thin as it no longer has to support multiple systems
  • Greater departmental sharing
  • Improvements beyond the distributed model

In addition, there are numerous reasons for analytical organizations to adopt an Analytics CoE:
  • Program Management managing multiple project workstreams and chairing Steering Committee meetings to provide management insight into solution delivery.
  • High quality of subject matter expertise (SME) available for your projects; SMEs are pulled in as needed and are billed against CoE.
  • Significant savings over typical daily rates – up to 50%.
  • Flexible engagement period.
  • Hourly rate fees move from the FTE model to “pay for what you use,” further reducing costs.
  • Multiple projects billed against Analytics CoE.

Are you ready for true service excellence in your data-driven organization? Find out if PerkinElmer’s Analytics Center of Excellence is a good fit.

Contact us at informatics.insights@PERKINELMER.COM