Rapid, Incisive Data Analysis for Lead Discovery

In a previous post we commiserated with drug discovery scientists and their IT colleagues in their daily struggles to deal with the ever-increasing research data deluge. We then attempted to ease their pain by exploring modern informatics tools and applications that guide them to rapidly and intelligently identify, locate, search, extract and organize tractable sets of relevant data (internal or external, structured or unstructured, small molecule or biologic) for detailed analysis and visualization. 

This post moves on to the improvements needed at the next stage of the drug discovery/lead identification/lead optimization workflow, where scientists want to analyze data sets to derive insights, answer pressing scientific questions, and make research decisions such as:

Identifying the most promising chemical scaffold and substituent set with oncologic activity

Exploring this set further to improve candidate compounds’ DMPK and toxicity profiles

Dropping compounds that have been explored to the point where no further progress seems achievable

Things aren’t all bad, surely: researchers can choose from an extensive (some might even say overwhelmingly scary) array of analysis and visualization tools to apply to their data set. But such tools and applications often have their own idiosyncratic user interfaces and steep learning curves, and it can be difficult for scientists to become expert in every tool they might want to use, or even to know which tools are most appropriate and in which order to use them for optimum effect.

An ideal informatics platform should be designed with these impediments to rapid and incisive data analysis and advanced structure-activity relationship (SAR) visualization in mind. Instead of confronting the researcher with a potentially overwhelming menu of tools and applications, an effective system should offer a guided pathway that embodies scientifically meaningful best practices for selecting and applying the most appropriate tools and techniques to the active data set.

Faced with a potentially daunting set of chemical structures, bioassay results, physicochemical properties and DMPK/tox profiles, researchers will want simple access to a rich set of viewing options that can help present the data in an intuitive way, including gallery views, forms, combined chemical compound and sequence views, and 3D chemical structure overlay. These advanced visualizations should also be bolstered by a rich set of dynamic charting options, and robust statistical analyses. 

Data sets can be enriched and explored with more precision if the system can calculate additional physicochemical properties, and then filter and cluster the data based on observed and calculated parameters and structural descriptors to help in identifying promising lead series. Researchers also need to be able to explore and display data at multiple levels within a data hierarchy, from plate to compound level, and compound to chemical series or therapeutic project level for ease of navigation. 
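To make the filtering and hierarchy idea concrete, here is a minimal Python sketch. It is not taken from any particular product: the well records, compound IDs, property values and the two cutoffs (molecular weight of 500 or less, cLogP of 5 or less, borrowed from the familiar rule-of-five guideline) are all illustrative assumptions. It rolls well-level results up to compound level, filters on calculated properties, and then rolls the surviving compounds up to series level.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical well-level assay readouts: (plate, compound, series, pIC50)
wells = [
    ("P1", "CPD-1", "series-A", 6.8),
    ("P2", "CPD-1", "series-A", 7.2),
    ("P1", "CPD-2", "series-A", 5.1),
    ("P2", "CPD-3", "series-B", 8.0),
]

# Hypothetical calculated properties per compound (values invented for illustration)
props = {
    "CPD-1": {"mol_wt": 412.5, "clogp": 3.1},
    "CPD-2": {"mol_wt": 538.7, "clogp": 5.6},
    "CPD-3": {"mol_wt": 365.4, "clogp": 2.4},
}


def roll_up(rows, key_index, value_index=3):
    """Average the readout for each distinct value found at key_index."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key_index]].append(row[value_index])
    return {key: mean(values) for key, values in groups.items()}


def passes_filter(p, max_mol_wt=500.0, max_clogp=5.0):
    """Illustrative property filter; the cutoffs are assumptions, not a recommendation."""
    return p["mol_wt"] <= max_mol_wt and p["clogp"] <= max_clogp


# Plate/well level -> compound level
compound_level = roll_up(wells, key_index=1)

# Keep only compounds whose calculated properties pass the filter
kept = {c: v for c, v in compound_level.items() if passes_filter(props[c])}

# Compound level -> series level, using only the compounds that were kept
series_level = roll_up([w for w in wells if w[1] in kept], key_index=2)

print(kept)
print(series_level)
```

In a real system the properties would be calculated from the chemical structures rather than typed in, and clustering on structural descriptors would sit alongside the simple threshold filter shown here.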

Creating a SAR table should be a simple matter of point-and-click configuration. The resultant SAR tables should be amenable to in-depth exploration with a powerful set of chemical structure and biosequence analysis and visualization tools, including ultra-fast search and R-group analysis of chemical series, and biosequence search and alignment. These advanced chemistry and biologics SAR tools and workflows will be the subject of subsequent blog posts in this series.
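As a rough illustration of what an R-group SAR table looks like underneath the point-and-click layer, the Python sketch below pivots a small series into an R1-by-R2 activity grid. It assumes the R-group decomposition has already been done; the compound IDs, substituents and pIC50 values are invented for the example.

```python
# Hypothetical, already-decomposed series: (compound, R1, R2, pIC50)
series = [
    ("CPD-1", "H", "OMe", 6.2),
    ("CPD-2", "F", "OMe", 7.1),
    ("CPD-3", "H", "Cl", 5.8),
    ("CPD-4", "F", "Cl", 7.6),
]


def sar_matrix(rows):
    """Pivot rows into a grid with one row per R1 and one column per R2."""
    r1_values = sorted({row[1] for row in rows})
    r2_values = sorted({row[2] for row in rows})
    lookup = {(r1, r2): activity for _, r1, r2, activity in rows}
    # Missing R1/R2 combinations stay as None so gaps in the series are visible
    table = [[lookup.get((r1, r2)) for r2 in r2_values] for r1 in r1_values]
    return r1_values, r2_values, table


r1_values, r2_values, table = sar_matrix(series)

# Print a tab-separated grid; "-" marks combinations that were never made
print("R1\\R2", *r2_values, sep="\t")
for r1, row in zip(r1_values, table):
    print(r1, *("-" if value is None else value for value in row), sep="\t")
```

Laid out this way, trends such as one substituent consistently raising activity across a row or column are easy to spot, which is the point of an R-group view.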

Modern systems should appeal both to power users, who will want unfettered access to advanced tools, and also to occasional users, who will want to extract immediate value with a minimal learning curve. A productive system should provide ready-to-use templates so that occasional users can immediately start to confidently explore their data, and as groups establish and refine their analysis workflows, these can be captured in shared templates for quick and consistent analyses between collaborating research groups. Adding an auto-update capability, so that saved queries and filters are automatically re-run as the underlying data is updated, will save time and increase productivity. 
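The auto-update idea can be sketched in a few lines of Python. The class names (SavedQuery, ProjectData), the record fields and the pIC50 threshold are all hypothetical, chosen only to show the pattern: every saved query is re-run whenever new data arrives, so its results never go stale.

```python
class SavedQuery:
    """A named filter whose results are refreshed whenever it is run."""

    def __init__(self, name, predicate):
        self.name = name
        self.predicate = predicate
        self.results = []

    def run(self, records):
        self.results = [r for r in records if self.predicate(r)]


class ProjectData:
    """Holds project records and re-runs every saved query on update."""

    def __init__(self):
        self.records = []
        self.queries = []

    def save_query(self, query):
        self.queries.append(query)
        query.run(self.records)

    def add_records(self, new_records):
        self.records.extend(new_records)
        # Auto-update: refresh all saved queries against the new data
        for query in self.queries:
            query.run(self.records)


project = ProjectData()
potent = SavedQuery("potent", lambda r: r["pIC50"] >= 7.0)
project.save_query(potent)

project.add_records([{"id": "CPD-1", "pIC50": 7.4}, {"id": "CPD-2", "pIC50": 5.9}])
project.add_records([{"id": "CPD-3", "pIC50": 8.1}])

print([r["id"] for r in potent.results])
```

A shared template would amount to a set of such saved queries and filters that a group can reuse, so collaborating teams apply the same criteria to their data.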

The end result of using such an ideal, modern informatics system is more rapid and incisive SAR analysis, increased productivity as researchers focus on science rather than on learning new user interfaces, and faster insights and better-informed scientific decision making. If you are a medicinal chemist, assay biologist, or ADMET analytical chemist, you can leverage self-guided analytics and visualizations for faster lead discovery.

Watch this webinar for a quick demo:

Discover and automatically update project data.

Configure project-specific SAR tables and annotate for compounds of interest.

Perform compound series analysis to identify R-groups of interest.

Perform sequence analysis to identify amino acid changes of interest.

Find Clinical Candidates Faster: Watch webinar now.