Shining Light on Dark Data


For the vast majority of organizations, gaining visibility into their data is akin to shining a spotlight into a forest. You see the trunks and branches of trees and perhaps some groundcover, but have no idea what else is out there - or even how big the forest is.

That’s because, according to IT market research and advisory firm Gartner Inc., as much as 90 percent of big data assets will be inaccessible to their organizations by next year. In addition, perhaps 80 to 90 percent of an organization’s data is unstructured.

Forrester, in a custom study for Attivio, found that 75 percent of unstructured data goes “unused,” and a full 59 percent of structured data goes unused as well.

What’s Causing This?

First, current business intelligence (BI) and analytics delve into known and mostly structured data, but not all relevant data is in structured databases. Second, the quality, condition, and context of data pose difficulties. The Forrester survey found that 64 percent of data management effort and time is spent finding and profiling data sources, leaving precious little time for actually analyzing the data and drawing the insights and conclusions that drive intelligent business decisions.

Much of that time is lost because data is not always well organized or intuitively named. The people who want to use the data often understand the domain, while the people who own the data usually understand databases. The two groups speak different languages, which makes it difficult for them to answer each other’s questions collaboratively. To complicate matters, relevant data is also spread across many disparate sources.

Consider a customer who knows which databases contain relevant information but finds the data hard to combine because it sits in different databases. IT, for its part, knows there is value in combining the data but has not had the resources to build a warehouse uniting the databases. The challenge is to overcome the silos within real resource constraints.
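To make the silo problem concrete, here is a minimal sketch of combining two separate databases on a shared key without building a warehouse. Everything in it is an assumption for illustration: the two in-memory SQLite databases, the `compounds` and `results` tables, the `compound_id` key, and the sample rows are all hypothetical stand-ins for real silos.

```python
import sqlite3


def make_db(schema, insert_sql, rows):
    """Create an in-memory SQLite database standing in for one silo."""
    conn = sqlite3.connect(":memory:")
    conn.execute(schema)
    conn.executemany(insert_sql, rows)
    return conn


# Hypothetical silo 1: a registry that maps identifiers to names.
registry = make_db(
    "CREATE TABLE compounds (compound_id TEXT PRIMARY KEY, name TEXT)",
    "INSERT INTO compounds VALUES (?, ?)",
    [("C1", "alpha"), ("C2", "beta"), ("C3", "gamma")],
)

# Hypothetical silo 2: measurements keyed by the same identifier.
assays = make_db(
    "CREATE TABLE results (compound_id TEXT, assay TEXT, value REAL)",
    "INSERT INTO results VALUES (?, ?, ?)",
    [("C1", "potency", 0.8), ("C2", "potency", 0.3), ("C4", "potency", 0.5)],
)


def unify(registry_conn, assay_conn):
    """Join rows from the two silos in application code on compound_id."""
    names = dict(registry_conn.execute("SELECT compound_id, name FROM compounds"))
    unified = []
    query = "SELECT compound_id, assay, value FROM results ORDER BY compound_id"
    for compound_id, assay, value in assay_conn.execute(query):
        unified.append({
            "compound_id": compound_id,
            "name": names.get(compound_id),  # None when the registry has no match
            "assay": assay,
            "value": value,
        })
    return unified


combined = unify(registry, assays)
for row in combined:
    print(row)
```

Even in this toy case, one measurement has no matching registry entry, which hints at why the finding and profiling work described above takes so much time.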

Unexplored Unstructured Data

Meanwhile, vast amounts of semi-structured and unstructured data remain largely unexplored. There are several reasons for this. First, the proverbial “forest of data” is simply too dark and confusing. Who wants to go in there? Second, for those who do venture in, the data has most often been captured in its original form, without any upfront structure assigned. Most organizations have not invested in the upfront overhead of structuring free-form content. Those that did often find the structure obsolete, and it is maintained only irregularly in daily use because the chosen structure proves cumbersome or irrelevant. Users often regret not putting effort into structuring content earlier but, having come to that conclusion, feel it is too late to do so.

Unstructured data offers value. Leading organizations find benefits from mining their unstructured content for new insights, unlocking the value that has, until now, been inaccessible. Some examples:

• Unraveling safety signals from preclinical safety documents

• Understanding the competitive landscape across multiple public sources

• Diving for hidden, associative relationships between pathways, genes, proteins, and drugs to explore drug repurposing opportunities

Get a 360-Degree View

Unstructured and semi-structured content is rich with patterns, trends, and data points that, when identified and unified, can yield significant and perhaps business-changing insights. Without that content, many decisions are made on incomplete data.

A complete, 360-degree view requires structured, semi-structured, and unstructured data from both internal and external or public databases and sources. Only then does daylight reach the full forest of available data.

The Good News for Business Analysts & Life Science Researchers

Efforts have been underway to unburden IT from responding to requests for reports and data sets for others to analyze. Instead, to speed data discovery, business analysts are using self-service tools that let them more quickly respond to their own lines of inquiry. The shift from IT-led reporting to business-led self-service has reached a tipping point, according to Gartner.

Data democratization has been a good thing. More people can get their hands on data, faster. Now they need to get their hands on more data – tapping into the full complement of their data to access the right information, with the right context for their inquiries.

Can we unlock the value of data trapped in unstructured sources?

Can we easily provision data to analytical tools without needing to know the underlying data structure?

Can we unify data from disparate sources?

Yes, yes and yes.

PerkinElmer Signals™ Perspectives: Universal Adaptor for TIBCO Spotfire

Just this month, PerkinElmer announced a new partnership with Attivio. PerkinElmer Signals Perspectives, powered by Attivio, will bring this data discovery and content analytics platform to scientists and life science organizations worldwide. Now researchers can profile, identify, and unify semi-structured and unstructured data from disparate sources, leading to faster insight.

The Universal Adaptor for data source discovery leverages Attivio’s syntactical and contextual understanding to uncover relationships between data structures, helping analysts and scientists find the data that is relevant to them in minutes rather than weeks or months. It automatically searches the data landscape and delivers data in a format similar to an e-commerce shopping cart and, like retail websites, can “recommend” the most relevant data given the context of the search. It correlates all structured and unstructured data and enriches the data with scientifically relevant dictionaries and ontologies. This unifies all data sources for easy analysis in PerkinElmer’s TIBCO® Spotfire visualization and analytics platform.
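The general idea of uncovering relationships between data structures can be illustrated with a toy example. The sketch below is not Attivio’s method or any PerkinElmer API. It is a simple stand-in technique that ranks column pairs from two tables by how much their values overlap (Jaccard similarity), and the table contents, column names, and `threshold` value are all hypothetical.

```python
from itertools import product


def jaccard(a, b):
    """Share of distinct values that two columns have in common."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def suggest_links(table_a, table_b, threshold=0.5):
    """Return (score, column_a, column_b) for pairs whose values overlap enough."""
    scored = []
    for (col_a, vals_a), (col_b, vals_b) in product(table_a.items(), table_b.items()):
        score = jaccard(vals_a, vals_b)
        if score >= threshold:
            scored.append((score, col_a, col_b))
    return sorted(scored, reverse=True)  # strongest candidate links first


# Hypothetical tables from two sources that name the same identifier differently.
samples = {
    "sample_ref": ["S1", "S2", "S3", "S4"],
    "site": ["Boston", "Lyon", "Boston", "Lyon"],
}
readings = {
    "sid": ["S2", "S3", "S4", "S5"],
    "reading": [0.1, 0.4, 0.9, 0.7],
}

links = suggest_links(samples, readings)
for score, col_a, col_b in links:
    print(f"{col_a} <-> {col_b}: overlap {score:.2f}")
```

Value overlap is only one possible signal. It shows how two differently named columns can be proposed as a likely join key without anyone knowing the underlying schema in advance.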

PerkinElmer Signals™ Perspectives: Content Analytics

For the next level of data analytics, content analytics makes sense of the rich materials and insights locked away in emails, publications, SharePoint sites, scientific posters, patents and research papers – all forms of unstructured or text-based content.

Imagine what discoveries await when the depths of this data are probed. Attivio asserts that “the benefits are game-changing” when companies look beyond structured data for their business intelligence initiatives.

It’s time to bring full daylight to the vast forest of unstructured and semi-structured data. With PerkinElmer Signals™ Perspectives, powered by Attivio, we’ll help you clearly see all of your data.