Are You Ready for Machine Learning?
Some users are so enthusiastic about Machine Learning (ML) that Gartner added it for the first time to its Hype Cycle for Emerging Technologies in 2016 based on its ability to revolutionize manufacturing and related industries. Forrester Research says enterprises are “enthralled” by the potential of ML “to uncover actionable new knowledge, predict customers’ wants and behaviors, and make smarter business decisions.”
It’s not just for manufacturing or retail, however. Computers that can “learn” have applications in the life sciences as well. Predictive modeling can, for example, assist with diagnostics, whether from clinical or molecular data or from image recognition. A study at Stanford University showed that ML predicted outcomes for lung cancer patients more accurately than pathologists did.
What is Machine Learning?
Gartner defines it as “a technical discipline that provides computers with the ability to learn from data (observations) without being explicitly programmed. It facilitates the extraction of knowledge from data. Machine learning excels in solving complex, data-rich business problems where traditional approaches, such as human judgment and software engineering, increasingly fail.”
According to Tom Mitchell, author of Machine Learning (which in 1997 was one of the first textbooks on the subject), ML can be defined as any computer program that improves its performance (P) at some task (T) through experience (E).
ML works, in part, by finding patterns in data. It has typically been the domain of proficient data scientists, software engineers, IT professionals, and others who understand deep learning, neural networks, and similar specialized subjects. In short, ML has felt off-limits to many subject matter experts (think of those pathologists in the Stanford study) who could benefit from its algorithms.
“The analytics revolution becomes real when the average person becomes more comfortable with advanced analytics,” says Dr. Jamie Powers, DrPH, Director of Real World Evidence and Data Science at PerkinElmer Informatics. “Not that it’s simple or easy, but the barrier to entry isn’t as high as one might think.”
And getting on board is important, says Dr. Powers, if organizations don’t want to be left behind.
“The insights that can be gained from statistical analysis and machine learning are far greater than what a pair of human eyes can possibly offer,” he says. “What machine learning can do is direct where the humans should focus.”
“Starting Small” with Machine Learning
The beauty of machine learning, compared with routine statistical prediction, is that it lets users experiment. Even if “you’re not there yet” because you still struggle with data and basic analytics, you can start small and explore with ML.
For example, consider an analysis we ran using a publicly available dataset (used in previous data mining competitions) from a Wisconsin breast cancer study. It’s a small dataset of 699 patients, and we ran several algorithms to test diagnostic accuracy (benign or malignant) based on nine tumor characteristics. The goal was to find an ML algorithm with greater than 95% predictive accuracy.
What tool did we use? The TIBCO Enterprise Runtime R (TERR) engine embedded within TIBCO Spotfire. Beyond its visualization and dashboarding capabilities, TIBCO Spotfire offers advanced analytics functionality for machine learning applications. TERR lets you work in RStudio to develop, test, break, and iterate with algorithms, but it makes R run 10 to 100 times faster than open-source R.
We tested seven different algorithms. Six of the seven yielded accuracy of 95% or better. Although the sample is small, it’s enough to demonstrate that sometimes a simple model with high predictive accuracy is good enough.
If your next steps include trying ML on your TIBCO Spotfire and TERR installation, here are three tips and one caveat to get you started:
- 1. Follow a process. Machine learning relies on a process for determining which algorithm to use, which saves you from the countless decisions that a more manual evaluation of the available data would require.
- 2. Use k-fold cross-validation, a best practice for training algorithms. Maintain an 80/20 data split for final predictions so that you have a hold-out sample on which to test the final algorithm.
- 3. Try ensembles, or combinations of algorithms, to see whether they deliver the accuracy and insights you are looking for.
- 4. The caveat: Machine learning (or statistical prediction) is NOT a substitute for proper experimental design and/or sample selection.
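To make tips 2 and 3 concrete, here is a minimal sketch in plain Python (rather than R or TERR) of an 80/20 hold-out split, k-fold index generation, and a majority-vote ensemble. It is not the workflow used in the analysis above: the function names, the seed, the choice of five folds, and the integer placeholders standing in for the 699 patient records are all assumptions made for illustration.

```python
import random
from collections import Counter


def holdout_split(rows, test_fraction=0.2, seed=0):
    """Shuffle the rows and set aside a hold-out sample (80/20 by default)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    # First element: training rows; second element: hold-out rows.
    return shuffled[n_test:], shuffled[:n_test]


def k_fold_indices(n_rows, k=5):
    """Yield (train_idx, validation_idx) pairs; each row is validated once."""
    base, extra = divmod(n_rows, k)
    start = 0
    for i in range(k):
        # The first `extra` folds take one additional row each.
        stop = start + base + (1 if i < extra else 0)
        validation = list(range(start, stop))
        train = list(range(0, start)) + list(range(stop, n_rows))
        yield train, validation
        start = stop


def majority_vote(predictions):
    """Combine class labels from several models into one ensemble label."""
    return Counter(predictions).most_common(1)[0][0]


# Placeholder data: integers standing in for 699 patient records (assumption).
rows = list(range(699))
train, holdout = holdout_split(rows)

# Cross-validate on the training portion only; the hold-out stays untouched
# until a final algorithm has been chosen.
folds = list(k_fold_indices(len(train), k=5))

# Hypothetical labels from three models for a single patient.
ensemble_label = majority_vote(["malignant", "benign", "malignant"])
```

In practice, each fold's training indices would be used to fit a candidate algorithm and the validation indices to score it, with the hold-out sample reserved for one final check of the chosen model.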
From science fiction to real science, Machine Learning is finding new applications and new devotees who want to use its power to solve problems and answer seemingly unanswerable questions.
“If you have Spotfire and R, you can start with small projects,” Dr. Powers says. “The technology exists, the information is there. Machine learning is more accessible than it’s ever been. So, are you ready for machine learning? I would say anyone who’s willing to learn about it is ready.”
Watch Dr. Powers’ talk at Advanced Pharma Analytics 2016 to see how you can begin incorporating machine learning to benefit your organization.