April 8, 2022
Conference Paper

WinnowML: Stable feature selection for maximizing prediction accuracy of time-based system modeling

Abstract

Online deep learning (ODL) has become an important methodology for modeling time-based performance of computer systems. An open problem is the intelligent selection of features from raw workload traces of computer systems. The best existing methods are overly sensitive to noisy data, causing frequent feature changes and re-training. Using all available features inflates training time and introduces model artifacts when some of those features should have been dropped. We present WinnowML, a method for automatically determining the most relevant feature subset for a predictive time-series model. WinnowML combines existing feature ranking algorithms with a history of each feature's ranking to iteratively rank a feature set, lowering prediction error and maximizing long-term relevance. From this ranked feature set, the most relevant and stable subset is selected to train a model. Experimentally, we show that WinnowML can lower a model's mean absolute relative error by up to 42% on average compared to the closest-performing approach. Additionally, we lower the fluctuation in feature ranking and selection by up to 65%. We also demonstrate how to combine WinnowML with a model search tool to improve performance by up to 14.5% compared to using all available features.
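
The idea of combining several feature rankers with a per-feature ranking history can be illustrated with a minimal sketch. This is not the WinnowML algorithm itself, which the abstract does not specify; it assumes a simple average of rank positions across rankers and exponential smoothing over time. The function names, the `alpha` weight, and the feature names are all hypothetical.

```python
from statistics import mean

def aggregate_ranks(rankings):
    """Average the position (0 = best) each ranker assigns to every feature."""
    features = rankings[0]
    return {f: mean(r.index(f) for r in rankings) for f in features}

def smooth_scores(history, current, alpha=0.3):
    """Blend the current aggregated rank with the running history.

    Assumption: exponential smoothing; a feature with no history
    starts from its current score.
    """
    return {
        f: alpha * current[f] + (1 - alpha) * history.get(f, current[f])
        for f in current
    }

def select_top_k(scores, k):
    """Return the k features with the lowest (best) score, ties broken by name."""
    return sorted(scores, key=lambda f: (scores[f], f))[:k]

# Hypothetical rankings from two rankers over two consecutive time windows.
windows = [
    [["cpu", "io", "mem", "net"], ["cpu", "mem", "io", "net"]],
    [["net", "cpu", "io", "mem"], ["cpu", "io", "net", "mem"]],  # noisy window
]

history = {}
for rankings in windows:
    history = smooth_scores(history, aggregate_ranks(rankings))

selected = select_top_k(history, 2)                            # uses history
unsmoothed = select_top_k(aggregate_ranks(windows[-1]), 2)     # last window only
print(selected, unsmoothed)
```

In this toy input, one ranker briefly promotes `net` in the second window. Selecting from the last window alone would swap `io` for `net`, while the smoothed scores keep the earlier subset, which is the kind of selection fluctuation the abstract aims to reduce.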

Published: April 8, 2022

Citation

Bel O., S. Mukhopadhyay, N.R. Tallent, F. Nawab, and D. Long. 2021. WinnowML: Stable feature selection for maximizing prediction accuracy of time-based system modeling. In IEEE International Conference on Big Data (Big Data 2021), December 15-18, 2021, Orlando, FL, edited by Y. Chen, et al., 3031-3041. Piscataway, New Jersey: IEEE. PNNL-SA-168493. doi:10.1109/BigData52589.2021.9671602