화학공학소재연구정보센터
Journal of Process Control, Vol.68, 1-13, 2018
Big data quality prediction in the process industry: A distributed parallel modeling framework
With the ever increasing data collected from the process, the era of big data has arrived in the process industry. Therefore, the computational effort for data modeling and analytics in standalone modes has become increasingly demanding, particularly for large-scale processes. In this paper, a distributed parallel process modeling approach is presented based on a MapReduce framework for big data quality prediction. Firstly, the architecture for distributed parallel data modeling is formulated under the MapReduce framework. Secondly, a big data quality prediction scheme is developed based on the distributed parallel data modeling approach. As an example, the basic Semi-Supervised Probabilistic Principal Component Regression (SSPPCR) model is deployed to concurrently train a set of local models with split datasets. Meanwhile, Bayesian rule is utilized in a MapReduce way to integrate local models based on their predictive abilities. Two case studies demonstrate the effectiveness of the proposed method for big data quality prediction. (C) 2018 Elsevier Ltd. All rights reserved.