Reviews in Chemical Engineering, Vol.31, No.5, 453-490, 2015
Data cleaning in the process industries
Over the past decades, process engineers have faced a growing number of data analytics challenges and have had difficulty extracting valuable information from the wealth of process variable trend data. Raw data stored in databases in different formats are not useful until they are cleaned and transformed. Generally, data cleaning consists of four steps: missing data imputation, outlier detection, noise removal, and time alignment and delay estimation. This paper discusses available data cleaning methods that can be used in data pre-processing and that help overcome the challenges of "Big Data".
Keywords: big data; data cleaning; knowledge discovery; missing data imputation; noise removal; outlier detection; time alignment and delay estimation
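As a rough illustration of the four steps named in the abstract, the following minimal Python sketch applies one simple technique per step to a short series. The specific choices (linear interpolation, a median/MAD rule, a centered moving average, and a correlation search over lags) are assumptions made here for illustration and are not claimed to be the methods the paper recommends; all function names are hypothetical.

```python
import math
from statistics import median


def impute_linear(xs):
    """Step 1 (missing data): fill None by linear interpolation.
    Leading/trailing gaps take the nearest observed value."""
    known = [i for i, v in enumerate(xs) if v is not None]
    if not known:
        raise ValueError("no observed values to interpolate from")
    out = list(xs)
    for i, v in enumerate(xs):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            out[i] = xs[right]
        elif right is None:
            out[i] = xs[left]
        else:
            w = (i - left) / (right - left)
            out[i] = xs[left] + w * (xs[right] - xs[left])
    return out


def flag_outliers(xs, threshold=3.0):
    """Step 2 (outliers): flag points farther than threshold * scaled MAD
    from the median. The factor 1.4826 makes the MAD comparable to a
    standard deviation under a Gaussian assumption."""
    med = median(xs)
    mad = 1.4826 * median([abs(v - med) for v in xs])
    if mad == 0:
        return [False] * len(xs)
    return [abs(v - med) > threshold * mad for v in xs]


def moving_average(xs, window=3):
    """Step 3 (noise): centered moving average; the window shrinks at the edges."""
    half = window // 2
    out = []
    for i in range(len(xs)):
        seg = xs[max(0, i - half): i + half + 1]
        out.append(sum(seg) / len(seg))
    return out


def estimate_delay(x, y, max_lag):
    """Step 4 (delay): return the lag d in 0..max_lag that maximises the
    Pearson correlation between x[t] and y[t + d]. Assumes len(x) == len(y)."""
    def corr(a, b):
        ma, mb = sum(a) / len(a), sum(b) / len(b)
        num = sum((p - ma) * (q - mb) for p, q in zip(a, b))
        den = math.sqrt(sum((p - ma) ** 2 for p in a)
                        * sum((q - mb) ** 2 for q in b))
        return num / den if den else 0.0
    return max(range(max_lag + 1),
               key=lambda d: corr(x[:len(x) - d], y[d:]))
```

In practice the order and choice of methods depend on the data; for example, outliers are often flagged before imputation so that they do not distort the interpolated values.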