Data cleaning in the process industries

Xu S; Lu B; Baldea M; Edgar TF; Wojsznis W; Blevins T; Nixon M

Reviews in Chemical Engineering, Vol.31, No.5, 453-490, 2015

DOI: 10.1515/revce-2015-0022

Over the past decades, process engineers have faced a growing number of data analytics challenges and have had difficulty extracting valuable information from the wealth of process variable data trends. Raw data stored in databases in different formats are not useful until they are cleaned and transformed. Generally, data cleaning consists of four steps: missing data imputation, outlier detection, noise removal, and time alignment and delay estimation. This paper discusses available data cleaning methods that can be used in data pre-processing and that help overcome the challenges of "Big Data".

Keywords: big data; data cleaning; knowledge discovery; missing data imputation; noise removal; outlier detection; time alignment and delay estimation
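
The four cleaning steps named in the abstract can be illustrated with a minimal sketch. The abstract does not say which methods the paper covers, so the techniques below (linear interpolation, a Hampel-style median filter, an exponentially weighted moving average, and a cross-correlation lag search) are assumptions chosen for illustration, and all function names and default parameters are hypothetical.

```python
from statistics import median


def impute_linear(x):
    """Step 1 (assumed method): fill None gaps by linear interpolation.

    Gaps at either end are filled with the nearest valid sample.
    """
    valid = [i for i, v in enumerate(x) if v is not None]
    if not valid:
        raise ValueError("no valid samples to interpolate from")
    out = list(x)
    for i in range(len(x)):
        if out[i] is not None:
            continue
        left = max((j for j in valid if j < i), default=None)
        right = min((j for j in valid if j > i), default=None)
        if left is None:
            out[i] = x[right]
        elif right is None:
            out[i] = x[left]
        else:
            w = (i - left) / (right - left)
            out[i] = x[left] + w * (x[right] - x[left])
    return out


def hampel_filter(x, half_window=3, n_sigmas=3.0):
    """Step 2 (assumed method): replace outliers with the local median.

    A point is flagged when it deviates from the window median by more
    than n_sigmas scaled median absolute deviations (MAD).
    Expects a series with no missing values.
    """
    out = list(x)
    for i in range(len(x)):
        window = x[max(0, i - half_window): i + half_window + 1]
        med = median(window)
        mad = median([abs(v - med) for v in window])
        # 1.4826 scales the MAD to the standard deviation of Gaussian data.
        if abs(x[i] - med) > n_sigmas * 1.4826 * mad:
            out[i] = med
    return out


def ewma(x, alpha=0.3):
    """Step 3 (assumed method): exponentially weighted moving average."""
    out = [x[0]]
    for v in x[1:]:
        out.append(alpha * v + (1 - alpha) * out[-1])
    return out


def estimate_delay(u, y, max_lag):
    """Step 4 (assumed method): lag in [0, max_lag] that maximizes the
    mean-removed cross-correlation between input u and output y.

    Assumes len(u) == len(y) and max_lag < len(u).
    """
    mu = sum(u) / len(u)
    my = sum(y) / len(y)

    def score(k):
        return sum((u[t] - mu) * (y[t + k] - my) for t in range(len(u) - k))

    return max(range(max_lag + 1), key=score)
```

A typical call order would be `impute_linear`, then `hampel_filter`, then `ewma` on each variable, followed by `estimate_delay` on pairs of cleaned variables so that one series can be shifted by the estimated lag before further analysis. The order matters in this sketch because the median filter and the smoother both expect a series without missing values.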