Process Safety and Environmental Protection, Vol. 140, pp. 68-78, 2020
The potential of new ensemble machine learning models for effluent quality parameters prediction and related uncertainty
Accurate simulation of wastewater effluent parameters is vital for reducing the operational costs of a wastewater treatment plant, and a reliable predictive model is therefore necessary to achieve acceptable performance. This study presents a novel approach to predicting the effluent quality parameters of an industrial wastewater treatment plant in Qom province, Iran. Three new ensemble machine learning models, Ada Boost Regression (ABR), Gradient Boost Regression (GBR) and Random Forest Regression (RFR), are used to predict the effluent quality parameters Total Dissolved Solids (TDS), five-day Biochemical Oxygen Demand (BOD5) and Chemical Oxygen Demand (COD) at a daily scale. The gamma test technique is used to select the optimal set of predictor variables. The predictive accuracy of the models is assessed with several metric indices and visual performance indicators. Results show that the ABR model provides the best performance for predicting TDS (CC = 0.962, RMSE = 30.3 mg/l), while the GBR model is more accurate for simulating BOD5 (CC = 0.9, RMSE = 4.6 mg/l) and COD (CC = 0.75, RMSE = 9.6 mg/l). The uncertainty analysis indicates that the prediction results are more sensitive to model structure (R-factor(TDS) = 0.52, R-factor(BOD) = 0.89 and R-factor(COD) = 1.06) than to the input variables (R-factor(TDS) = 0.21, R-factor(BOD) = 0.67 and R-factor(COD) = 0.62). © 2020 Institution of Chemical Engineers. Published by Elsevier B.V. All rights reserved.
Keywords: Ensemble machine learning model; Effluent quality parameter; Prediction; Gamma test; Uncertainty
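The gamma test used for input selection in this study estimates, before any model is trained, the variance of the part of the output that no smooth function of the chosen inputs can explain. Input combinations with a lower Gamma statistic (and V-ratio) are preferred. The following Python sketch is a brute-force version of the standard near-neighbour formulation: for the k-th nearest neighbour it computes the mean squared input distance delta(k) and half the mean squared output difference gamma(k), then fits the line gamma = A * delta + Gamma, whose intercept is the Gamma statistic. The function name gamma_test, the default of p = 10 neighbours and the synthetic data are illustrative assumptions, not details taken from the study.

```python
import math
import random


def gamma_test(X, y, p=10):
    """Estimate the noise variance Gamma of y given inputs X (gamma test).

    X is a list of input vectors, y the matching list of outputs and p the
    number of nearest neighbours (illustrative default). Returns
    (Gamma, A, V_ratio), where A is the slope of the gamma-delta regression
    line and V_ratio = Gamma / Var(y).
    """
    M = len(X)
    if M <= p:
        raise ValueError("need more data points than neighbours")
    delta = [0.0] * p  # delta[k]: mean squared input distance to (k+1)-th neighbour
    gamma = [0.0] * p  # gamma[k]: half mean squared output difference to that neighbour
    for i in range(M):
        # Brute-force search: squared distances from point i to every other point.
        dists = sorted(
            (sum((a - b) ** 2 for a, b in zip(X[i], X[j])), j)
            for j in range(M) if j != i
        )
        for k in range(p):
            d2, j = dists[k]
            delta[k] += d2
            gamma[k] += (y[j] - y[i]) ** 2
    delta = [d / M for d in delta]
    gamma = [g / (2 * M) for g in gamma]
    # Least-squares line gamma = A * delta + Gamma; the intercept is Gamma.
    md, mg = sum(delta) / p, sum(gamma) / p
    sxx = sum((d - md) ** 2 for d in delta)
    if sxx == 0:
        raise ValueError("degenerate neighbour distances; cannot fit regression line")
    A = sum((d - md) * (g - mg) for d, g in zip(delta, gamma)) / sxx
    Gamma = mg - A * md
    mean_y = sum(y) / M
    var_y = sum((v - mean_y) ** 2 for v in y) / M
    return Gamma, A, Gamma / var_y


if __name__ == "__main__":
    # Hypothetical data: y depends on x1 only, x2 is an irrelevant candidate input.
    random.seed(0)
    x1 = [random.random() for _ in range(200)]
    x2 = [random.random() for _ in range(200)]
    y = [math.sin(3 * a) + random.gauss(0, 0.1) for a in x1]
    for name, X in [("x1", [[a] for a in x1]),
                    ("x2", [[b] for b in x2]),
                    ("x1+x2", [[a, b] for a, b in zip(x1, x2)])]:
        G, A, V = gamma_test(X, y)
        print(f"{name:6s} Gamma={G:.4f} V-ratio={V:.3f}")
```

In a real input-selection run the same function would be evaluated for every candidate combination of influent variables, and the combination with the smallest Gamma kept; the brute-force neighbour search is O(M^2 log M) and is only meant to make the statistic concrete.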
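Gradient Boost Regression, the model that performed best for BOD5 and COD, builds an additive model in which each new weak learner is fitted to the residuals of the current ensemble (for squared-error loss these are the negative gradient) and added with a shrinkage factor. The sketch below is a generic least-squares gradient boosting with depth-1 regression trees (stumps) in plain Python. The abstract does not state the implementation, base learner or hyperparameters used in the study, so fit_gbr, predict_gbr, n_rounds, learning_rate and the stump base learner are all illustrative assumptions.

```python
import math


def fit_stump(X, r):
    """Fit a depth-1 regression tree (stump) to residuals r by least squares.

    Returns (feature, threshold, left_value, right_value).
    """
    n, n_features = len(X), len(X[0])
    total = sum(r)
    best = None
    for f in range(n_features):
        order = sorted(range(n), key=lambda i: X[i][f])
        left_sum = 0.0
        for pos in range(1, n):
            left_sum += r[order[pos - 1]]
            xa, xb = X[order[pos - 1]][f], X[order[pos]][f]
            if xa == xb:
                continue  # cannot split between equal feature values
            right_sum = total - left_sum
            # Maximising this score is equivalent to minimising the split's SSE.
            score = left_sum ** 2 / pos + right_sum ** 2 / (n - pos)
            if best is None or score > best[0]:
                best = (score, f, (xa + xb) / 2, left_sum / pos, right_sum / (n - pos))
    if best is None:  # all inputs identical: fall back to a constant
        mean = total / n
        return 0, math.inf, mean, mean
    _, f, thr, left, right = best
    return f, thr, left, right


def stump_predict(stump, x):
    f, thr, left, right = stump
    return left if x[f] <= thr else right


def fit_gbr(X, y, n_rounds=100, learning_rate=0.1):
    """Least-squares gradient boosting: each stump fits the current residuals."""
    f0 = sum(y) / len(y)  # initial model: the mean of y
    pred = [f0] * len(y)
    stumps = []
    for _ in range(n_rounds):
        # For squared-error loss the negative gradient is just the residual.
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        pred = [p + learning_rate * stump_predict(stump, x) for p, x in zip(pred, X)]
    return f0, stumps, learning_rate


def predict_gbr(model, x):
    f0, stumps, lr = model
    return f0 + lr * sum(stump_predict(s, x) for s in stumps)
```

The learning rate shrinks each stump's contribution, so smaller values usually need more rounds; a practical model would typically use a library implementation with deeper trees and tuned hyperparameters rather than this teaching sketch.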
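The abstract reports accuracy as CC and RMSE. Assuming CC denotes the Pearson correlation coefficient between the observed and predicted series (the usual meaning in this literature, though the abstract does not spell it out), both indices can be computed as in this minimal sketch; the function names are illustrative.

```python
import math


def correlation_coefficient(obs, pred):
    """Pearson correlation coefficient (CC) between observed and predicted values."""
    n = len(obs)
    mo, mp = sum(obs) / n, sum(pred) / n
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    so = math.sqrt(sum((o - mo) ** 2 for o in obs))
    sp = math.sqrt(sum((p - mp) ** 2 for p in pred))
    return cov / (so * sp)


def rmse(obs, pred):
    """Root mean square error, in the units of the data (e.g. mg/l)."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))
```

CC measures only linear association and is unchanged if a constant bias is added to the predictions, whereas RMSE penalises bias and keeps the units of the parameter, which is why the RMSE values are quoted in mg/l.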
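The R-factor values quoted in the abstract compare the width of an uncertainty band with the spread of the observations. The sketch below assumes the common SUFI-2-style definitions: the 95PPU band is bounded by the 2.5th and 97.5th percentiles of an ensemble of predictions at each time step, the R-factor is the mean band width divided by the standard deviation of the observed data, and the P-factor is the fraction of observations that fall inside the band. The abstract does not say how the ensembles for model-structure and input-variable uncertainty were generated, so the ensemble is simply taken as given; the function names, the linear-interpolation percentile and the use of the population standard deviation are assumptions.

```python
import math
import statistics


def percentile(values, q):
    """Linear-interpolation percentile of values, with q in [0, 1]."""
    s = sorted(values)
    pos = (len(s) - 1) * q
    lo, hi = math.floor(pos), math.ceil(pos)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def ppu95(ensemble):
    """Lower (2.5%) and upper (97.5%) bounds of the 95PPU band.

    ensemble: list of prediction series, one per ensemble member, all of
    the same length (one value per time step).
    """
    lower, upper = [], []
    for values in zip(*ensemble):  # all members' predictions at one time step
        lower.append(percentile(values, 0.025))
        upper.append(percentile(values, 0.975))
    return lower, upper


def r_factor(obs, lower, upper):
    """Mean 95PPU band width divided by the standard deviation of the observations."""
    mean_width = sum(u - l for l, u in zip(lower, upper)) / len(obs)
    return mean_width / statistics.pstdev(obs)


def p_factor(obs, lower, upper):
    """Fraction of observations that fall inside the 95PPU band."""
    return sum(l <= o <= u for o, l, u in zip(obs, lower, upper)) / len(obs)
```

Under these definitions a smaller R-factor indicates a narrower band relative to the variability of the data, which is how the lower input-variable R-factors (for example 0.21 versus 0.52 for TDS) point to a smaller contribution from input uncertainty than from model structure.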