Feature selection methods for multiphase reactors data classification

Tarca LA; Grandjean BPA; Larachi F

Industrial & Engineering Chemistry Research, Vol.44, No.4, 1073-1084, 2005

The design of reliable data-driven classifiers able to predict flow regimes in trickle beds, or the initial bed behavior (contraction/expansion) in three-phase fluidized beds, requires as a first step the identification of a small number of salient variables among the many available features. Reducing the dimensionality of the feature space is motivated by the fact that fewer training samples may be required, more reliable estimates of the classifier parameters may be obtained, and/or classification accuracy may improve. This work investigates several methodologies for identifying the relevant features in two classification problems arising in a multiphase reactor context. The relevance of feature subsets was assessed both by the mutual information between each subset and the class variable (filter approach) and by the accuracy rate of a one-nearest-neighbor classifier (wrapper approach). The algorithms investigated for generating candidate subsets that maximize these relevance criteria were sequential forward selection and plus-l-take-away-r. A conceptually different feature-ranking method was also tested, based on Garson's saliency indices derived from the weights of classification neural networks. The reliability of the feature selection methodologies was first evaluated on two benchmark problems (a synthetic problem and Anderson's iris data). The methodologies were then applied to the two multiphase reactor classification problems to identify the most appropriate feature subsets for use in classifiers. Finally, a new feature selection algorithm combining filter and wrapper techniques was shown to yield the same solutions as the wrapper technique while being less computationally expensive.
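To make the filter/wrapper distinction concrete, the following is a minimal sketch, not the authors' implementation, of sequential forward selection driven by either of the two relevance criteria named in the abstract. Several details are assumptions made for illustration: the wrapper criterion is leave-one-out accuracy of a 1-nearest-neighbor classifier with Euclidean distance; the filter criterion estimates mutual information after equal-width discretization of each feature into a hypothetical `bins` number of intervals; and all function names are illustrative. Plus-l-take-away-r, Garson's saliency indices, and the combined filter/wrapper algorithm are not shown.

```python
import math
from collections import Counter
from typing import Callable, List, Sequence

Data = Sequence[Sequence[float]]
Criterion = Callable[[Data, Sequence[int], List[int]], float]


def loo_1nn_accuracy(X: Data, y: Sequence[int], subset: List[int]) -> float:
    """Wrapper criterion (assumed form): leave-one-out accuracy of a
    1-nearest-neighbour classifier using only the features in `subset`."""
    if not subset:
        return 0.0
    n = len(X)
    correct = 0
    for i in range(n):
        best_d, best_label = math.inf, None
        for j in range(n):
            if i == j:
                continue  # leave sample i out of its own neighbour search
            d = sum((X[i][k] - X[j][k]) ** 2 for k in subset)  # squared Euclidean
            if d < best_d:
                best_d, best_label = d, y[j]
        correct += best_label == y[i]
    return correct / n


def discretize(column: Sequence[float], bins: int) -> List[int]:
    """Equal-width binning (an assumption; other estimators of MI exist)."""
    lo, hi = min(column), max(column)
    width = (hi - lo) / bins or 1.0  # constant column -> every value in bin 0
    return [min(int((v - lo) / width), bins - 1) for v in column]


def mutual_information(X: Data, y: Sequence[int], subset: List[int], bins: int = 5) -> float:
    """Filter criterion: I(S; C) in bits between the jointly discretized
    feature subset S and the class variable C."""
    if not subset:
        return 0.0
    n = len(X)
    cols = {k: discretize([row[k] for row in X], bins) for k in subset}
    keys = [tuple(cols[k][i] for k in subset) for i in range(n)]
    p_s, p_c, p_sc = Counter(keys), Counter(y), Counter(zip(keys, y))
    # sum over (s, c) of p(s,c) * log2[p(s,c) / (p(s) p(c))], written with counts
    return sum((cnt / n) * math.log2(cnt * n / (p_s[s] * p_c[c]))
               for (s, c), cnt in p_sc.items())


def sequential_forward_selection(X: Data, y: Sequence[int],
                                 criterion: Criterion, k: int) -> List[int]:
    """Greedily add, one at a time, the feature that maximizes `criterion`
    for the current subset; ties go to the lowest feature index."""
    selected: List[int] = []
    remaining = list(range(len(X[0])))
    while remaining and len(selected) < k:
        best = max(remaining, key=lambda f: criterion(X, y, selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected
```

For example, on a toy set where feature 0 separates the two classes, feature 1 is noise, and feature 2 is constant, both `sequential_forward_selection(X, y, loo_1nn_accuracy, 1)` and `sequential_forward_selection(X, y, mutual_information, 1)` pick feature 0. The design point is that the search procedure is independent of the criterion: the wrapper evaluation costs a full leave-one-out pass per candidate subset, whereas the filter only counts co-occurrences. Note that a plug-in MI estimate on jointly binned subsets tends to grow as more features are added (more, sparser cells), so it is better suited to ranking than to choosing the subset size.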