Supervised non-parametric discretization based on Kernel density estimation
Jose Luis Flores, Borja Calvo, Aritz Perez
Pattern Recognition Letters
Machine learning algorithms are now found in many applications in which classifiers play a key role. In this context, discretizing continuous attributes is a common step prior to classification, the main goal being to retain as much discriminative information as possible. In this paper, we propose a supervised, univariate, non-parametric discretization algorithm that allows a given supervised score criterion to be used for selecting the best cut points. The candidate cut points are evaluated by computing the selected score using kernel density estimation. The computational complexity of the proposed procedure is O(N log N), where N is the size of the data set. The proposed algorithm generates low-complexity discretization policies while retaining the discriminative information of the original continuous variables. To assess the validity of the proposed method, a set of real and artificial datasets has been used, and the results show that the algorithm achieves competitive performance while producing low-complexity discretization policies.
DOI / link: https://doi.org/10.1016/j.patrec.2019.10.016
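The idea of scoring candidate cut points through kernel density estimates can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not specify the kernel, the bandwidth rule, the score, or the candidate set, so the Gaussian kernel, the normal-reference bandwidth, mutual information as the supervised score, and midpoints between consecutive distinct values as candidates are all assumptions made here for illustration. The function names are hypothetical. This naive version evaluates each candidate in O(N) time, so it does not reach the O(N log N) complexity stated in the abstract, and it selects a single cut point only.

```python
import math
from collections import defaultdict


def rule_of_thumb_bandwidth(samples):
    """Normal-reference bandwidth 1.06 * std * n^(-1/5) (an assumed default)."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((x - mean) ** 2 for x in samples) / max(n - 1, 1)
    std = math.sqrt(var)
    # Fall back to a fixed width when the class has no spread.
    return 1.06 * std * n ** (-0.2) if std > 0 else 1.0


def kde_cdf(samples, bandwidth, t):
    """CDF at t of a Gaussian KDE: the average of one normal CDF per sample."""
    scale = bandwidth * math.sqrt(2.0)
    total = sum(0.5 * (1.0 + math.erf((t - x) / scale)) for x in samples)
    return total / len(samples)


def mutual_information_at_cut(values, labels, t):
    """Mutual information (in nats) between the class and the side of cut t.

    P(side | class) comes from a per-class KDE rather than from raw counts.
    """
    by_class = defaultdict(list)
    for x, y in zip(values, labels):
        by_class[y].append(x)
    n = len(values)

    # joint[(c, 0)] = P(class c, x <= t); joint[(c, 1)] = P(class c, x > t)
    joint = {}
    for c, xs in by_class.items():
        prior = len(xs) / n
        left = kde_cdf(xs, rule_of_thumb_bandwidth(xs), t)
        joint[(c, 0)] = prior * left
        joint[(c, 1)] = prior * (1.0 - left)

    side = [sum(p for (_, s), p in joint.items() if s == k) for k in (0, 1)]
    mi = 0.0
    for (c, s), p in joint.items():
        if p > 0.0:  # zero-probability cells contribute nothing
            mi += p * math.log(p / (side[s] * (len(by_class[c]) / n)))
    return mi


def best_cut_point(values, labels):
    """Return (cut, score) maximizing the score over midpoint candidates."""
    distinct = sorted(set(values))
    candidates = [(a + b) / 2.0 for a, b in zip(distinct, distinct[1:])]
    if not candidates:
        raise ValueError("need at least two distinct values")
    scored = ((t, mutual_information_at_cut(values, labels, t)) for t in candidates)
    return max(scored, key=lambda pair: pair[1])
```

Calling `best_cut_point(values, labels)` with a list of numeric values and a parallel list of class labels returns the highest-scoring candidate and its score. Another supervised criterion could be swapped in by replacing `mutual_information_at_cut`.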