Non-parametric discretization for probabilistic labeled data
Flores J.L., Calvo B., Pérez A.
Pattern Recognition Letters
01/09/2022
Probabilistic label learning is a challenging task that arises from recent real-world problems within the weakly supervised classification framework. In this task, algorithms have to deal with datasets in which each instance is associated with a set of probabilities, one for each class label. In this paper, we propose a supervised univariate non-parametric discretization algorithm based on kernel density estimation that can deal with probabilistic labeled data. The algorithm uses the estimated class-conditional densities to produce different sets of cut points, one for each value of the kernel smoothing parameter. Then, the best set of cut points is selected according to a given supervised classification performance measure. The computational complexity is O(N log N), where N is the number of instances. The proposal is tested on simulated probabilistic labeled data, which allows its behavior to be assessed under different degrees of noise. The results show that the algorithm outperforms other discretization algorithms and is robust to different degrees of uncertainty.
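The pipeline described in the abstract (estimate class densities from probabilistic labels, derive cut points for each smoothing parameter, keep the best set) can be illustrated with a minimal Python sketch. It is not the authors' implementation. Several choices are assumptions made only for illustration: a Gaussian kernel, cut points placed where the winning class changes on an evenly spaced grid, expected accuracy on a held-out half as the performance measure, and the hypothetical helper names `simulate`, `class_scores`, `cut_points`, `expected_accuracy` and `select_cut_points`. The grid evaluation costs O(G·N) for G grid points, so the sketch does not reproduce the O(N log N) complexity stated above.

```python
import numpy as np

def simulate(n=400, noise=0.2, seed=0):
    """Toy data (assumption): two Gaussian classes with softened one-hot labels."""
    rng = np.random.default_rng(seed)
    y = rng.integers(0, 2, size=n)
    x = rng.normal(loc=np.where(y == 0, -2.0, 2.0), scale=1.0)
    probs = np.full((n, 2), noise)
    probs[np.arange(n), y] = 1.0 - noise   # each row sums to 1
    return x, probs

def class_scores(x, probs, grid, bandwidth):
    """Label-weighted Gaussian kernel sums, shape (G, C).

    Column c is proportional to prior(c) * density(grid | c); the shared
    normalising constant is dropped because only the argmax is used.
    """
    z = (grid[:, None] - x[None, :]) / bandwidth
    return np.exp(-0.5 * z ** 2) @ probs

def cut_points(x, probs, bandwidth, grid_size=512):
    """Midpoints between adjacent grid points whose winning class differs."""
    grid = np.linspace(x.min(), x.max(), grid_size)
    winner = class_scores(x, probs, grid, bandwidth).argmax(axis=1)
    change = np.nonzero(winner[1:] != winner[:-1])[0]
    return (grid[change] + grid[change + 1]) / 2.0

def expected_accuracy(x_tr, p_tr, x_va, p_va, cuts):
    """Mean validation probability of the class each training bin predicts."""
    mass = np.zeros((len(cuts) + 1, p_tr.shape[1]))
    np.add.at(mass, np.searchsorted(cuts, x_tr), p_tr)   # label mass per bin
    predicted = mass.argmax(axis=1)
    bins_va = np.searchsorted(cuts, x_va)
    return p_va[np.arange(len(x_va)), predicted[bins_va]].mean()

def select_cut_points(x, probs, bandwidths, seed=0):
    """Try each bandwidth; return (score, bandwidth, cuts) of the best one."""
    idx = np.random.default_rng(seed).permutation(len(x))
    tr, va = idx[: len(x) // 2], idx[len(x) // 2:]
    best = None
    for h in bandwidths:
        cuts = cut_points(x[tr], probs[tr], h)
        score = expected_accuracy(x[tr], probs[tr], x[va], probs[va], cuts)
        if best is None or score > best[0]:
            best = (score, h, cuts)
    return best

x, probs = simulate()
score, bandwidth, cuts = select_cut_points(x, probs, [0.1, 0.5, 1.0])
print(f"bandwidth={bandwidth}, cut points={np.round(cuts, 2)}, score={score:.3f}")
```

Small bandwidths tend to produce many cut points and large ones few, which is why the selection step scores each candidate set on data not used to build it.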