A joint algorithm combining improved active learning and self-training
-
Abstract
To address the high cost of manual labeling in large data sets and the harm done by mislabeled samples in the semi-supervised self-training algorithm, a joint algorithm that alternately and iteratively trains active learning and semi-supervised self-training was proposed. During training, the active learning algorithm was used in odd rounds and the self-training algorithm in even rounds, so that each algorithm compensates for the other's weaknesses. The self-training algorithm's predictions on unlabeled samples reduced the labeling burden on active learning, while active learning tended to select for manual labeling the samples most likely to become noise, which reduced labeling errors during self-training. An improved active learning algorithm based on density peaks clustering and membership degree was also proposed: the initial unlabeled samples were clustered, and within each cluster some samples were selected for manual labeling according to differences in membership degree, yielding a balanced labeled set that reflects the overall structure of the data. The proposed joint algorithm was found to outperform either of the two algorithms used alone. Compared with common active learning algorithms, the improved active learning algorithm achieved significantly better classification performance and brought further advantages when applied within the joint algorithm.
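The clustering-based selection of initial samples for manual labeling can be sketched in Python as below. This is a minimal sketch under several assumptions not stated in the abstract: local density uses a simple cutoff kernel with a hypothetical cutoff `d_c`, membership degree is taken to be the fuzzy c-means membership of a sample in each density peak, and "difference of membership degree" is read as the gap between a sample's two largest memberships. The helpers `density_peaks`, `memberships`, `membership_margin` and `select_for_labeling` are hypothetical names, not the paper's code.

```python
import math


def density_peaks(points, d_c, k):
    """Density peaks clustering (Rodriguez & Laio, 2014) with a cutoff kernel.

    Returns (labels, centers): a cluster index for every point and the
    indices of the k density peaks chosen as cluster centres.
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # rho_i: number of other points closer than the cutoff distance d_c.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c) for i in range(n)]
    # Visit points from densest to sparsest (ties broken by index).
    order = sorted(range(n), key=lambda i: (-rho[i], i))
    delta = [0.0] * n
    nearest_denser = [-1] * n
    for rank, i in enumerate(order):
        if rank == 0:
            delta[i] = max(dist[i])  # by convention, its largest distance to any point
            continue
        # delta_i: distance to the nearest point visited earlier (i.e. denser).
        j = min(order[:rank], key=lambda m: dist[i][m])
        delta[i], nearest_denser[i] = dist[i][j], j
    # Centres: the k points with the largest gamma = rho * delta. The densest
    # point has no denser neighbour, so it is always kept as a centre.
    ranked = sorted(range(n), key=lambda i: -rho[i] * delta[i])
    centers = [order[0]] + [i for i in ranked if i != order[0]][: k - 1]
    labels = [-1] * n
    for c, i in enumerate(centers):
        labels[i] = c
    # Each remaining point joins the cluster of its nearest denser neighbour;
    # visiting in density order guarantees that neighbour is already labelled.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[nearest_denser[i]]
    return labels, centers


def memberships(points, centers, m=2.0):
    """Fuzzy c-means style membership of every point in every cluster (assumed definition)."""
    out = []
    for p in points:
        d = [math.dist(p, points[c]) for c in centers]
        if 0.0 in d:  # the point coincides with a centre
            out.append([1.0 if x == 0.0 else 0.0 for x in d])
        else:
            out.append([1.0 / sum((dc / dk) ** (2 / (m - 1)) for dk in d) for dc in d])
    return out


def membership_margin(u):
    """Gap between the two largest memberships; small means near a cluster boundary."""
    top = sorted(u, reverse=True)
    return top[0] - top[1] if len(top) > 1 else top[0]


def select_for_labeling(points, d_c, k, per_cluster):
    """Hypothetical rule: label each centre plus its most ambiguous members."""
    labels, centers = density_peaks(points, d_c, k)
    u = memberships(points, centers)
    chosen = []
    for c, centre in enumerate(centers):
        members = [i for i, lab in enumerate(labels) if lab == c and i != centre]
        members.sort(key=lambda i: (membership_margin(u[i]), i))
        chosen.append(centre)
        chosen.extend(members[: per_cluster - 1])
    return chosen


if __name__ == "__main__":
    # Two tight blobs, each with one looser point lying towards the other blob.
    points = [(0.0, 0.0), (0.2, 0.0), (0.0, 0.2), (0.2, 0.2), (1.0, 1.0),
              (5.0, 5.0), (5.2, 5.0), (5.0, 5.2), (5.2, 5.2), (4.0, 4.0)]
    print(select_for_labeling(points, d_c=0.5, k=2, per_cluster=2))  # → [0, 4, 5, 9]
```

Taking the peak of every cluster guarantees that each cluster is represented in the labeled set, which is one way to obtain the balanced samples the abstract describes; the pairwise distance matrix and the quadratic search for `delta` keep the sketch short but make it suitable only for small data sets.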
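The odd/even alternation between active learning and self-training can be sketched as a single loop. The following is an illustrative sketch only, and every concrete choice in it is an assumption rather than the paper's method: a nearest-centroid classifier with a softmax over negative distances stands in for the base classifier, odd rounds send the `batch` least confident samples to a hypothetical `oracle` callable that plays the human annotator (uncertainty sampling), and even rounds pseudo-label the `batch` most confident samples.

```python
import math


def fit_centroids(X, y):
    """Nearest-centroid classifier: the mean feature vector of each class."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        s = sums.setdefault(label, [0.0] * len(x))
        for d, v in enumerate(x):
            s[d] += v
        counts[label] = counts.get(label, 0) + 1
    return {c: [v / counts[c] for v in s] for c, s in sums.items()}


def predict_proba(centroids, x):
    """Softmax over negative distances to the class centroids."""
    d = {c: math.dist(x, m) for c, m in centroids.items()}
    d_min = min(d.values())  # shift for numerical stability
    scores = {c: math.exp(d_min - v) for c, v in d.items()}
    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()}


def joint_training(X, labeled, oracle, rounds, batch):
    """Alternate active learning (odd rounds) with self-training (even rounds).

    labeled: dict mapping sample index -> label for the initial labelled set.
    oracle:  hypothetical callable index -> label standing in for the annotator.
    """
    labeled = dict(labeled)
    for t in range(1, rounds + 1):
        unlabeled = [i for i in range(len(X)) if i not in labeled]
        if not unlabeled:
            break
        model = fit_centroids([X[i] for i in labeled], list(labeled.values()))
        probs = {i: predict_proba(model, X[i]) for i in unlabeled}
        conf = {i: max(p.values()) for i, p in probs.items()}
        if t % 2 == 1:
            # Active learning: send the least confident samples, the ones a
            # self-training step would most likely mislabel, to the annotator.
            for i in sorted(unlabeled, key=lambda i: (conf[i], i))[:batch]:
                labeled[i] = oracle(i)
        else:
            # Self-training: adopt the most confident predictions as pseudo-labels.
            for i in sorted(unlabeled, key=lambda i: (-conf[i], i))[:batch]:
                labeled[i] = max(probs[i], key=probs[i].get)
    return labeled


if __name__ == "__main__":
    X = [(0.0, 0.0), (0.2, 0.0), (0.0, 0.2), (0.2, 0.2), (1.0, 1.0),
         (5.0, 5.0), (5.2, 5.0), (5.0, 5.2), (5.2, 5.2), (4.0, 4.0)]
    truth = ["a"] * 5 + ["b"] * 5
    result = joint_training(X, {0: "a", 5: "b"}, lambda i: truth[i], rounds=4, batch=2)
    print(result == dict(enumerate(truth)))  # → True
```

Routing the least confident samples to the annotator is what lets the two halves cooperate in this sketch: those are the samples a self-training round would be most likely to mislabel, while the confident remainder is cheap to pseudo-label.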