Abstract: In machine learning prediction models, imbalanced datasets reduce the accuracy of minority-class predictions. To address the class imbalance of the fever of unknown origin (FUO) dataset, a bi-random forest etiology prediction method based on K-Means clustering undersampling is proposed. First, a balanced dataset is constructed through K-Means clustering undersampling, and a random forest prediction model based on the CART voting mechanism is built on it. Next, a second random forest prediction model is built in the same way on the original dataset. Finally, the two random forest models are combined, and their CART trees vote jointly to produce the prediction. The proposed method increases the number of CART trees and raises the voting weight of the minority class while preserving the characteristics of the original dataset. Experiments on the FUO dataset show that the proposed method improves prediction performance not only for the minority class but also, to a certain extent, for the other classes.
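The three steps summarized above can be illustrated with a minimal sketch. It assumes scikit-learn and is not the authors' implementation. The function names (`kmeans_undersample`, `fit_bi_random_forest`, `predict_joint_vote`), the parameter `n_trees`, and the equal forest sizes are hypothetical. The rule for picking retained samples (the real sample nearest each K-Means centroid, with each larger class reduced to roughly the size of the smallest class) is also an assumption, since the abstract does not specify it.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier


def kmeans_undersample(X, y, random_state=0):
    """Shrink every larger class to (at most) the size of the smallest class.

    Assumption: each larger class is clustered into n_min groups and the
    real sample closest to each centroid is kept.
    """
    classes, counts = np.unique(y, return_counts=True)
    n_min = int(counts.min())
    keep = []
    for cls, count in zip(classes, counts):
        idx = np.flatnonzero(y == cls)
        if count == n_min:
            keep.extend(idx)  # smallest class is kept unchanged
            continue
        km = KMeans(n_clusters=n_min, n_init=10, random_state=random_state)
        # Distances from each class member to each centroid: (count, n_min).
        dist = km.fit(X[idx]).transform(X[idx])
        # Nearest member per centroid; duplicates (rare) are dropped.
        keep.extend(idx[np.unique(dist.argmin(axis=0))])
    keep = np.asarray(keep)
    return X[keep], y[keep]


def fit_bi_random_forest(X, y, n_trees=50, random_state=0):
    """Train one forest on the balanced data and one on the original data."""
    X_bal, y_bal = kmeans_undersample(X, y, random_state)
    rf_bal = RandomForestClassifier(
        n_estimators=n_trees, random_state=random_state
    ).fit(X_bal, y_bal)
    rf_orig = RandomForestClassifier(
        n_estimators=n_trees, random_state=random_state
    ).fit(X, y)
    return rf_bal, rf_orig


def predict_joint_vote(forests, X):
    """Pool the hard votes of every tree in every forest."""
    classes = forests[0].classes_
    # Both forests must have seen the same label set for votes to line up.
    assert all(np.array_equal(f.classes_, classes) for f in forests)
    votes = np.zeros((X.shape[0], len(classes)), dtype=int)
    rows = np.arange(X.shape[0])
    for forest in forests:
        for tree in forest.estimators_:
            # Trees inside a scikit-learn forest predict encoded class indices.
            pred = tree.predict(X).astype(int)
            votes[rows, pred] += 1
    return classes[votes.argmax(axis=1)]
```

In this sketch, half of the pooled trees are trained on balanced data, so the minority class gets more voting weight than in a single forest trained on the original data. The other half are still trained on the original class distribution. Note that scikit-learn's own `RandomForestClassifier.predict` averages class probabilities rather than counting hard votes, so the explicit per-tree count is used here to mirror the voting mechanism the abstract describes.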