Abstract: Based on automobile credit default data from a financial institution, a random forest risk prediction model is constructed. Principal component analysis is used to reduce the dimensionality of the data, and upsampling is used to address class imbalance. The random forest hyperparameters are tuned by combining 50-fold cross-validation with grid search. The prediction results are then compared with those of other machine learning algorithms. The results show that random forest outperforms the other two prediction models. In addition, the feature importances computed by the random forest indicate that the value of personal mortgage assets has a significant impact on automobile credit default.
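
To illustrate the upsampling step mentioned in the abstract, the following is a minimal sketch of random oversampling in plain Python. It is not the authors' implementation: the function name `upsample_minority`, the toy data, and the choice of sampling with replacement up to the majority-class size are all assumptions made for illustration.

```python
import random
from collections import Counter


def upsample_minority(features, labels, seed=0):
    """Randomly duplicate rows of smaller classes until every class
    has as many rows as the largest class (hypothetical helper)."""
    rng = random.Random(seed)

    # Group the feature rows by their class label.
    by_class = {}
    for row, label in zip(features, labels):
        by_class.setdefault(label, []).append(row)

    # The largest class sets the target size for all classes.
    target = max(len(rows) for rows in by_class.values())

    out_features, out_labels = [], []
    for label, rows in by_class.items():
        # Sample with replacement to fill the gap to the target size.
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        for row in rows + extra:
            out_features.append(row)
            out_labels.append(label)
    return out_features, out_labels


# Toy data (made up): 8 non-default loans (label 0), 2 defaults (label 1).
X = [[i, i * 2] for i in range(10)]
y = [0] * 8 + [1] * 2

X_bal, y_bal = upsample_minority(X, y)
balanced_counts = Counter(y_bal)
print(balanced_counts)
```

In a cross-validated setting, upsampling of this kind is usually applied only to the training portion of each fold, so that duplicated minority rows do not also appear in the validation data.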