Aim: One of the main difficulties in analyzing health expenditures is that their distribution is not normal but extremely positively skewed. This leads to overfitting and reduces the performance of regression models for predicting health expenditures. Data mining based regression methods, such as Regression Trees, Random Forest Regression and Support Vector Regression, can be used to improve on classical regression models and overcome the overfitting problem. This study aims to compare the performance of different regression methods in predicting per capita health expenditures in a total of 214 World Bank member countries.

Materials and Methods: Before the analysis, the distribution of per capita health expenditure was normalized using logarithmic and Box-Cox transformations. Multiple Linear Regression, Regression Tree, Random Forest Regression and Support Vector Regression were used for prediction, and R², RMSE and MAE values were used to assess prediction performance. Performance results were compared using cross-validation with different values of the parameter k (the number of folds).

Findings: The findings show that the prediction performance of Support Vector Regression is relatively higher than that of the other regression methods when per capita health expenditure is transformed with the Box-Cox transformation and as k increases in cross-validation.

Conclusion: Support Vector Regression showed higher prediction performance than the other regression methods. Future studies are advised to examine Support Vector Regression performance using grid search, one of the hyperparameter optimization techniques.
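The evaluation pipeline described in the abstract (transform a skewed, strictly positive target, then score a model with k-fold cross-validation using R², RMSE and MAE) can be sketched in plain Python as below. This is a minimal illustration under stated assumptions, not the study's implementation: the Box-Cox λ is passed in by hand (in practice it is usually estimated, e.g. by maximum likelihood via `scipy.stats.boxcox`), the helper names are hypothetical, and a one-predictor least-squares fit stands in for the four regression methods actually compared.

```python
import math
import random


def box_cox(y, lam):
    """Box-Cox transform of strictly positive values for a given lambda.
    lam == 0 reduces to the natural logarithm (the log transformation)."""
    if any(v <= 0 for v in y):
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return [math.log(v) for v in y]
    return [(v ** lam - 1) / lam for v in y]


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    ss_tot = sum((a - mean) ** 2 for a in y_true)
    return 1 - ss_res / ss_tot


def k_fold_indices(n, k, seed=0):
    """Yield (train, test) index lists; each observation is tested exactly once."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, folds[i]


def fit_simple_ols(x, y):
    """Hypothetical stand-in model: least-squares line y = b0 + b1 * x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    b1 = sxy / sxx
    return my - b1 * mx, b1


def cross_validate(x, y, k, seed=0):
    """Average R2, RMSE and MAE over the k held-out folds."""
    scores = {"R2": [], "RMSE": [], "MAE": []}
    for train, test in k_fold_indices(len(y), k, seed):
        b0, b1 = fit_simple_ols([x[i] for i in train], [y[i] for i in train])
        y_test = [y[i] for i in test]
        y_hat = [b0 + b1 * x[i] for i in test]
        scores["R2"].append(r2(y_test, y_hat))
        scores["RMSE"].append(rmse(y_test, y_hat))
        scores["MAE"].append(mae(y_test, y_hat))
    return {name: sum(v) / len(v) for name, v in scores.items()}


if __name__ == "__main__":
    # Synthetic, hypothetical data: a positively skewed "expenditure" target.
    rng = random.Random(42)
    x = [rng.uniform(0, 10) for _ in range(50)]
    spend = [math.exp(1 + 0.3 * xi + rng.gauss(0, 0.2)) for xi in x]
    y = box_cox(spend, 0)  # lambda = 0, i.e. the log transformation
    for k in (5, 10):
        print(k, cross_validate(x, y, k))
```

Comparing methods as in the study would mean swapping `fit_simple_ols` for each candidate model while keeping the same folds (same `seed`), so that differences in the averaged metrics reflect the models rather than the data split.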
Keywords: Multiple Linear Regression; Regression Tree; Random Forest Regression; Support Vector Regression; Health Expenditure per capita.
JEL Codes: C10, C88, H51.
Article Language: Turkish English