Background: The yearly death rate due to heart disease is increasing. Major factors behind this increase are (a) misdiagnosis by medical doctors and (b) ignorance on the part of patients. Heart disease can be
described as any disorder that affects the heart. Methods: This article uses the Statlog (Heart) dataset from the UCI Machine Learning Repository, which contains records of 270 patients. The dataset comprises attributes of patients examined for heart disease, and the diagnosis records whether heart disease is present or absent in the
patient. The present article aims to identify the risk factors/variables that influence this diagnosis. Classification is a very important part of disease diagnosis, but identifying the risk factors/variables is also relevant. Two classification techniques, namely Support Vector Machines (SVM) and Multi-Layer Perceptron Ensembles (MLPE), and one advanced regression technique, a Generalized Additive Model (GAM) with binomial distribution and logit link, have been applied
for diagnosis and for the identification of risk factors/variables. Results: The GAM explains approximately 65% of the deviance, with an adjusted R² of approximately 0.70. SVM is the best model for this dataset, with a classification accuracy of approximately 85%, and sensitivity analysis was performed on it. MLPE achieves a classification accuracy of approximately 82%. Maximum heart rate, number of major vessels, old peak (ST depression), chest pain type and thallium scan are the most important factors/variables identified through both the sensitivity
analysis under SVM and the GAM. Conclusion: The present article attempts to extract new information about heart disease through probabilistic modeling, which may better support treatment decision-making based on individual patient risk factors and the benefits of a specific treatment. These findings may help medical practitioners provide better treatment.
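A GAM with binomial distribution and logit link, as named in the Methods, can be written in the general form below. The notation is ours, not the article's. $y_i$ is 1 if heart disease is present for patient $i$ and 0 otherwise, $x_{ij}$ is the $j$-th attribute, and each $f_j$ is a smooth function estimated from the data. The abstract does not state which attributes enter as smooth rather than linear terms.

The "deviance explained" figure is conventionally the proportional reduction from the null (intercept-only) deviance. Under that reading, 65% corresponds to $D_{\text{model}} \approx 0.35\,D_{\text{null}}$.

```latex
\begin{aligned}
y_i &\sim \operatorname{Binomial}(1, p_i), \\
\operatorname{logit}(p_i) &= \log\frac{p_i}{1 - p_i} = \beta_0 + \sum_{j=1}^{q} f_j(x_{ij}), \\
\text{deviance explained} &= \frac{D_{\text{null}} - D_{\text{model}}}{D_{\text{null}}}.
\end{aligned}
```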
Key words: Heart disease, Data Mining, SVM, MLPE, Sensitivity analysis, GAM.
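The sensitivity analysis mentioned in the Results ranks inputs by how strongly a fitted model's output responds to each one. The abstract does not give the exact procedure, so the sketch below shows one common one-at-a-time variant under stated assumptions:

- The fitted model is available as a function that returns a probability of disease.
- Each attribute is varied across evenly spaced levels while the others stay at baseline values.
- Importance is each attribute's share of the total output range.

The function `hypothetical_model`, its weights, the attribute names, the baseline values and the ranges are all hypothetical stand-ins. They are not the article's fitted SVM and not the dataset's statistics.

```python
import math
from typing import Callable, Dict, List, Tuple

# HYPOTHETICAL stand-in for a fitted classifier: a logistic score over a few
# attributes. The weights are invented for illustration, NOT from the article.
HYPOTHETICAL_WEIGHTS: Dict[str, float] = {
    "max_heart_rate": -0.03,
    "num_vessels": 1.0,
    "oldpeak": 0.6,
    "age": 0.01,
}
HYPOTHETICAL_BIAS = 2.0


def hypothetical_model(x: Dict[str, float]) -> float:
    """Return a probability-like score in (0, 1) via the logistic function."""
    z = HYPOTHETICAL_BIAS + sum(w * x[name] for name, w in HYPOTHETICAL_WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))


def one_at_a_time_sensitivity(
    model: Callable[[Dict[str, float]], float],
    baseline: Dict[str, float],
    ranges: Dict[str, Tuple[float, float]],
    levels: int = 5,
) -> Dict[str, float]:
    """Vary each input over `levels` evenly spaced values (others fixed at
    baseline) and return each input's share of the total output range."""
    if levels < 2:
        raise ValueError("levels must be at least 2")
    spans: Dict[str, float] = {}
    for name, (lo, hi) in ranges.items():
        outputs: List[float] = []
        for k in range(levels):
            x = dict(baseline)  # copy so the other inputs stay at baseline
            x[name] = lo + (hi - lo) * k / (levels - 1)
            outputs.append(model(x))
        spans[name] = max(outputs) - min(outputs)
    total = sum(spans.values())
    # Relative importance: each input's output range divided by the sum of ranges.
    return {name: (s / total if total > 0 else 0.0) for name, s in spans.items()}


if __name__ == "__main__":
    # Illustrative baseline and ranges (hypothetical, not dataset statistics).
    baseline = {"max_heart_rate": 150.0, "num_vessels": 1.0, "oldpeak": 1.0, "age": 55.0}
    ranges = {
        "max_heart_rate": (70.0, 200.0),
        "num_vessels": (0.0, 3.0),
        "oldpeak": (0.0, 6.0),
        "age": (29.0, 77.0),
    }
    importance = one_at_a_time_sensitivity(hypothetical_model, baseline, ranges)
    for name, share in sorted(importance.items(), key=lambda kv: -kv[1]):
        print(f"{name:>15}: {share:.3f}")
```

Running the script prints the attributes ordered by their share of the output range. A one-at-a-time scan is cheap and works with any model, but it ignores interactions between inputs. It is therefore better treated as a screening tool than as a complete attribution of the model's behaviour.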