Aim: Breast cancer remains a leading cause of mortality among women worldwide, with early detection being crucial for improving patient prognosis. However, existing diagnostic methods often lack the required predictive precision and interpretability. This study proposes a two-stage machine learning approach to address these challenges in breast cancer diagnostics.
Methods: In the first stage, Random Forest, Logistic Regression, Mutual Information, and Support Vector Machine-Recursive Feature Elimination (SVM-RFE) algorithms were employed for feature extraction and selection. SVM-RFE reduced data dimensionality while retaining critical predictive features, slightly outperforming the other techniques. In the second stage, Logistic Regression was used to model the probability of breast cancer occurrence from the refined feature set. The combined approach improved both classification performance and probabilistic risk assessment.
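The two-stage design described in the Methods can be sketched roughly as follows. This is a minimal illustration using scikit-learn, not the study's actual implementation. The dataset (scikit-learn's bundled Wisconsin Diagnostic Breast Cancer data), the train/test split, the linear SVM kernel, and the number of retained features (`N_FEATURES`) are all assumptions made for the example.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Assumption: scikit-learn's bundled dataset stands in for the study's data.
# In this encoding, target 0 = malignant and target 1 = benign.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

N_FEATURES = 10  # hypothetical choice; the abstract does not state how many features were kept

two_stage = make_pipeline(
    StandardScaler(),
    # Stage 1: SVM-RFE. A linear SVM exposes coef_, which RFE uses to
    # repeatedly drop the lowest-weighted feature until N_FEATURES remain.
    RFE(SVC(kernel="linear"), n_features_to_select=N_FEATURES),
    # Stage 2: Logistic Regression models class probabilities on the reduced set.
    LogisticRegression(max_iter=1000),
)
two_stage.fit(X_train, y_train)

# Column 0 holds P(malignant), column 1 holds P(benign) for each test sample.
proba = two_stage.predict_proba(X_test)
selected_mask = two_stage.named_steps["rfe"].support_
```

Wrapping both stages in one pipeline keeps the scaler and the feature selector fitted on the training split only, which avoids leaking test information into the selection step.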
Results: Evaluation with standard metrics (accuracy, precision, recall, and F1-score) validated the model's performance: prediction accuracy reached 97% and classification confidence 99.8%. Feature importance analysis showed that radius_se significantly influenced prediction outcomes and was the strongest indicator of probable malignancy, while fractal_dimension_se was identified as the most important feature for classifying benign tumors.
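As an illustration of how such metrics and a coefficient-based feature ranking might be computed, the sketch below fits a Logistic Regression on scikit-learn's bundled Wisconsin Diagnostic data. It is not expected to reproduce the figures reported above. The dataset, the split, the use of the full feature set, and the use of standardized coefficients as the importance measure are assumptions of this example; the bundled data also names its columns differently (for instance "radius error" rather than radius_se).

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: bundled dataset; target 0 = malignant, target 1 = benign.
data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.2, stratify=data.target, random_state=0
)

model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Treat malignant (label 0) as the positive class for precision, recall, and F1.
accuracy = accuracy_score(y_test, y_pred)
precision, recall, f1, _ = precision_recall_fscore_support(
    y_test, y_pred, pos_label=0, average="binary"
)

# With standardized inputs, coefficient sign and size give a rough importance:
# negative weights push toward class 0 (malignant), positive toward class 1 (benign).
coefs = model.named_steps["logisticregression"].coef_[0]
ranked = sorted(zip(data.feature_names, coefs), key=lambda pair: pair[1])
toward_malignant = ranked[:3]        # most negative coefficients
toward_benign = ranked[-3:][::-1]    # most positive coefficients

print(f"accuracy={accuracy:.3f} precision={precision:.3f} "
      f"recall={recall:.3f} f1={f1:.3f}")
```

Coefficient magnitudes are only comparable across features because the inputs are standardized first; on raw measurements the ranking would mostly reflect differences in units.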
Conclusion: These results highlight the potential of the proposed model in enhancing the reliability of breast cancer detection, paving the way for clinical applications in early diagnosis and risk stratification.
Key words: Breast Cancer, Risk Factors, Machine Learning, Early Detection