 

Research Article



Heterogeneous Distributed Ensemble Feature Selection: An Enhancement Approach to Machine Learning for Phishing Detection

B.M. Olukoya, G.O. Ogunleye, P.O. Olabisi, A.T. Olusesi and A.A. Osobukola.




Abstract

Phishing remains a critical cybersecurity challenge because the technologies used to carry it out are developing rapidly. Detecting phishing attacks is difficult and tedious because the methods of attack keep evolving. Although several techniques have been deployed against these attacks, no single solution is perfect. Machine learning is now widely accepted among researchers as an effective countermeasure against phishing attacks on networks. The approach comprises several steps, of which feature selection is one of the most crucial: the quality of the features selected for building the machine learning model plays a significant role. The two general feature selection approaches have shortcomings, such as the difficulty of choosing a cutoff point and high computational cost. To address the cutoff-point issue, this study applied a novel ensemble feature selection strategy, Heterogeneous Distributed Ensemble Feature Selection (HDEFS), which identifies relevant features and discards correlated ones. The study used a Borda count algorithm as the aggregator to improve the selection performance of the individual filter-based measures. In the first phase of the feature selection framework, three individual filter-based measures (gain ratio, chi-square, and correlation) were each applied to produce features according to their own principles. In the second phase, HDEFS was applied to the primary information features. HDEFS produced baseline webpage features that differ from the usual features, such as IpAddress, AtSymbol, QueryLength, MissingTitle, and NumQueryComponents, previously used for phishing detection. The results showed that phishing detection models built on the proposed HDEFS baseline features improved on those built with the individual filter-based measures, and that the prediction accuracy of the models increased when they used the features selected by the proposed framework. The bagged SVM model outperformed the other ensemble and classical models, achieving 0.974 (97.4%), followed by bagged LR (0.94).
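
The aggregation step described above can be illustrated with a minimal Borda count sketch. It is not the authors' implementation: the function name `borda_aggregate`, the placeholder feature names, and the three example rankings are hypothetical, and the sketch assumes that each filter measure has already produced a full ranking of the same features, best first. The removal of correlated features mentioned in the abstract is not shown.

```python
from typing import Dict, List


def borda_aggregate(rankings: Dict[str, List[str]]) -> List[str]:
    """Combine several feature rankings (best first) into one consensus ranking."""
    scores: Dict[str, int] = {}
    for ranked in rankings.values():
        n = len(ranked)
        for position, feature in enumerate(ranked):
            # The top-ranked feature earns n - 1 points, the last one earns 0.
            scores[feature] = scores.get(feature, 0) + (n - 1 - position)
    # Highest total first; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda f: (-scores[f], f))


# Hypothetical rankings from the three filter measures (assumed, not from the paper).
rankings = {
    "gain_ratio": ["feature_a", "feature_b", "feature_c", "feature_d"],
    "chi_square": ["feature_b", "feature_a", "feature_d", "feature_c"],
    "correlation": ["feature_a", "feature_c", "feature_b", "feature_d"],
}

consensus = borda_aggregate(rankings)
top_k = consensus[:2]  # keep the two best-supported features
print(consensus)
```

Because each measure contributes rank positions rather than raw scores, measures on different scales can be combined directly, and the consensus order gives a single list from which a fixed number of features can be taken.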

Keywords: Phishing Detection, Cybersecurity, Machine Learning, Feature Selection, Ensemble, Malicious, Email







The articles in Bibliomed are open access articles licensed under Creative Commons Attribution 4.0 International License (CC BY), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.