Original Research

JEAS. 2026; 13(1): 33-45


Android Malware Detection Using CTGAN-Based Data Augmentation and Autoencoder-Driven Feature Extraction

Shirina Samreen.



Abstract

The rapid growth of Android applications has led to a significant increase in malware threats, making accurate and robust detection mechanisms essential for mobile security. However, challenges such as class imbalance and high-dimensional feature spaces limit the effectiveness of traditional machine learning approaches.

This work proposes a machine learning pipeline for Android malware detection that integrates generative data augmentation and deep feature extraction with classical classification models. We employ Conditional Tabular Generative Adversarial Networks (CTGAN) to synthetically balance TUANDROMD, a permission- and API-based feature dataset developed at Tezpur University from real benign and malicious Android applications. An autoencoder then learns compact, discriminative latent representations from the original 241 numerical features, reducing dimensionality and redundancy. The extracted features are used to train several machine learning classifiers, including Logistic Regression, Random Forest, and XGBoost, enabling a comparative evaluation of model performance.
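As a rough illustration of the balancing step, the sketch below computes how many synthetic rows each class would need so that all classes match the largest one. It is a minimal sketch under stated assumptions, not the author's implementation: the function name `synthetic_rows_needed` and the toy label counts are hypothetical, and the actual sampling of rows from a conditional generator such as CTGAN is not shown.

```python
from collections import Counter


def synthetic_rows_needed(labels):
    """Return {class_label: rows_to_generate} so every class reaches
    the size of the largest class (hypothetical helper)."""
    counts = Counter(labels)
    target = max(counts.values())
    return {label: target - n for label, n in counts.items()}


# Hypothetical toy labels; these are NOT the TUANDROMD class counts.
labels = ["malware"] * 8 + ["benign"] * 3
needed = synthetic_rows_needed(labels)
print(needed)  # {'malware': 0, 'benign': 5}
```

In a full pipeline, the generator would be asked for that many rows per under-represented class, and the synthetic rows would be added to the training split only, so that validation and holdout data stay purely real.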

The models are assessed using accuracy, precision, recall, and F1-score under stratified validation and holdout testing. Four experimental configurations are investigated: (i) baseline classification using raw features, (ii) CTGAN-based data augmentation, (iii) autoencoder-based feature extraction, and (iv) CTGAN-based augmentation followed by autoencoder-driven feature extraction. Experimental results demonstrate that the combined CTGAN and autoencoder pipeline significantly improves minority-class detection while maintaining high overall accuracy. These findings highlight that integrating generative augmentation with learned feature representations is an effective strategy for handling high-dimensional, imbalanced Android malware datasets.
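The four metrics named above can be sketched in a few lines to show why overall accuracy alone can hide weak minority-class detection. This is an illustrative sketch only: the function name `binary_metrics`, the label encoding, and the toy predictions are assumptions, not data or results from the paper.

```python
def binary_metrics(y_true, y_pred, positive):
    """Accuracy plus precision, recall and F1 for the class `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical toy case: 8 majority samples (0), 2 minority samples (1).
# The classifier gets every majority sample right but misses one minority sample.
y_true = [0] * 8 + [1] * 2
y_pred = [0] * 8 + [1, 0]
m = binary_metrics(y_true, y_pred, positive=1)
print(m)  # accuracy 0.9, precision 1.0, recall 0.5, f1 about 0.667
```

Here accuracy is 0.9 even though half of the minority class is missed, which is why recall and F1 on the minority class are the more telling figures for imbalanced data.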

Key words: Android Malware Detection; CTGAN; Autoencoder; Data Augmentation; Feature Extraction; Ensemble Learning; Imbalanced Data








The articles in Bibliomed are open access articles licensed under Creative Commons Attribution 4.0 International License (CC BY), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.