The proliferation of interconnected network infrastructures and IoT devices has significantly expanded the cyber-attack surface, requiring efficient Machine Learning-based Intrusion Detection Systems (IDS). Although reference datasets like UNSW-NB15 exist, their official features impose limitations regarding flexibility and class imbalance. This study evaluates the impact of a custom data representation by constructing a new dataset from the original UNSW-NB15 PCAP files. We implemented a workflow to label packets, group unidirectional flows, and extract a reduced set of 21 features, comparing this representation with the official 49-feature UNSW-NB15 set using different ML architectures in binary and multi-class classification tasks. Results indicate that the custom dataset achieves competitive performance despite a significant reduction in file size and the number of features. Notably, the custom representation effectively balances detection accuracy with computational efficiency, offering a viable strategy for environments with strict operational constraints, such as edge nodes or IoT gateways.
Key words: Intrusion detection systems (IDS), Network traffic classification, UNSW-NB15, Machine learning, Network security.
|