IMPROVING CYBERSECURITY TRAFFIC ANALYSIS VIA ENHANCED K-MEANS CLUSTERING WITH TRIANGLE INEQUALITY-BASED INITIALIZATION

Main Article Content

Hartono Hartono
Muhammad Khahfi Zuhanda
Sayuti Rahman

Abstract

Clustering algorithms are essential in data mining and pattern recognition for grouping unlabeled data into meaningful clusters based on similarity. Among them, K-Means is widely used for its simplicity and efficiency, but it is sensitive to the choice of initial centroids and cannot capture dependencies among features. This study proposes an Enhanced Mutual Information-based K-Means (MIK-Means) algorithm combined with Triangle Inequality and Lower Bound (TILB) seeding to improve clustering accuracy and computational efficiency, particularly for network traffic classification in cybersecurity applications. The TILB method accelerates the initialization phase by pruning redundant distance calculations with triangle-inequality and lower-bound tests, so that well-distributed initial centroids are selected efficiently. Meanwhile, MIK-Means uses mutual information as the similarity measure in the cluster assignment step, which lets the algorithm capture complex statistical dependencies among features that the traditional Euclidean distance does not address. Combining the two approaches yields a robust clustering framework capable of handling the large-scale, high-dimensional, and noisy datasets commonly found in network intrusion detection. The proposed method was evaluated on several benchmark datasets, including DARPA 1998-99, KDD Cup 99, NSL-KDD, UNSW-NB15, and CAIDA. Comparative experiments with state-of-the-art algorithms such as K-Means++, K-NNDP, and DI-K-Means showed that the proposed approach consistently outperformed or matched competitors in terms of the Silhouette Coefficient, Calinski-Harabasz index, and Davies-Bouldin index, indicating better cluster cohesion, separation, and compactness. Additionally, the computational efficiency gained from TILB seeding facilitates faster convergence without compromising clustering quality.
Furthermore, a threshold-based cluster labeling mechanism was applied to translate the clustering results into practical classifications of attack versus normal traffic, enhancing the usability of the method in real-world cybersecurity systems. Overall, this research demonstrates that integrating TILB seeding with mutual information-based clustering provides an effective and efficient solution to network traffic classification challenges.
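
The pruning idea behind triangle-inequality seeding can be illustrated with a minimal Python sketch. This is not the authors' TILB implementation; it is a generic k-means++-style seeding loop under stated assumptions, and the function name `seed_with_pruning` and its return values are hypothetical. It relies on one fact: if a point x is at distance d from its current nearest center c, and a new center c' satisfies d(c, c') >= 2d, then d(x, c') >= d(c, c') - d >= d, so x cannot move closer and its distance to c' need not be computed.

```python
import numpy as np

def seed_with_pruning(X, k, rng):
    """k-means++-style seeding that skips distance computations
    ruled out by the triangle inequality (illustrative sketch only)."""
    n = X.shape[0]
    centers = [int(rng.integers(n))]          # first center: uniform at random
    # dist[i]  = distance from point i to its nearest chosen center
    # owner[i] = position (in `centers`) of that nearest center
    dist = np.linalg.norm(X - X[centers[0]], axis=1)
    owner = np.zeros(n, dtype=int)
    skipped = 0                               # number of distances never computed

    while len(centers) < k:
        # D^2 sampling: pick the next center with probability proportional to dist^2
        probs = dist ** 2
        probs = probs / probs.sum()
        new = int(rng.choice(n, p=probs))
        centers.append(new)
        j = len(centers) - 1

        # Distances from the new center to each earlier center (only j values)
        cc = np.linalg.norm(X[centers[:-1]] - X[new], axis=1)

        # A point can only get closer if d(owner, new) < 2 * dist[point]
        need = cc[owner] < 2.0 * dist
        skipped += int((~need).sum())

        # Compute distances only for the points that survived the pruning test
        idx = np.flatnonzero(need)
        d_new = np.linalg.norm(X[idx] - X[new], axis=1)
        better = d_new < dist[idx]
        dist[idx[better]] = d_new[better]
        owner[idx[better]] = j

    return np.array(centers), dist, skipped
```

Called as `seed_with_pruning(X, k, np.random.default_rng(0))`, the sketch returns the indices of the chosen seeds, each point's distance to its nearest seed, and a count of the distance evaluations avoided. The nearest-seed distances are identical to what an unpruned pass would produce; only the amount of work differs.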

Article Details

How to Cite
[1]
H. Hartono, M. Khahfi Zuhanda, and S. Rahman, “IMPROVING CYBERSECURITY TRAFFIC ANALYSIS VIA ENHANCED K-MEANS CLUSTERING WITH TRIANGLE INEQUALITY-BASED INITIALIZATION”, JTM, vol. 14, no. 1, pp. 60–69, Jun. 2025.
Section
Articles
