Data-Level Approach for Handling Data Imbalance Using the K-Nearest Neighbor Algorithm

Resianta Perangin-angin
Eva Julia Gunawati Harianja
Indra Kelana Jaya

Abstract

This study uses datasets with different levels of class imbalance: 16.40, 8.60, 2.06, 2.78, and 1.87. Such imbalance can degrade the performance of classification algorithms. In general, class imbalance can be handled with two approaches, at the data level and at the algorithm level. The data-level approach aims to rebalance the class distribution, whereas the algorithm-level approach modifies the algorithm or combines classifiers into an ensemble so that they are more conducive to the minority class. This study proposes a data-level approach based on resampling, namely random oversampling (ROS) and random undersampling (RUS), with k-nearest neighbors (KNN) as the classifier. The results show a difference of 13% in G-Mean and 2.08% in F-Measure for the ROS+KNN and RUS+KNN models. This indicates that RUS+KNN and ROS+KNN can improve G-Mean and F-Measure, but without a significant difference.
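The abstract names the ingredients of the study (imbalance level, ROS, RUS, KNN, G-Mean, F-Measure) without showing how they fit together. The following is a minimal Python sketch under stated assumptions, not the authors' implementation. It assumes the imbalance level is the size of the largest class divided by the smallest, that G-Mean is the square root of recall times specificity, and that F-Measure is the harmonic mean of precision and recall (the common textbook definitions; the paper's exact formulas are not given here). All function names (`imbalance_ratio`, `random_oversample`, `random_undersample`, `knn_predict`, `g_mean_and_f_measure`), the toy data, `k=3`, and Euclidean distance are hypothetical, illustrative choices.

```python
import math
import random
from collections import Counter


def imbalance_ratio(labels):
    """Assumed definition: size of the largest class divided by the smallest."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def random_oversample(X, y, seed=0):
    """ROS: duplicate randomly drawn samples of every smaller class
    (with replacement) until each class matches the majority size."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for cls, n in counts.items():
        idx = [i for i, label in enumerate(y) if label == cls]
        for _ in range(target - n):
            i = rng.choice(idx)
            X_out.append(X[i])
            y_out.append(cls)
    return X_out, y_out


def random_undersample(X, y, seed=0):
    """RUS: keep a random subset (without replacement) of every class,
    sized to the minority class; the remaining samples are discarded."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = min(counts.values())
    X_out, y_out = [], []
    for cls in counts:
        idx = [i for i, label in enumerate(y) if label == cls]
        for i in rng.sample(idx, target):
            X_out.append(X[i])
            y_out.append(cls)
    return X_out, y_out


def knn_predict(X_train, y_train, x, k=3):
    """Plain KNN: majority vote among the k training points nearest to x
    under Euclidean distance."""
    order = sorted(range(len(X_train)), key=lambda i: math.dist(X_train[i], x))
    votes = Counter(y_train[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def g_mean_and_f_measure(y_true, y_pred, positive=1):
    """G-Mean = sqrt(recall * specificity); F-Measure = harmonic mean of
    precision and recall, treating `positive` as the minority class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    g_mean = math.sqrt(recall * specificity)
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return g_mean, f_measure


if __name__ == "__main__":
    # Hypothetical toy data: 6 majority (0) vs 2 minority (1) points.
    X = [(0, 0), (0, 1), (1, 0), (1, 1), (0, 2), (2, 0), (5, 5), (6, 5)]
    y = [0] * 6 + [1] * 2
    print("imbalance ratio:", imbalance_ratio(y))  # 6 / 2 = 3.0
    X_test = [(0.5, 0.5), (5.5, 5.0), (4.0, 4.0)]
    y_test = [0, 1, 1]
    for name, sampler in (("ROS", random_oversample), ("RUS", random_undersample)):
        X_res, y_res = sampler(X, y)
        preds = [knn_predict(X_res, y_res, x) for x in X_test]
        print(name, dict(Counter(y_res)), g_mean_and_f_measure(y_test, preds))
```

The design difference between the two samplers is visible in the class counts: ROS keeps every original sample and adds duplicates of the minority class, while RUS throws majority samples away, so KNN sees a larger but repetitive training set in one case and a smaller one in the other. In a real evaluation, resampling would normally be applied to the training split only, so that duplicated or discarded samples do not affect the test data; that detail is not stated in the abstract and is an assumption of this sketch.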

Article Details

How to Cite
[1] R. Perangin-angin, E. J. G. Harianja, and I. K. Jaya, “Pendekatan Level Data untuk Menangani Ketidakseimbangan Data Menggunakan Algoritma K-Nearest Neighbor,” JTM, vol. 9, no. 1, pp. 22–32, May 2020.
Section
Articles
Author Biography

Resianta Perangin-angin, Universitas Methodist Indonesia

Accounting Computerization Study Program
