Comparative Analysis of the XGBoost and Random Forest Ensemble Learning Algorithms for Credit Decision Classification
DOI:
https://doi.org/10.55606/jurrimipa.v2i2.1470
Keywords:
XGBoost, Random Forest, Classification, Credit Decision, Parameter Tuning, Ensemble Learning
Abstract
Extending credit always carries risk, such as non-performing loans, so creditors (banks) are expected to evaluate each credit application more objectively and accurately. This study aims to determine which algorithm gives the most accurate credit decisions by comparing XGBoost and Random Forest. Both algorithms were applied to datasets of 10,000 and 100,000 records, each with 19 variables relevant to credit card decisions. The research process comprised data pre-processing, data splitting, model training, parameter tuning with Random Search, testing, and model evaluation with a confusion matrix. The experiments show that both algorithms perform competitively: XGBoost reached 1.0 on every evaluation metric for both the 10,000-record and the 100,000-record data. Random Forest achieved an accuracy of 0.998 on the 10,000-record data and 0.999 on the 100,000-record data. However, Random Forest reached an F1-score of only 0.700 on the 10,000-record data. Based on these results, both algorithms classify credit card decision data very well and accurately, but Random Forest is less accurate on small, imbalanced data.
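The gap between an accuracy of 0.998 and an F1-score of 0.700 follows directly from how the two metrics are computed from a confusion matrix when one class is rare. The sketch below illustrates this in Python. The class `ConfusionMatrix` and the counts passed to it are hypothetical, chosen only so that the resulting accuracy and F1 match the values quoted in the abstract; they are not the study's actual confusion matrix, test-set size, or class ratio.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Binary confusion matrix counts (hypothetical helper, not from the paper)."""
    tp: int  # positives predicted as positive
    fp: int  # negatives predicted as positive
    fn: int  # positives predicted as negative
    tn: int  # negatives predicted as negative

    @property
    def accuracy(self) -> float:
        total = self.tp + self.fp + self.fn + self.tn
        return (self.tp + self.tn) / total

    @property
    def precision(self) -> float:
        predicted_pos = self.tp + self.fp
        return self.tp / predicted_pos if predicted_pos else 0.0

    @property
    def recall(self) -> float:
        actual_pos = self.tp + self.fn
        return self.tp / actual_pos if actual_pos else 0.0

    @property
    def f1(self) -> float:
        # F1 = 2*TP / (2*TP + FP + FN); true negatives do not appear at all.
        denom = 2 * self.tp + self.fp + self.fn
        return 2 * self.tp / denom if denom else 0.0


# Assumed, illustrative counts: 3,000 test records, only 10 of them positive.
cm = ConfusionMatrix(tp=7, fp=3, fn=3, tn=2987)
print(f"accuracy={cm.accuracy:.3f} precision={cm.precision:.3f} "
      f"recall={cm.recall:.3f} f1={cm.f1:.3f}")
# prints: accuracy=0.998 precision=0.700 recall=0.700 f1=0.700
```

With these assumed counts, six errors out of 3,000 records barely move accuracy, because the 2,987 true negatives dominate it. F1 ignores true negatives, so the same six errors, measured against only seven correctly identified positives, pull it down to 0.700. This is why accuracy alone can look excellent on imbalanced credit data while the minority class is still classified poorly.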
License
Copyright (c) 2023 Jurnal Riset Rumpun Matematika dan Ilmu Pengetahuan Alam (JURRIMIPA)
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.