Comparative Analysis of the XGBoost and Random Forest Ensemble Learning Algorithms for Credit Decision Classification

Authors

  • Jan Melvin Ayu Soraya Dachi, Universitas Negeri Medan
  • Pardomuan Sitompul, Universitas Negeri Medan

DOI:

https://doi.org/10.55606/jurrimipa.v2i2.1470

Keywords:

XGBoost, Random Forest, Classification, Credit Decision, Parameter Tuning, Ensemble Learning

Abstract

Lending always carries risks such as bad (non-performing) loans, so creditors (banks) need to be objective and accurate when evaluating each credit application. This study compares the XGBoost and Random Forest algorithms to determine which one is more accurate in making credit decisions. Both algorithms were applied to datasets of 10,000 and 100,000 records, each with 19 variables relevant to credit card decisions. The research process comprised data pre-processing, data splitting, model training, parameter tuning with Random Search, testing, and model evaluation using a confusion matrix. The experiments show that both algorithms perform competitively: XGBoost reached 1.0 on every evaluation metric for both the 10,000-record and the 100,000-record datasets. Random Forest achieved an accuracy of 0.998 on the 10,000-record dataset and 0.999 on the 100,000-record dataset; however, its F1-score on the 10,000-record dataset was only 0.700. These results indicate that both algorithms classify credit card decisions very well and accurately. However, Random Forest is less accurate when applied to a small, imbalanced dataset.
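
The abstract reports a Random Forest accuracy of 0.998 on the 10,000-record dataset alongside an F1-score of only 0.700. The minimal Python sketch below illustrates how both numbers can come from the same confusion matrix when one class is rare. The counts are hypothetical, chosen only to reproduce that pattern; they are not taken from the paper. The abstract also does not say which class or averaging scheme its F1-score uses, so this sketch assumes binary F1 on the minority class. The `ConfusionMatrix` class is an illustrative helper, not part of any library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Counts for a binary classifier; the minority class is treated as positive (assumption)."""

    tp: int  # minority-class records predicted correctly
    fp: int  # majority-class records wrongly predicted as minority
    fn: int  # minority-class records the model missed
    tn: int  # majority-class records predicted correctly

    @property
    def total(self) -> int:
        return self.tp + self.fp + self.fn + self.tn

    def accuracy(self) -> float:
        # Share of all records classified correctly; dominated by the majority class.
        return (self.tp + self.tn) / self.total

    def precision(self) -> float:
        # Returns 0.0 when nothing is predicted positive (a common convention).
        predicted_pos = self.tp + self.fp
        return self.tp / predicted_pos if predicted_pos else 0.0

    def recall(self) -> float:
        actual_pos = self.tp + self.fn
        return self.tp / actual_pos if actual_pos else 0.0

    def f1(self) -> float:
        # 2TP / (2TP + FP + FN), algebraically equal to the harmonic mean
        # of precision and recall; TN never enters the formula.
        denom = 2 * self.tp + self.fp + self.fn
        return 2 * self.tp / denom if denom else 0.0


# Hypothetical test set: 3,000 records, only 10 in the minority class.
# These counts are illustrative and are NOT the paper's results.
cm = ConfusionMatrix(tp=7, fp=3, fn=3, tn=2987)

print(f"accuracy  = {cm.accuracy():.3f}")   # accuracy  = 0.998
print(f"precision = {cm.precision():.3f}")  # precision = 0.700
print(f"recall    = {cm.recall():.3f}")     # recall    = 0.700
print(f"f1        = {cm.f1():.3f}")         # f1        = 0.700
```

Because 2,987 of the 3,000 hypothetical records are correctly classified majority-class records, six errors barely move accuracy, whereas F1 is computed only from the 7 + 3 + 3 records that involve the minority class. This is one reason F1 can expose weaknesses on small, imbalanced credit data that accuracy hides.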

Published

2023-10-30

How to Cite

Jan Melvin Ayu Soraya Dachi, & Pardomuan Sitompul. (2023). Analisis Perbandingan Algoritma XGBoost dan Algoritma Random Forest Ensemble Learning pada Klasifikasi Keputusan Kredit [Comparative analysis of the XGBoost and Random Forest ensemble learning algorithms for credit decision classification]. JURNAL RISET RUMPUN MATEMATIKA DAN ILMU PENGETAHUAN ALAM, 2(2), 87–103. https://doi.org/10.55606/jurrimipa.v2i2.1470