K-Means Clustering to Classify Indonesian Provinces Based on School Participation and Socio-Economic Indicators

Authors

  • Nilam Novita Sari, Universitas Negeri Jakarta
  • Khaola Rachma Adzima, Universitas Negeri Jakarta
  • Sahiba Sahila, Politeknik Negeri Jakarta
  • Tiara Husnul Khotimah, Universitas Negeri Jakarta

DOI:

https://doi.org/10.55606/jurrimipa.v4i2.6657

Keywords:

Davies-Bouldin index, K-Means clustering, School participation, Socio-economic indicator, Validation indices

Abstract

Education is a fundamental pillar of national development: it enhances individual capacities and improves overall social welfare. Despite this role, Indonesia continues to face regional disparities in both access to and quality of education, as reflected in variations in school participation indicators and socio-economic backgrounds. To analyze these differences, this study applied the K-Means clustering method to group Indonesian provinces using six variables: School Participation Rate, Net Enrollment Rate, Gross Enrollment Rate, Poverty Rate, High School Ratio, and Teacher Ratio. To identify the most suitable number of clusters, three validation indices were used (the Dunn Index, the C-Index, and the Davies-Bouldin Index), with cluster counts tested from three to six. The five-cluster solution performed best, with the highest Dunn Index (0.1569), the lowest C-Index (0.0321), and the lowest Davies-Bouldin Index (0.5062). The robustness of this clustering was further supported by the ratio of within-cluster to between-cluster standard deviation (S(w)/S(b) = 0.33). Each cluster showed distinct educational and socio-economic characteristics: Cluster 4 displayed the most favorable outcomes, with high participation and low poverty levels, whereas Cluster 5 showed the weakest performance across all observed indicators.
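The selection procedure described in the abstract (run K-Means for each candidate cluster count, then compare validation indices) can be illustrated with a minimal Python sketch. This is not the authors' code or data: the toy points, the helper names (`kmeans`, `davies_bouldin`, `dunn`), and the random seeds are all assumptions made for illustration, and the C-Index is left out for brevity. The Dunn Index here uses the classic definition (smallest distance between points in different clusters divided by the largest cluster diameter), which may differ from the variant used in the study.

```python
import math
import random
from itertools import combinations

def dist(a, b):
    """Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(col) / len(points) for col in zip(*points))

def kmeans(points, k, iters=100, seed=0):
    """Plain Lloyd iterations; returns one cluster label per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initial centers drawn from the data
    labels = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: dist(p, centers[j])) for p in points]
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            updated.append(centroid(members) if members else centers[j])
        if updated == centers:  # assignments have stabilised
            break
        centers = updated
    return labels

def group(points, labels):
    """Collect points into a list of non-empty clusters."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    return list(clusters.values())

def davies_bouldin(points, labels):
    """Mean over clusters of the worst (S_i + S_j) / d(c_i, c_j); lower is better."""
    clusters = group(points, labels)
    centers = [centroid(c) for c in clusters]
    scatter = [sum(dist(p, ctr) for p in c) / len(c)
               for c, ctr in zip(clusters, centers)]
    n = len(clusters)
    worst = [max((scatter[i] + scatter[j]) / dist(centers[i], centers[j])
                 for j in range(n) if j != i) for i in range(n)]
    return sum(worst) / n

def dunn(points, labels):
    """Smallest between-cluster distance over largest diameter; higher is better."""
    clusters = group(points, labels)
    separation = min(dist(p, q) for a, b in combinations(clusters, 2)
                     for p in a for q in b)
    diameter = max((dist(p, q) for c in clusters for p, q in combinations(c, 2)),
                   default=0.0)
    return separation / diameter if diameter > 0 else math.inf

if __name__ == "__main__":
    # Hypothetical toy data: three 2-D blobs, NOT the provincial indicators.
    rng = random.Random(42)
    toy = [(cx + rng.gauss(0, 1), cy + rng.gauss(0, 1))
           for cx, cy in [(0, 0), (8, 0), (4, 7)] for _ in range(10)]
    for k in range(3, 7):  # cluster counts three to six, as in the abstract
        labels = kmeans(toy, k)
        print(k, round(dunn(toy, labels), 4), round(davies_bouldin(toy, labels), 4))
```

Under the criteria reported in the abstract, the preferred cluster count would be the one with the highest Dunn value and the lowest Davies-Bouldin value. Because K-Means depends on its starting centers, a real analysis would typically repeat each run with several seeds before comparing indices.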

Published

2025-08-31

How to Cite

Nilam Novita Sari, Khaola Rachma Adzima, Sahiba Sahila, & Tiara Husnul Khotimah. (2025). K-Means Clustering to Classify Indonesian Provinces Based on School Participation and Socio-Economic Indicators. JURNAL RISET RUMPUN MATEMATIKA DAN ILMU PENGETAHUAN ALAM, 4(2), 292–304. https://doi.org/10.55606/jurrimipa.v4i2.6657