CLASSIFICATION OF STUDENTS' MATHEMATICS GRADE LEVELS USING MACHINE LEARNING METHODS BASED ON SOCIAL AND BEHAVIORAL FACTORS

Authors

  • Razpa Arya Wardana, Universitas Pembangunan Nasional “Veteran” Jawa Timur

Keywords:

Student Performance, Mathematics Achievement, Machine Learning, Random Forest, Educational Data Mining

Abstract

Data is increasingly important in education for supporting data-driven decision-making, particularly in monitoring students’ learning progress in mathematics. This study classifies students’ mathematics achievement levels from social, behavioral, and academic factors using machine learning methods. The dataset is the Student Performance Data Set from the UCI Machine Learning Repository, specifically the student-mat.csv file, which contains 395 student records with 33 attributes. The final mathematics grade (G3) is grouped into three categories: low, medium, and high. The methodology follows the CRISP-DM approach, which comprises the stages of Business Understanding, Data Understanding, Data Preparation, Modelling, Evaluation, and Deployment. Models are built with the Logistic Regression, Random Forest, and XGBoost algorithms. In the evaluation, Random Forest performs best of the three, with an accuracy of 0.5570 and an F1-score of 0.4975. Feature analysis indicates that prior academic failures, school support, social activities, and students’ absenteeism levels have a significant influence on mathematics achievement. The results are expected to help schools identify at-risk students earlier and plan more targeted learning interventions.
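The grouping of G3 into three levels and the two reported metrics can be illustrated with a minimal sketch. The abstract does not state the cut-off values or how the F1-score is averaged, so the thresholds below (`LOW_MAX`, `MEDIUM_MAX`) and the macro averaging are assumptions chosen only for illustration, and the toy grades and predictions are not data from the study.

```python
# Hypothetical cut-offs on the 0-20 G3 scale; the study's actual
# thresholds are not given in the abstract.
LOW_MAX = 9
MEDIUM_MAX = 14


def grade_level(g3: int) -> str:
    """Map a final grade G3 (0-20) to 'low', 'medium', or 'high'."""
    if not 0 <= g3 <= 20:
        raise ValueError(f"G3 must be in 0..20, got {g3}")
    if g3 <= LOW_MAX:
        return "low"
    if g3 <= MEDIUM_MAX:
        return "medium"
    return "high"


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true class."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_f1(y_true, y_pred, labels=("low", "medium", "high")):
    """Unweighted mean of per-class F1 (assumed averaging scheme)."""
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy illustration only: six made-up grades and made-up predictions.
g3_scores = [0, 8, 10, 12, 15, 18]
y_true = [grade_level(g) for g in g3_scores]
y_pred = ["low", "medium", "medium", "medium", "high", "medium"]

print(y_true)
print(round(accuracy(y_true, y_pred), 4), round(macro_f1(y_true, y_pred), 4))
```

With a three-class target, accuracy and F1 can diverge noticeably when the classes are imbalanced, which is one plausible reading of the gap between the reported 0.5570 and 0.4975.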

Published

2026-02-03