OBESITY RISK CLASSIFICATION BASED ON GRADIENT BOOSTING USING MEDICAL DATA

Authors

  • Achmad Lukman Prayogidianto, Universitas Pembangunan Nasional “Veteran” Jawa Timur

DOI:

https://doi.org/10.5281/zenodo.20309761

Keywords:

Gradient Boosting, Classification, Obesity Risk, Medical Data, Machine Learning

Abstract

Obesity is a growing global health problem and a major contributor to chronic diseases such as diabetes, cardiovascular disorders, and metabolic syndrome, which makes early detection essential for preventive healthcare. This study applies the Gradient Boosting algorithm to classify obesity risk using medical data from the UCI Machine Learning Repository covering eating habits, physical activity, and individual characteristics. The methodology comprises data preprocessing, transformation, normalization, class mapping, and a 70:30 split of the dataset into training and testing sets. The Gradient Boosting model builds multiple decision trees iteratively, each new tree correcting the errors of the ensemble built so far, and assigns individuals to one of two classes: obese or non-obese. The model is evaluated using accuracy, precision, recall, and F1-score. In the experiments, the model achieves an accuracy above 90% with a relatively small gap between training and testing performance, which indicates good generalization with little sign of overfitting. These findings indicate that Gradient Boosting is an effective approach for obesity risk classification and has strong potential to support intelligent healthcare systems in data-driven decision-making for early prevention and treatment.
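The pipeline summarized in the abstract (normalization, a 70:30 split, iterative boosting of trees, and evaluation with accuracy, precision, recall, and F1-score) can be illustrated with a minimal pure-Python sketch. It is not the study's implementation. The data are synthetic stand-ins rather than the UCI records, the obese label is assumed to follow a BMI threshold of 30, the trees are reduced to one-split stumps with mean-residual leaves, and all function names and hyperparameters (`n_rounds`, `learning_rate`) are hypothetical choices made for illustration.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def make_toy_data(n=120, seed=0):
    """Synthetic stand-in for the real records (assumption, not the UCI data):
    features are (weight_kg, height_m, activity_h); label 1 means BMI >= 30."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        w, h, a = rng.uniform(45, 130), rng.uniform(1.5, 1.9), rng.uniform(0, 3)
        X.append([w, h, a])
        y.append(1 if w / h ** 2 >= 30 else 0)
    return X, y

def split_70_30(X, y, seed=0):
    """Shuffle the row indices and cut them 70:30 into training and testing sets."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    cut = len(idx) * 7 // 10
    tr, te = idx[:cut], idx[cut:]
    return [X[i] for i in tr], [y[i] for i in tr], [X[i] for i in te], [y[i] for i in te]

def fit_minmax(X):
    """Per-feature minimum and maximum, computed on the training set only."""
    cols = list(zip(*X))
    return [min(c) for c in cols], [max(c) for c in cols]

def apply_minmax(X, lo, hi):
    return [[(v - l) / (h - l) for v, l, h in zip(row, lo, hi)] for row in X]

def fit_stump(X, residuals):
    """Pick the one-feature threshold split with the lowest squared error on the residuals."""
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for a, b in zip(values, values[1:]):
            t = (a + b) / 2  # midpoint, so both sides are non-empty
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, lm, rm)
    return best[1:]

def stump_predict(stump, row):
    j, t, left, right = stump
    return left if row[j] <= t else right

def fit_gradient_boosting(X, y, n_rounds=40, learning_rate=0.5):
    """Boost stumps on the log-loss gradient: each round fits the residuals y - p."""
    p0 = sum(y) / len(y)
    f0 = math.log(p0 / (1 - p0))  # initial score: log-odds of the positive class
    scores, stumps = [f0] * len(X), []
    for _ in range(n_rounds):
        residuals = [yi - sigmoid(s) for yi, s in zip(y, scores)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        scores = [s + learning_rate * stump_predict(stump, row) for s, row in zip(scores, X)]
    return f0, learning_rate, stumps

def predict(model, X):
    """Return 1 (obese) when the summed score is positive, i.e. probability above 0.5."""
    f0, lr, stumps = model
    return [1 if f0 + lr * sum(stump_predict(s, row) for s in stumps) > 0 else 0 for row in X]

def classification_metrics(y_true, y_pred):
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(pairs), "precision": precision, "recall": recall, "f1": f1}

def run_demo():
    """Run the whole sketch and return (training metrics, testing metrics)."""
    X, y = make_toy_data()
    X_tr, y_tr, X_te, y_te = split_70_30(X, y)
    lo, hi = fit_minmax(X_tr)
    X_tr, X_te = apply_minmax(X_tr, lo, hi), apply_minmax(X_te, lo, hi)
    model = fit_gradient_boosting(X_tr, y_tr)
    return (classification_metrics(y_tr, predict(model, X_tr)),
            classification_metrics(y_te, predict(model, X_te)))

if __name__ == "__main__":
    train_metrics, test_metrics = run_demo()
    print("train:", train_metrics)
    print("test: ", test_metrics)
```

Comparing the two dictionaries returned by `run_demo()` mirrors the training-versus-testing comparison the abstract uses to judge overfitting. Tree splits are unaffected by min-max scaling, so the normalization step is kept here only to follow the stages the abstract lists.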

Published

2026-05-20