Predicting Employment Status 6 Months After Graduation with Machine Learning: A Comparative Study of 3,945 Indonesian Graduates
DOI:
https://doi.org/10.57255/intellect.v4i02.1392
Keywords:
Machine Learning, Random Forest, XGBoost, Logistic Regression, Employability, SHAP Analysis
Abstract
The high unemployment rate among bachelor's graduates in Indonesia, which reaches 11.4% in the first six months after graduation, indicates the need for an early prediction system that identifies the factors influencing student employability. This study analyzes and compares the performance of three machine learning algorithms (Random Forest, Logistic Regression, and XGBoost) in predicting employment status 6 months after graduation from academic and socioeconomic data. The dataset consists of 3,945 graduates from universities in Padangsidimpuan, described by five variables: study program, study duration, GPA, gender, and parental income. The operational target is employment status 6 months after graduation (binary: employed = 1, not yet employed = 0), and the two classes are nearly balanced (employed: 48.2%; not yet employed: 51.8%). Evaluation uses stratified 5-fold cross-validation with accuracy, balanced accuracy, F1-macro, ROC-AUC, and PR-AUC as metrics. Model interpretability is analyzed using permutation importance and SHAP values. Random Forest achieved the best performance, with F1-macro 0.524±0.015 and ROC-AUC 0.567±0.012, followed by Logistic Regression (F1-macro: 0.511±0.018) and XGBoost (F1-macro: 0.506±0.020). The majority baseline achieved an accuracy of 51.8%. Permutation importance analysis identified GPA as the most influential factor (importance: 0.082), followed by parental income (0.067) and study duration (0.041). The machine learning models provided a moderate improvement over the majority baseline. GPA and socioeconomic factors were shown to significantly influence graduates' employment status. These findings can support the development of a data-driven early warning system for student mentoring.
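The abstract reports the majority baseline only in terms of accuracy (51.8%), while the models are compared on F1-macro. The following is a minimal sketch of how a majority-class baseline scores on the other metrics, using only the Python standard library. The class counts (1,901 employed, 2,044 not yet employed) are an assumption obtained by rounding the reported proportions against the 3,945 graduates; they are not taken from the paper, and the helper function names are illustrative.

```python
from collections import Counter


def f1_for_class(y_true, y_pred, cls):
    """F1 with `cls` as the positive label: 2TP / (2TP + FP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def recall_for_class(y_true, y_pred, cls):
    """Fraction of true `cls` samples that were predicted as `cls`."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
    support = sum(1 for t in y_true if t == cls)
    return tp / support if support else 0.0


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    classes = sorted(set(y_true))
    return sum(f1_for_class(y_true, y_pred, c) for c in classes) / len(classes)


def balanced_accuracy(y_true, y_pred):
    """Unweighted mean of the per-class recalls."""
    classes = sorted(set(y_true))
    return sum(recall_for_class(y_true, y_pred, c) for c in classes) / len(classes)


def accuracy(y_true, y_pred):
    """Fraction of samples whose prediction matches the true label."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


# ASSUMPTION: counts rounded from the reported 48.2% / 51.8% split of 3,945.
N_EMPLOYED = 1901   # label 1
N_NOT_YET = 2044    # label 0

y_true = [1] * N_EMPLOYED + [0] * N_NOT_YET

# The majority baseline predicts the most frequent label for every graduate.
majority_label = Counter(y_true).most_common(1)[0][0]
y_pred = [majority_label] * len(y_true)

baseline_accuracy = accuracy(y_true, y_pred)
baseline_balanced_accuracy = balanced_accuracy(y_true, y_pred)
baseline_macro_f1 = macro_f1(y_true, y_pred)

print(f"majority label:    {majority_label}")
print(f"accuracy:          {baseline_accuracy:.3f}")
print(f"balanced accuracy: {baseline_balanced_accuracy:.3f}")
print(f"F1-macro:          {baseline_macro_f1:.3f}")
```

Under these assumed counts, the baseline reaches an accuracy of 0.518, a balanced accuracy of 0.500, and an F1-macro of about 0.341, because the class it never predicts contributes an F1 of zero. The last two values are derived here for context and are not figures reported in the abstract.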
Copyright (c) 2025 ihdi Syahputra Ritonga

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.