DESIGN AND DEVELOPMENT OF A RAG-BASED QUESTION ANSWERING SYSTEM FOR ACADEMIC INFORMATION SERVICES

Authors

  • Muhammad Surya Adhi Setiawan, Universitas Pembangunan Nasional “Veteran” Jawa Timur

Keywords:

Information Retrieval, Prototype, RAG Architecture, Software Engineering, System Implementation

Abstract

The complexity of academic information in the New Student Admissions (PPMB) process often overwhelms conventional helpdesk services. This study designs and builds a prototype Question Answering (QA) system based on Retrieval-Augmented Generation (RAG) to automate these information services accurately. The system uses a three-layer architecture: a Presentation Layer (Gradio UI), an Application Layer (Python/LangChain), and a Data Layer (ChromaDB). A key focus of the development is the data pipeline strategy, specifically the handling of "Indivisible Information Units" in PDF tables by setting a dynamic chunking limit of 3000 tokens, so that a table is not split across chunks. The prototype features a Knowledge Base Manager for dynamic document updates and a multilingual Chat Interface. Testing demonstrates that the system can process heterogeneous data from 30 sources and retrieve specific procedural information, such as the "Golden Ticket" requirements, with high precision. The deployed system uses a reasoning model as its generation engine to support logically coherent answer synthesis.
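
The chunking idea described in the abstract can be illustrated with a minimal sketch. It is not the study's implementation: the actual pipeline is built on LangChain and ChromaDB, while this example uses only the Python standard library. The names `Block`, `count_tokens`, `split_prose`, and `chunk_blocks` are hypothetical, whitespace-separated words stand in for model tokens, and emitting an oversized table as its own chunk is an assumption, since the abstract does not say how such tables are handled.

```python
from dataclasses import dataclass
from typing import List

MAX_TOKENS = 3000  # chunking limit quoted in the abstract


@dataclass
class Block:
    """A parsed unit of a document (hypothetical structure)."""
    text: str
    is_table: bool = False  # tables are treated as indivisible units


def count_tokens(text: str) -> int:
    # Assumption: whitespace-separated words approximate model tokens.
    return len(text.split())


def split_prose(text: str, max_tokens: int) -> List[str]:
    # Prose may be split on word boundaries when it exceeds the limit.
    words = text.split()
    return [" ".join(words[i:i + max_tokens])
            for i in range(0, len(words), max_tokens)]


def chunk_blocks(blocks: List[Block], max_tokens: int = MAX_TOKENS) -> List[str]:
    """Pack blocks into chunks of at most max_tokens, never splitting a table."""
    chunks: List[str] = []
    buffer: List[str] = []
    used = 0
    for block in blocks:
        # A table stays whole; prose is pre-split so each piece fits the limit.
        pieces = [block.text] if block.is_table else split_prose(block.text, max_tokens)
        for piece in pieces:
            size = count_tokens(piece)
            # Start a new chunk when the next piece would exceed the limit.
            if buffer and used + size > max_tokens:
                chunks.append("\n\n".join(buffer))
                buffer, used = [], 0
            buffer.append(piece)
            used += size
    if buffer:
        chunks.append("\n\n".join(buffer))
    return chunks
```

Under these assumptions, a table larger than the limit ends up alone in a single oversized chunk rather than being cut in half, which keeps each row together with its header at the cost of a longer retrieval unit.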

Published

2026-03-09