DESIGN AND DEVELOPMENT OF A RAG-BASED QUESTION ANSWERING SYSTEM FOR ACADEMIC INFORMATION SERVICES
Keywords:
Information Retrieval, Prototype, RAG Architecture, Software Engineering, System Implementation
Abstract
The complexity of academic information in the New Student Admissions (PPMB) process often overwhelms conventional helpdesk services. This study designs and builds a prototype Question Answering (QA) system based on Retrieval-Augmented Generation (RAG) to automate these information services accurately. The system uses a three-layer architecture: a Presentation Layer (Gradio UI), an Application Layer (Python/LangChain), and a Data Layer (ChromaDB). A key focus of the development is the data pipeline strategy, in particular the handling of "Indivisible Information Units" in PDF tables, which are kept intact by setting a dynamic chunking limit of 3000 tokens. The prototype provides a Knowledge Base Manager for dynamic document updates and a multilingual Chat Interface. Testing shows that the system can process heterogeneous data from 30 sources and retrieve specific procedural information, such as the "Golden Ticket" requirements, with high precision. The deployed system uses a reasoning model as its generation engine to support logically consistent answer synthesis.
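The chunking strategy described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Block` type, the `chunk_blocks` function, and the whitespace-based `count_tokens` are hypothetical names and simplifications introduced here, and only the 3000-token limit comes from the abstract. The idea shown is that a table that fits within the limit is emitted as a single chunk, while ordinary prose is packed greedily up to the same limit.

```python
from dataclasses import dataclass
from typing import List

MAX_TOKENS = 3000  # chunking limit stated in the abstract


@dataclass
class Block:
    """A piece of extracted PDF content (hypothetical structure)."""
    text: str
    is_table: bool = False


def count_tokens(text: str) -> int:
    # Assumption: whitespace-separated words stand in for model tokens.
    return len(text.split())


def chunk_blocks(blocks: List[Block], max_tokens: int = MAX_TOKENS) -> List[str]:
    """Pack blocks into chunks, keeping tables that fit the limit whole."""
    chunks: List[str] = []
    current: List[str] = []
    current_len = 0

    def flush() -> None:
        nonlocal current, current_len
        if current:
            chunks.append("\n".join(current))
            current, current_len = [], 0

    for block in blocks:
        if block.is_table and count_tokens(block.text) <= max_tokens:
            # Indivisible unit: close the running chunk and emit the table alone.
            flush()
            chunks.append(block.text)
            continue
        # Prose (or a table larger than the limit) is split into word windows
        # and packed greedily into the running chunk.
        words = block.text.split()
        for start in range(0, len(words), max_tokens):
            piece = words[start:start + max_tokens]
            if current_len + len(piece) > max_tokens:
                flush()
            current.append(" ".join(piece))
            current_len += len(piece)

    flush()
    return chunks
```

In a real pipeline the token count would come from the tokenizer of the embedding or generation model, and an oversized table would more plausibly be split by rows with its header repeated; both are left out to keep the sketch short.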





