Amar Al Farizi (1), Primandani Arsi (2), Pungkas Subarkah (3)
General Background: The rapid adoption of artificial intelligence in smart tourism has increased the use of contextual chatbots to deliver destination information efficiently. Specific Background: However, tourism chatbots based on Large Language Models frequently encounter information hallucination, reducing reliability when handling dynamic and local tourism data. Knowledge Gap: Existing studies mainly focus on rule-based or single-model chatbot implementations and provide limited comparative evaluation of Retrieval Augmented Generation configurations combining embedding models and Large Language Models. Aims: This study aims to comparatively evaluate multiple Retrieval Augmented Generation configurations to identify the most suitable combination for contextual tourism chatbots and to analyze differences between large multilingual and small monolingual embedding models using a local tourism dataset. Results: Experimental evaluation using data from 49 tourist destinations in Banyumas Regency shows that the Multilingual-E5-Large embedding model consistently achieves perfect Precision, Recall, and F1-Score across all tested Large Language Models. The combination of Multilingual-E5-Large and GPT-4.1-Mini demonstrates the most balanced performance, achieving a BERTScore F1 of 0.7515 with an average response time of 1.555 seconds. Novelty: This research provides a systematic comparative assessment of embedding capacity and Large Language Model selection within a unified Retrieval Augmented Generation framework for tourism chatbots. Implications: The findings offer practical guidance for selecting model configurations that ensure accurate retrieval, high-quality responses, and efficient system performance in contextual tourism information services.
• Multilingual embedding models deliver consistently higher retrieval accuracy across all tested configurations
• GPT-4.1-Mini produces the most balanced generative quality and response latency
• Embedding model selection plays a more decisive role than language model variation
Retrieval Augmented Generation; Tourism Chatbot; Large Language Model; Embedding Model; Comparative Evaluation
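The retrieval metrics named in the abstract (Precision, Recall, and F1-Score) can be illustrated with a minimal sketch. It assumes set-based scoring of retrieved document identifiers against a ground-truth set for each query; the function name `retrieval_prf` and the identifiers are hypothetical, and the study's actual evaluation procedure may differ.

```python
from typing import Iterable, Tuple


def retrieval_prf(retrieved: Iterable[str], relevant: Iterable[str]) -> Tuple[float, float, float]:
    """Return (precision, recall, f1) for one query.

    Assumption: both arguments are collections of document identifiers,
    and duplicates are ignored (set-based scoring).
    """
    retrieved_set = set(retrieved)
    relevant_set = set(relevant)
    hits = len(retrieved_set & relevant_set)  # correctly retrieved documents

    # Guard against empty sets to avoid division by zero.
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant_set) if relevant_set else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical identifiers, not taken from the study's dataset.
perfect = retrieval_prf(["dest_01", "dest_02"], ["dest_01", "dest_02"])
partial = retrieval_prf(["dest_01", "dest_03"], ["dest_01", "dest_02"])
```

Under this definition, a "perfect" score means every retrieved document is relevant and every relevant document is retrieved, which is the outcome the abstract reports for Multilingual-E5-Large.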