Varanasi tourism in question answer system track: IIIT SURAT @ FIRE'25 shared task

Bhatia, Dhiraj

Source

17th meeting of Forum for Information Retrieval Evaluation (FIRE 2025)

Date Issued

2025-12-17

Author(s)

Kumar, Ritesh

Jaiswal, Sumit Chand

Bhatia, Dhiraj

Abstract

This paper presents our approach to the VATIKA: Varanasi Tourism in Question Answering System Track at FIRE 2025, conducted by the Indian Institute of Information Technology Surat. The task focuses on developing a domain-specific Question Answering (QA) system for tourism-related queries in Hindi, centered on the culturally significant city of Varanasi. To address this challenge, we propose a hybrid architecture that integrates semantic retrieval with extractive question answering. Our system uses Facebook AI Similarity Search (FAISS) for efficient similarity search in high-dimensional vector spaces. Contextual embeddings are generated using IndicBERT, a multilingual ALBERT-based transformer model pretrained on major Indic languages. These embeddings are indexed within FAISS to enable fast and accurate retrieval of semantically relevant contexts for a given user query. The retrieved context is then processed by a fine-tuned IndicBERT-based extractive QA model, which predicts the start and end token positions of the answer span within the passage. This two-stage retrieval-and-comprehension framework improves computational efficiency while maintaining contextual relevance. We submitted three system runs for the shared task. Although IndicBERT proved effective for both embedding generation and question answering, overall performance was constrained by the difficulty of capturing nuanced linguistic characteristics of pure Hindi text, particularly domain-specific expressions and culturally grounded references. Our findings highlight the importance of domain adaptation and language-specific fine-tuning for Hindi QA systems. Future improvements may include enhanced Hindi-specific pretraining, incorporation of linguistic features, and improved retrieval strategies to better address semantic variability in tourism-related queries.
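The two-stage design described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation, and everything in it is an assumption made for illustration: normalized bag-of-words vectors stand in for IndicBERT embeddings, a brute-force inner-product search stands in for the FAISS index, the start and end scores are hand-written placeholders for the outputs of a fine-tuned QA model, and the passages, query, and function names (build_vocab, embed, retrieve, best_span) are hypothetical.

```python
import math


def build_vocab(texts):
    """Map each distinct lowercase token to a vector dimension."""
    vocab = {}
    for text in texts:
        for token in text.lower().split():
            vocab.setdefault(token, len(vocab))
    return vocab


def embed(text, vocab):
    """Toy stand-in for a sentence embedding: L2-normalized bag of words."""
    vec = [0.0] * len(vocab)
    for token in text.lower().split():
        if token in vocab:
            vec[vocab[token]] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def retrieve(query_vec, passage_vecs, k=1):
    """Stage 1: exact inner-product search (stand-in for a vector index)."""
    scores = [sum(q * p for q, p in zip(query_vec, pv)) for pv in passage_vecs]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


def best_span(start_scores, end_scores, max_len=10):
    """Stage 2: pick the (start, end) pair with the highest summed score,
    subject to start <= end and a maximum span length."""
    best, best_score = (0, 0), float("-inf")
    for s, s_score in enumerate(start_scores):
        for e in range(s, min(s + max_len, len(end_scores))):
            score = s_score + end_scores[e]
            if score > best_score:
                best, best_score = (s, e), score
    return best


# Hypothetical toy corpus and query (English for readability).
passages = [
    "Kashi Vishwanath temple is located near the Ganga in Varanasi",
    "Sarnath is famous for the Dhamek Stupa",
    "Boat rides on the river start at Assi Ghat",
]
vocab = build_vocab(passages)
passage_vecs = [embed(p, vocab) for p in passages]

query = "Where is the Dhamek Stupa"
top = retrieve(embed(query, vocab), passage_vecs, k=1)[0]
tokens = passages[top].split()

# Placeholder per-token scores; a real system would take these from the model.
start_scores = [4.0, 0.1, 0.2, 0.0, 0.3, 1.5, 0.5]
end_scores = [3.5, 0.2, 0.1, 0.0, 0.1, 0.4, 2.0]
s, e = best_span(start_scores, end_scores)
answer = " ".join(tokens[s:e + 1])
print(passages[top])
print(answer)
```

Scoring start and end positions jointly, rather than taking each maximum independently, avoids spans whose end precedes their start. In a real system the exhaustive search in retrieve would be replaced by an index lookup, which is where the efficiency gain of the retrieval stage comes from.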

URI

https://repository.iitgn.ac.in/handle/IITG2025/34789