
Varanasi tourism in question answer system track: IIIT SURAT @ FIRE'25 shared task

Source
17th meeting of Forum for Information Retrieval Evaluation (FIRE 2025)
Date Issued
2025-12-17
Author(s)
Kumar, Ritesh
Jaiswal, Sumit Chand
Bhatia, Dhiraj  
Abstract
This paper presents our approach to the VATIKA: Varanasi Tourism in Question Answering System Track at FIRE 2025, conducted by the Indian Institute of Information Technology Surat. The task focuses on developing a domain-specific Question Answering (QA) system for tourism-related queries in Hindi, centered on the culturally significant city of Varanasi. To address this challenge, we propose a hybrid architecture that integrates semantic retrieval with extractive question answering. Our system uses Facebook AI Similarity Search (FAISS) for efficient nearest-neighbor search in high-dimensional vector spaces. Contextual embeddings are generated using IndicBERT, a multilingual ALBERT-based transformer model pretrained on major Indic languages. These embeddings are indexed within FAISS to enable fast and accurate retrieval of semantically relevant contexts for a given user query. The retrieved context is then processed by a fine-tuned IndicBERT-based extractive QA model, which predicts the start and end token positions of the answer span within the passage. This two-stage retrieval and comprehension framework improves computational efficiency while maintaining contextual relevance. We submitted three system runs for the shared task. Although IndicBERT proved effective for both embedding generation and question answering, overall performance was constrained by the difficulty of capturing nuanced linguistic characteristics of pure Hindi text, particularly domain-specific expressions and culturally grounded references. Our findings highlight the importance of domain adaptation and language-specific fine-tuning for Hindi QA systems. Future improvements may include enhanced Hindi-specific pretraining, incorporation of linguistic features, and improved retrieval strategies to better address semantic variability in tourism-related queries.
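The two-stage flow described in the abstract (retrieve a relevant passage, then select an answer span from start and end scores) can be illustrated with a minimal sketch. This is not the authors' implementation: a brute-force cosine search stands in for a FAISS index, the vectors and logits are hand-written toy values rather than IndicBERT outputs, and the function names `retrieve` and `best_span` as well as the `max_len` parameter are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def retrieve(query_vec, passage_vecs, k=1):
    """Stage 1 (assumed stand-in for a FAISS search): return the indices
    of the k passages whose vectors are most similar to the query."""
    ranked = sorted(
        range(len(passage_vecs)),
        key=lambda i: cosine(query_vec, passage_vecs[i]),
        reverse=True,
    )
    return ranked[:k]


def best_span(start_logits, end_logits, max_len=10):
    """Stage 2: pick the (start, end) token pair with the highest summed
    score, requiring start <= end and at most max_len tokens in the span."""
    best, best_score = (0, 0), float("-inf")
    for s, s_score in enumerate(start_logits):
        last = min(s + max_len, len(end_logits))
        for e in range(s, last):
            score = s_score + end_logits[e]
            if score > best_score:
                best, best_score = (s, e), score
    return best


# Toy data: in a real system these would come from an embedding model
# and a fine-tuned extractive QA model, respectively.
passage_vecs = [[0.0, 1.0], [1.0, 0.1], [0.5, 0.5]]
query_vec = [1.0, 0.0]
top = retrieve(query_vec, passage_vecs, k=2)

tokens = ["the", "ghat", "opens", "at", "dawn"]
start, end = best_span([0.1, 0.2, 0.1, 0.3, 2.0], [0.1, 0.2, 0.1, 0.1, 1.8])
answer = " ".join(tokens[start:end + 1])
```

Constraining the span search to `start <= end` and a maximum length is a common way to decode extractive QA outputs; the source does not state which decoding rule the submitted runs used.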
URI
https://repository.iitgn.ac.in/handle/IITG2025/34789