UnityAI-Guard: pioneering toxicity detection across low-resource Indian languages
Source
arXiv
Date Issued
2025-03-01
Author(s)
Beniwal, Himanshu
Venkat, Reddybathuni
Kumar, Rohit
Srivibhav, Birudugadda
Jain, Daksh
Doddi, Pavan
Dhande, Eshwar
Ananth, Adithya
Kuldeep
Kubadia, Heer
Sharda, Pratham
Abstract
This work introduces UnityAI-Guard, a framework for binary toxicity classification in low-resource Indian languages. Existing systems predominantly cater to high-resource languages; UnityAI-Guard addresses this gap by developing state-of-the-art models for identifying toxic content across diverse Brahmic/Indic scripts. Our approach achieves an average F1-score of 84.23% across seven languages, using a dataset of 888k training instances and 35k manually verified test instances. To advance multilingual content moderation for linguistically diverse regions and to foster broader adoption, UnityAI-Guard also provides public API access.
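As a reading aid for the headline metric, the sketch below shows one common way an "average F1-score across languages" could be computed: binary F1 on the toxic class per language, then an unweighted mean over languages. The abstract does not state the averaging scheme, so this is an assumption rather than the authors' evaluation code. The function names, language keys, and labels are hypothetical toy values and do not come from the paper.

```python
from typing import Dict, List, Tuple


def binary_f1(y_true: List[int], y_pred: List[int]) -> float:
    """F1 for the positive (toxic = 1) class of a binary classifier."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    # Convention chosen here: F1 is 0.0 when there are no positives at all.
    return 2 * tp / denom if denom else 0.0


def average_f1(per_language: Dict[str, Tuple[List[int], List[int]]]) -> float:
    """Unweighted mean of per-language F1 (assumed averaging scheme)."""
    if not per_language:
        raise ValueError("need at least one language")
    scores = [binary_f1(t, p) for t, p in per_language.values()]
    return sum(scores) / len(scores)


# Hypothetical toy labels for two placeholder languages (not paper data).
toy = {
    "lang_a": ([1, 0, 1, 1], [1, 0, 0, 1]),
    "lang_b": ([0, 1, 1, 0], [0, 1, 1, 1]),
}
score = average_f1(toy)
print(f"average F1 = {score:.2%}")
```

An unweighted mean treats every language equally regardless of test-set size. A size-weighted mean would be a different, equally plausible reading of "average".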
