Title: Beyond monolingual assumptions: a survey of code-switched NLP in the era of large language models
Authors: Heth, Rajvee; Sinha, Samridhi Raj; Patil, Mahavir; Beniwal, Himanshu; Singh, Mayank
Date issued: 2025-10
Date available: 2025-10-22
DOI: 10.48550/arXiv.2510.07037
URI: http://repository.iitgn.ac.in/handle/IITG2025/33392
Type: E-Print; Article

Abstract:
Code-switching (CSW), the alternation of languages and scripts within a single utterance, remains a fundamental challenge for multilingual NLP, even amidst the rapid advances of large language models (LLMs). Most LLMs still struggle with mixed-language inputs, limited CSW datasets, and evaluation biases, hindering deployment in multilingual societies. This survey provides the first comprehensive analysis of CSW-aware LLM research, reviewing 308 studies spanning five research areas, 12 NLP tasks, 30+ datasets, and 80+ languages. We classify recent advances by architecture, training strategy, and evaluation methodology, outlining how LLMs have reshaped CSW modeling and what challenges persist. The paper concludes with a roadmap emphasizing the need for inclusive datasets, fair evaluation, and linguistically grounded models to achieve truly multilingual intelligence.