HinGE: a dataset for generation and evaluation of code-mixed Hinglish text

dc.contributor.author Srivastava, Vivek
dc.contributor.author Singh, Mayank
dc.date.accessioned 2012-09-26T07:22:34Z
dc.date.available 2012-09-26T07:22:34Z
dc.date.issued 2021-07
dc.identifier.citation Srivastava, Vivek and Singh, Mayank, "HinGE: a dataset for generation and evaluation of code-mixed Hinglish text", arXiv, Cornell University Library, arXiv:2107.03760, Jul. 2021. en_US
dc.identifier.uri http://arxiv.org/abs/2107.03760
dc.identifier.uri https://repository.iitgn.ac.in/handle/123456789/6731
dc.description.abstract Text generation is a highly active area of research in the computational linguistics community. Evaluating generated text is a challenging task, and multiple theories and metrics have been proposed over the years. Unfortunately, text generation and evaluation remain relatively understudied for code-mixed languages, in which words and phrases from multiple languages are mixed within a single utterance of text or speech, owing to the scarcity of high-quality resources. To address this challenge, we present a corpus (HinGE) for Hinglish, a widely popular code-mixed language that mixes Hindi and English. HinGE contains Hinglish sentences, generated both by humans and by two rule-based algorithms, corresponding to parallel Hindi-English sentences. In addition, we demonstrate the inefficacy of widely used evaluation metrics on code-mixed data. The HinGE dataset will facilitate the progress of natural language generation research in code-mixed languages.
dc.description.statementofresponsibility by Vivek Srivastava and Mayank Singh
dc.language.iso en_US en_US
dc.publisher Cornell University Library en_US
dc.title HinGE: a dataset for generation and evaluation of code-mixed Hinglish text en_US
dc.type Pre-Print en_US
dc.relation.journal arXiv
