TabLeX: a benchmark dataset for structure and content information extraction from scientific tables


dc.contributor.author Desai, Harsh
dc.contributor.author Kayal, Pratik
dc.contributor.author Singh, Mayank
dc.date.accessioned 2021-05-27T13:33:04Z
dc.date.available 2021-05-27T13:33:04Z
dc.date.issued 2021-05
dc.identifier.citation Desai, Harsh; Kayal, Pratik and Singh, Mayank, "TabLeX: a benchmark dataset for structure and content information extraction from scientific tables", arXiv, Cornell University Library, arXiv:2105.06400, May 2021. en_US
dc.identifier.uri http://arxiv.org/abs/2105.06400
dc.identifier.uri https://repository.iitgn.ac.in/handle/123456789/6530
dc.description.abstract Information Extraction (IE) from the tables present in scientific articles is challenging due to complicated tabular representations and complex embedded text. This paper presents TabLeX, a large-scale benchmark dataset comprising table images generated from scientific articles. TabLeX consists of two subsets, one for table structure extraction and the other for table content extraction. Each table image is accompanied by its corresponding LaTeX source code. To facilitate the development of robust table IE tools, TabLeX contains images in different aspect ratios and in a variety of fonts. Our analysis sheds light on the shortcomings of current state-of-the-art table extraction models and shows that they fail on even simple table images. Finally, we experiment with an existing transformer-based baseline to report performance scores. In contrast to static benchmarks, we plan to augment this dataset with more complex and diverse tables at regular intervals.
dc.description.statementofresponsibility by Harsh Desai, Pratik Kayal and Mayank Singh
dc.language.iso en_US en_US
dc.publisher Cornell University en_US
dc.subject Information Extraction en_US
dc.subject LaTeX en_US
dc.subject Scientific Articles en_US
dc.title TabLeX: a benchmark dataset for structure and content information extraction from scientific tables en_US
dc.type Pre-Print en_US
dc.relation.journal arXiv


Files in this item

There are no files associated with this item.
