Awesome-Table-Recognition

A curated list of resources dedicated to table recognition

1. Papers

*CODE denotes official code; CODE denotes unofficial code.

Conf.	Date	Title	Highlight	Code
IJCAI	2023	Divide Rows and Conquer Cells: Towards Structure Recognition for Large Tables	Sequence	No
AAAI	2022	LORE: Logical Location Regression Network for Table Structure Recognition	Detection	*CODE
ACM-MM	2022	TSRFormer: Table Structure Recognition with Transformers	Detection	No
CVPR	2022	TableFormer: Table Structure Understanding with Transformers	Sequence	No
CVPR	2022	Neural Collaborative Graph Machines for Table Structure Recognition	GNN	No
CVPR	2022	PubTables-1M: Towards comprehensive table extraction from unstructured documents	Dataset	*CODE
arXiv	2021/5/23	Multi-Type-TD-TSR -- Extracting Tables from Document Images using a Multi-stage Pipeline for Table Detection and Table Structure Recognition: from OCR to Structured Table Representations	Others	*CODE
ACM-MM	2021	Show, Read and Reason: Table Structure Recognition with Flexible Context Aggregator	GNN	No
ICCV	2021	Parsing Table Structures in the Wild	Detection	No
ICCV	2021	TGRNet: A Table Graph Reconstruction Network for Table Structure Recognition	GNN	*CODE
ICDAR Competition	2021	ICDAR 2021 Competition on Scientific Literature Parsing	Dataset	*CODE
ICDAR Competition	2021	PingAn-VCGroup's Solution for ICDAR 2021 Competition on Scientific Literature Parsing Task B: Table Recognition to HTML	Sequence	*CODE
ICDAR Competition	2021	LGPMA: Complicated Table Structure Recognition with Local and Global Pyramid Mask Alignment	Others	*CODE
WACV	2021	Global table extractor (gte): A framework for joint table identification and cell structure recognition using visual context	Others	No
CVPR Workshop	2020	CascadeTabNet: An approach for end to end table detection and structure recognition from image-based documents	Others	*CODE
ECCV	2020	Image-based table recognition: data, model, and evaluation	Dataset	*CODE
ECCV	2020	Table structure recognition using top-down and bottom-up cues	Others	*CODE
LREC	2020	TableBank: A Benchmark Dataset for Table Detection and Recognition	Dataset	*CODE
arXiv	2019/8/28	Complicated table structure recognition	Others	*CODE
ICDAR	2019	Rethinking Table Recognition using Graph Neural Networks	GNN	*CODE
ICDAR	2019	TableNet: Deep learning model for end-to-end table detection and tabular data extraction from scanned document images	Others	No
ICDAR	2019	Res2TIM: Reconstruct syntactic structures from table images	Others	*CODE
ICDAR	2017	DeepDeSRT: Deep learning for detection and structure recognition of tables in document images	Others	No

2. Datasets

2.1 Introduction

Dataset	Language	Description	Dataset link
TableBank	English	TableBank is a new image-based table detection and recognition dataset built with novel weak supervision from Word and LaTeX documents on the internet. It contains 417K high-quality labeled tables and provides only cell topology ground truth.	TableBank
SciTSR	English	SciTSR is a large-scale table structure recognition dataset containing 15,000 tables in PDF format, with structure labels obtained from the corresponding LaTeX source files. It provides cell topology and cell content ground truth.	SciTSR
PubTabNet	English	PubTabNet is a large dataset for image-based table recognition, containing 568K+ images of tabular data annotated with the corresponding HTML representation of each table. It provides cell topology, cell content and non-blank cell location ground truth.	PubTabNet
FinTabNet	English	This dataset contains complex tables from the annual reports of S&P 500 companies, with detailed table structure annotations to help train and test structure recognition models.	FinTabNet
PubTables-1M	English	A large, detailed, high-quality dataset for training and evaluating a wide variety of models for table detection, table structure recognition and functional analysis.	PubTables-1M
WTW	English and Chinese	WTW is the first wild table dataset for table detection and table structure recognition. It is built from photographs, scans and web pages, and covers 7 challenging cases in table structure recognition: (1) inclined tables, (2) curved tables, (3) occluded or blurred tables, (4) tables with extreme aspect ratios, (5) overlaid tables, (6) multi-color tables and (7) irregular tables. It provides cell topology and all cell location ground truth.	WTW
TNCR	English	A new table dataset with varying image quality, collected from open-access websites. TNCR contains 9,428 labeled tables in approximately 6,621 images, classified into 5 classes (Full Lined, Merged Cells, No Lines, Partial Lined, Partial Lined Merged Cells).	TNCR
TAL_OCR_TABLE	Chinese	The TAL_OCR_TABLE dataset comes from the TAL Form Recognition Technology Challenge. Its images are taken from real student homework and test papers in educational settings. It contains 16K training images and 4K test images, with cell topology, cell content and all cell location ground truth.	TAL_OCR_TABLE
SynthTabNet	English	SynthTabNet is a synthetically generated dataset of annotated images with tabular layouts. It contains 600K images divided into Train, Test and Val splits (80%, 10%, 10%), with cell topology, cell content and all cell location ground truth.	SynthTabNet
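
PubTabNet stores each table's structure as a sequence of HTML tags and keeps each cell's content as a separate token list. The sketch below rebuilds a complete HTML string from those two parts. It assumes the layout we understand PubTabNet's annotations to use. In that layout, a spanning cell opens as `<td`, followed by attribute tokens such as ` colspan="2"` and then a closing `>`. Cells are given as a list of dicts, each with a `tokens` entry. `build_table_html` is a hypothetical helper, so check the field names against the dataset's own documentation before relying on it.

```python
from html import escape
from typing import Dict, List


def build_table_html(structure_tokens: List[str], cells: List[Dict]) -> str:
    """Rebuild table HTML from PubTabNet-style structure and cell tokens.

    Assumption (not an official API): a cell opens with the token "<td>", or
    with "<td", attribute tokens (e.g. ' colspan="2"') and ">"; cells are
    listed in the same order as these opening tokens appear.
    """
    out: List[str] = []
    cell_idx = 0
    for tok in structure_tokens:
        out.append(tok)
        # "<td>" or the ">" that closes "<td ..." marks where content goes.
        if tok in ("<td>", ">"):
            if cell_idx >= len(cells):
                raise ValueError("structure opens more cells than the cell list has")
            # Single-character tokens are treated as text and HTML-escaped;
            # longer tokens (e.g. "<b>") are kept as markup.
            content = "".join(
                escape(t) if len(t) == 1 else t for t in cells[cell_idx]["tokens"]
            )
            out.append(content)
            cell_idx += 1
    if cell_idx != len(cells):
        raise ValueError("cell list has entries the structure never opens")
    # The structure tokens are assumed not to include the outer <table> tag.
    return "<table>" + "".join(out) + "</table>"


structure = ["<thead>", "<tr>", "<td>", "</td>", "<td>", "</td>", "</tr>", "</thead>",
             "<tbody>", "<tr>", "<td", ' colspan="2"', ">", "</td>", "</tr>", "</tbody>"]
cells = [{"tokens": ["A"]}, {"tokens": ["B", "&"]}, {"tokens": ["<b>", "x", "</b>"]}]
print(build_table_html(structure, cells))
# <table><thead><tr><td>A</td><td>B&amp;</td></tr></thead><tbody><tr><td colspan="2"><b>x</b></td></tr></tbody></table>
```

Keeping structure and content separate in this way is what lets a model predict the tag sequence and the cell text as two outputs. The strict count checks make a mismatch between the two fail loudly rather than silently shifting content into the wrong cell.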

2.2 Comparison of datasets for table structure recognition.

Dataset	Cell Topology	Cell content	Cell Location	Table Location
TableBank	✓	✕	✕	✓
SciTSR	✓	✓	✕	✓
PubTabNet	✓	✓	✓^†	✓
FinTabNet	✓	✓	✓^†	✓
PubTables-1M	✓	✓	✓	✓
WTW	✓	✕	✓	✓
TNCR	✕	✕	✕	✓
TAL_OCR_TABLE	✓	✓	✓	✓
SynthTabNet	✓	✓	✓	✓

^† For these datasets, cell bounding boxes are given for non-blank cells only and exclude any non-text portion of a cell.
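The sketch below shows one possible in-memory representation of a cell, to make the columns of the comparison table concrete. Cell topology is a logical row/column range, cell content is the text, and cell location is an optional pixel box. The `Cell` class, its field names and `topology_grid` are hypothetical, not taken from any dataset's official tooling. Row and column ranges are assumed to be 0-based and inclusive. A missing box (`None`) models the † case, where blank cells carry no location.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

BBox = Tuple[float, float, float, float]  # hypothetical (x0, y0, x1, y1) in pixels


@dataclass
class Cell:
    # Cell topology: logical position as inclusive, 0-based ranges.
    start_row: int
    end_row: int
    start_col: int
    end_col: int
    # Cell content: the cell's text ("" for a blank cell).
    text: str = ""
    # Cell location: pixel box, or None when not annotated.
    bbox: Optional[BBox] = None


def topology_grid(cells: List[Cell]) -> List[List[int]]:
    """Map every logical grid slot to the index of the cell covering it.

    Slots no cell covers stay -1; overlapping cells raise ValueError.
    """
    if not cells:
        return []
    n_rows = max(c.end_row for c in cells) + 1
    n_cols = max(c.end_col for c in cells) + 1
    grid = [[-1] * n_cols for _ in range(n_rows)]
    for idx, c in enumerate(cells):
        for r in range(c.start_row, c.end_row + 1):
            for col in range(c.start_col, c.end_col + 1):
                if grid[r][col] != -1:
                    raise ValueError(f"cells {grid[r][col]} and {idx} overlap at ({r}, {col})")
                grid[r][col] = idx
    return grid


cells = [
    Cell(0, 0, 0, 1, "Header", (0, 0, 200, 20)),  # spans two columns
    Cell(1, 1, 0, 0, "1", (0, 20, 100, 40)),
    Cell(1, 1, 1, 1),                              # blank cell: no text, no box
]
print(topology_grid(cells))                     # [[0, 0], [1, 2]]
print(sum(c.bbox is not None for c in cells))   # 2
```

A grid of cell indices like this is a simple way to check that a topology annotation is consistent. Overlaps raise an error, and slots left at -1 expose holes. Counting cells with a box shows how much location ground truth a given sample actually carries.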

3. Other technical solutions

PRCV2021 Table Recognition Technology Challenge

Competition First Place Solution
- Solution Introduction
- Solution Report PPT
Competition Second Place Method
- Solution Report PPT
Competition Third Place Method
- Solution Report PPT

ICDAR 2021 Competition on Scientific Literature Parsing, Task B: Table Recognition to HTML

Competition First Place Solution
- Solution Report PPT
- Solution Report Video
Competition Second Place Method
- Solution Report PPT
- Solution Report Video
