Tabular Document Analysis Research Group at ISDCT SB RAS

tabbydoc/tabbydoc.github.io

TabbyDOC

Table Understanding Research

About

This research project aims to develop methods and software for extracting entities and their relationships from tables represented in unstructured and semi-structured data formats.



This work was supported by the Russian Science Foundation (grant no. 18-71-10001). Our earlier work was supported by the Russian Foundation for Basic Research (grants no. 12-07-31051 and no. 15-37-20042) and the Council for Grants of the President of the Russian Federation (scholarship no. SP-3387.2013.5).

Source code

  • tabbypdf, Rule-based PDF table extraction
  • tabbypdf2, Deep-learning-based PDF table extraction
  • tabbyxl, Rule-based spreadsheet data extraction
  • tabbyld, Semantic table interpretation using open knowledge graphs

Publications

2022

  • Shigarov A. (2022). Table understanding: Problem overview. WIREs Data Mining and Knowledge Discovery, 13(1), e1482. https://doi.org/10.1002/widm.1482

  • Kostyleva O., Paramonov V., Shigarov A., Vetrova V. (2022). Towards comparison of table type taxonomies. 45th Jubilee Int. Conv. on Information, Communication and Electronic Technology (MIPRO), 1461-1465. https://doi.org/10.23919/MIPRO55190.2022.9803520

  • Dorodnykh N., Yurin A. (2022). Extraction of facts from web-tables based on semantic interpretation tabular data. 2022 Ivannikov Memorial Workshop (IVMEM), 7-17. https://doi.org/10.1109/IVMEM57067.2022.9983959

2021

  • Dorodnykh N., Yurin A., Shigarov A., Turdakov D. (2021). Ontology engineering at the assertion level based on semantic annotation of tabular data. 2021 Ivannikov Memorial Workshop (IVMEM). 28-34. https://doi.org/10.1109/IVMEM53963.2021.00011

  • Yurin A., Dorodnykh N., Shigarov A. (2021). Semi-automated formalization and representation of the engineering knowledge extracted from spreadsheet data. IEEE Access. 9, 157468-157481. https://doi.org/10.1109/ACCESS.2021.3130172

  • Paramonov V., Shigarov A., Vetrova V. (2021). Rule-driven spreadsheet data extraction from statistical tables: case study. Information and Software Technologies. ICIST 2021. CCIS 1486, 84-95. https://doi.org/10.1007/978-3-030-88304-1_7

  • Dorodnykh N., Yurin A. (2021). TabbyLD: a tool for semantic interpretation of spreadsheets data. Modelling and Development of Intelligent Systems. MDIS 2020. CCIS 1341, 315-333. https://doi.org/10.1007/978-3-030-68527-0_20

  • Dorodnykh N., Shigarov A., Yurin A. (2022). Using the semantic annotation of web table data for knowledge base construction. Proc. 4th Artificial Intelligence and Cloud Computing Conference. AICCC'21, 122-129. https://doi.org/10.1145/3508259.3508277

  • Mikhailov A., Shigarov A. (2021). Page layout analysis for refining table extraction from PDF documents. 2021 Ivannikov ISPRAS Open Conference (ISPRAS), 114-119. https://doi.org/10.1109/ISPRAS53967.2021.00021

2020

  • Mikhailov A., Shigarov A., Rozhkov E., Cherepanov I. (2020). On graph-based verification for PDF table detection. 2020 Ivannikov ISPRAS Open Conference (ISPRAS). 91-95. https://doi.org/10.1109/ISPRAS51486.2020.00020

  • Cherepanov I., Mikhailov A., Shigarov A., Paramonov V. (2020). On automated workflow for fine-tuning deep neural network models for table detection in document images. 2020 43rd International Convention on Information, Communication and Electronic Technology. 1130-1133. https://doi.org/10.23919/MIPRO48935.2020.9245241

  • Dorodnykh N. & Yurin A. (2020). Towards a universal approach for semantic interpretation of spreadsheets data. Proc. 24th Symposium on International Database Engineering & Applications. Article 22, 1-9. https://doi.org/10.1145/3410566.3410609

  • Paramonov V., Shigarov A., Vetrova V. (2020). Table header correction algorithm based on heuristics for improving spreadsheet data extraction. Information and Software Technologies. 1283 CCIS, 147-158. https://doi.org/10.1007/978-3-030-59506-7_13

  • Yurin A. & Dorodnykh N. (2020). Experimental evaluation of a spreadsheets transformation in the context of domain model engineering. 2020 Ural Symposium on Biomedical Engineering, Radioelectronics and Information Technology (USBEREIT). 0388-0391. https://doi.org/10.1109/USBEREIT48449.2020.9117674

  • Dorodnykh N., Yurin A., Shigarov A. (2020). Conceptual model engineering for industrial safety inspection based on spreadsheet data analysis. Modelling and Development of Intelligent Systems. 1126 CCIS, 51-65. https://doi.org/10.1007/978-3-030-39237-6_4

  • Paramonov, V., Shigarov, A., Vetrova, V., Mikhailov, A. (2020). Heuristic algorithm for recovering a physical structure of spreadsheet header. Information Systems Architecture and Technology. 1050 AISC, 140-149. https://doi.org/10.1007/978-3-030-30440-9_14

2019

  • Yurin A. & Dorodnykh N. (2019). A reverse engineering process for inferring conceptual models from canonicalized tables. 2019 Int. Multi-Conf. on Engineering, Computer and Information Sciences (SIBIRCON). 0485-0490. https://doi.org/10.1109/SIBIRCON48586.2019.8958458

  • Shigarov, A., Khristyuk, V., Mikhailov, A., Paramonov, V. (2019). TabbyXL: rule-based spreadsheet data extraction and transformation. Information and Software Technologies. 1078 CCIS, 59-75. https://doi.org/10.1007/978-3-030-30275-7_6
    Preprint
    Presentation

  • Shigarov, A., Khristyuk, V., Mikhailov, A. (2019). TabbyXL: software platform for rule-based spreadsheet data extraction and transformation. SoftwareX, 10. https://doi.org/10.1016/j.softx.2019.100270
    Preprint

  • Shigarov, A., Cherepanov, I., Cherkashin, E., Dorodnykh, N., Khristyuk, V., Mikhailov, A., Paramonov, V., Rozhkow, E., Yurin, A. (2019). Towards end-to-end transformation of arbitrary tables from untagged portable documents (PDF) to linked data. CEUR-WS Proc. 2463, 1-12.
    Article

  • Shigarov, A., Khristyuk, V., Mikhailov, A., Paramonov, V. (2019). Software development for rule-based spreadsheet data extraction and transformation. Proc. 42nd Int. Convention on Information and Communication Technology, Electronics and Microelectronics. 1132-1137. https://doi.org/10.23919/MIPRO.2019.8756829
    Preprint

  • Cherkashin, E., Shigarov, A., Paramonov, V., Mikhailov, A. (2019). Digital archives supporting document content inference. Proc. 42nd Int. Convention on Information and Communication Technology, Electronics and Microelectronics. 1037-1042. https://doi.org/10.23919/MIPRO.2019.8757196
    Preprint

  • Dorodnykh, N., Yurin, A. (2019). Towards ontology engineering based on transformation of conceptual models and spreadsheet data: a case study. Intelligent Systems Applications in Software Engineering. 1046 AISC, 233-247. https://doi.org/10.1007/978-3-030-30329-7_22
    Preprint

  • Paramonov, V., Shigarov, A., Ruzhnikov, G., Cherkashin, E. (2019). Phonetic string matching for languages with Cyrillic alphabet. Information Systems Architecture and Technology. 852 AISC, 301-311. https://doi.org/10.1007/978-3-319-99981-4_28
    Preprint

2018

  • Shigarov, A., Altaev, A., Mikhailov, A., Paramonov, V., Cherkashin, E. (2018). TabbyPDF: web-based system for PDF table extraction. Information and Software Technologies. 920 CCIS, 257-269. https://doi.org/10.1007/978-3-319-99972-2_20
    Preprint

  • Yang, S., Wei, R., Shigarov, A. (2018). Semantic interoperability for electronic business through a novel cross-context semantic document exchange approach. Proc. 18th ACM Symposium on Document Engineering. 28:1-28:10. https://doi.org/10.1145/3209280.3209523

  • Cherkashin, E., Kopaygorodsky, A., Kazi, L., Shigarov, A., Paramonov, V. (2018). Model driven architecture implementation using linked data. Information and Software Technologies. 920 CCIS, 412-423. https://doi.org/10.1007/978-3-319-99972-2_34
    Preprint

2017

2016

  • Shigarov, A., Mikhailov, A., Altaev, A. (2016). Configurable table structure recognition in untagged PDF documents. Proc. 16th ACM Symposium on Document Engineering. 119-122. https://doi.org/10.1145/2960811.2967152
    Preprint
    Poster

  • Shigarov, A., Paramonov, V., Belykh, P., Bondarev, A. (2016). Rule-based canonicalization of arbitrary tables in spreadsheets. Information and Software Technologies. 639 CCIS, 78-91. https://doi.org/10.1007/978-3-319-46254-7_7
    Preprint

  • Paramonov, V., Shigarov, A., Ruzhnikov, G., Belykh, P. (2016). Polyphon: an algorithm for phonetic string matching in Russian language. Information and Software Technologies. 639 CCIS, 568-579. https://doi.org/10.1007/978-3-319-46254-7_46
    Preprint

  • Shigarov, A. (2016). Methodological and software support for transforming tabular data from arbitrary to relational form. Scientific session of the meeting of the Joint Scientific Council of SB RAS on Nanotechnology and Information Technology. (In Russian)
    Presentation

2015

  • Shigarov, A. (2015). Table understanding using a rule engine. Expert Systems with Applications. 42(2), 929-937. https://doi.org/10.1016/j.eswa.2014.08.045
    Preprint
    Presentation

  • Shigarov, A. (2015). Rule-based table analysis and interpretation. Information and Software Technologies. 538 CCIS, 175-186. https://doi.org/10.1007/978-3-319-24770-0_16
    Preprint

  • Shigarov, A. O., Bychkov, I. V., Paramonov, V. V., Belykh, P. V. (2015). Analysis and interpretation of arbitrary tables based on the execution of CRL rules. Computational Technologies. 20(6), 87-112. (In Russian)
    Preprint

  • Shigarov, A., Paramonov, V. (2015). CRL: a rule language for analysis and interpretation of arbitrary tables. CEUR-WS Proc. 1536, 22-29.
    Article
    Presentation

2014

  • Shigarov, A. O. (2014). Recovering the logical structure of tables from unstructured texts based on logical inference. Computational Technologies. 19(1), 87-99. (In Russian)
    Preprint

  • Shigarov, A. (2014). Automated table understanding using a rule engine. CEUR-WS Proc. 1297, 216-223.
    Article

2013

  • Shigarov, A. O., Bychkov, I. V., Ruzhnikov, G. M., Hmelnov, A. E., Fedorov, R. K. (2013). A table transformation system. Information Technologies and Computing Systems. 3, 15-26. (In Russian)
    Preprint

2011

2009

  • Shigarov, A. O. (2009). A technology for extracting tabular information from electronic documents of various formats. Candidate of Technical Sciences dissertation. (In Russian)
    PhD Thesis
    PhD Abstract
    Presentation

  • Shigarov, A., Bychkov, I., Hmelnov, A., Ruzhnikov, G. (2009). A method for table detection in metafiles. Pattern Recognition and Image Analysis. 19(4), 693-697. https://doi.org/10.1134/S1054661809040191
    Preprint
    Poster

  • Bychkov, I. V., Ruzhnikov, G. M., Hmelnov, A. E., Shigarov, A. O. (2009). A heuristic method for detecting tables in documents of various formats. Computational Technologies. 14(2), 58-73. (In Russian)
    Preprint

2008

  • Hmelnov, A. E., Shigarov, A. O. (2008). A method for extracting tables from unformatted text. Computational Technologies. 13(1), 93-101. (In Russian)
    Preprint

Contacts

Department of Information Technology and Systems, Matrosov Institute for System Dynamics and Control Theory, Siberian Branch of the Russian Academy of Sciences

Office 222, Block EVM, Lermontov st. 134, Irkutsk, Russia, 664033

Alexey Shigarov (e-mail: shigarov@gmail.com)
