23 Dec 21:42

cverluise

a2a5d7f

🏷️ v0.3.1 Latest

Data

Major improvement of intext.patent

Validation

Validation of the intext.patent table

Thanks

Special thanks to:

Gabriele Cristelli (EPFL)
Kyle Higham (Hitotsubashi University)
Lucas Violon (HEC Paris)

04 Nov 14:18

cverluise

0.3.0

5d087d9

🏷 `v0.3.0`

Data

Major improvement of the bibliographical_reference schema (GROBID and Crossref outputs harmonized) for seamless analysis
Enrichment of intext.patent
Add domain-specific front page tables (norm_standard, database, wiki)

Community

Revisit BQ project architecture
Add Colab notebooks integration
Revisit README.md

Code

Lighter API
Lighter dependencies

Models

Add information extraction models
Add models and training data DVC support

Validation

Validation of in-text extraction models

Thanks

Special thanks to:

Gabriele Cristelli (EPFL)
Kyle Higham (Hitotsubashi University)
Lucas Violon (HEC Paris)

03 Mar 11:03

cverluise

v0.2-npl

1a12601

🏷 `v0.2-npl`

Release v0.2-npl introduces two major improvements:

`npl_class` field: predicted by a multi-class text classification model based on the spaCy `TextCategorizer`, with the NPL text as input. See the focus section and the model binaries below.
ISSN propagation: the ISSN is propagated through `title_j` to bibliographical references that share the same `title_j` but have no match.
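The ISSN propagation described above can be sketched in a few lines of Python. This is only an illustration, not the project's actual code: the `propagate_issn` helper, the record layout (dicts with `title_j` and `ISSN` keys), exact string matching on `title_j`, and the choice to skip titles that map to more than one ISSN are all assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Optional

Record = Dict[str, Optional[str]]


def propagate_issn(records: List[Record]) -> List[Record]:
    """Copy a known ISSN to references with the same title_j but no ISSN."""
    # Collect the ISSNs observed for each journal title among matched references.
    issn_by_title = defaultdict(set)
    for rec in records:
        if rec.get("title_j") and rec.get("ISSN"):
            issn_by_title[rec["title_j"]].add(rec["ISSN"])

    out = []
    for rec in records:
        rec = dict(rec)  # work on a copy so the input is left untouched
        candidates = issn_by_title.get(rec.get("title_j"), set())
        # Assumption: only fill the gap when the title maps to exactly one ISSN.
        if not rec.get("ISSN") and len(candidates) == 1:
            rec["ISSN"] = next(iter(candidates))
        out.append(rec)
    return out


records = [
    {"title_j": "Nature", "ISSN": "0028-0836"},
    {"title_j": "Nature", "ISSN": None},
    {"title_j": "Unknown Journal", "ISSN": None},
]
enriched = propagate_issn(records)
```

Here the second record inherits the ISSN of the first, while the third stays empty because no reference with the same `title_j` carries an ISSN.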

Focus on `npl_class`

`en_core_web_sm_npl-class-ensemble-0.8`

ensemble model (bow+cnn with bagging)
trained on 80% of the "gold" dataset and evaluated on the remaining 20% (hold-out)
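For readers unfamiliar with hold-out evaluation, here is a minimal sketch of an 80/20 split. The release notes do not say how the split was drawn, so the random shuffle, the fixed seed, and the `holdout_split` helper are assumptions made for illustration only.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def holdout_split(
    examples: Sequence[T], train_share: float = 0.8, seed: int = 0
) -> Tuple[List[T], List[T]]:
    """Shuffle the examples and split them into a training and a hold-out part."""
    shuffled = list(examples)
    # A seeded generator keeps the split reproducible across runs.
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_share)
    return shuffled[:cut], shuffled[cut:]


# Stand-in for the labelled "gold" examples.
gold = list(range(100))
train, heldout = holdout_split(gold)
```

The model is fitted on `train` only, and the scores are computed on `heldout`, which the model never saw during training.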

See models/npl_class_training/ for more details.

Average performance

| accuracy | precision | recall | f1 |
| --- | --- | --- | --- |
| 0.9 | 0.89 | 0.88 | 0.88 |

Class performance

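| class | precision | recall | f1 | support |
| --- | --- | --- | --- | --- |
| BIBLIOGRAPHICAL_REFERENCE | 0.92 | 0.95 | 0.93 | |
| SEARCH_REPORT | 1.0 | 0.92 | 0.96 | |
| OFFICE_ACTION | 0.99 | 0.93 | 0.96 | |
| DATABASE | 0.89 | 0.73 | 0.8 | |
| WEBPAGE | 0.53 | 0.53 | 0.53 | |
| PATENT | 0.91 | 0.94 | 0.93 | |
| NA | 1.0 | 1.0 | 1.0 | |
| PRODUCT_DOCUMENTATION | 0.44 | 0.43 | 0.44 | |
| NORM_STANDARD | 0.86 | 0.6 | 0.71 | |
| LITIGATION | 0.25 | 0.11 | 0.15 | |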
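As a reading aid for the table above, the f1 column is the harmonic mean of precision and recall. The short sketch below recomputes it for two rows; the `f1_score` helper is written here for illustration and is not taken from the project.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


print(round(f1_score(0.89, 0.73), 2))  # DATABASE row: prints 0.8
print(round(f1_score(0.25, 0.11), 2))  # LITIGATION row: prints 0.15
```

Recomputing from the rounded precision and recall can differ from the reported f1 by 0.01 (PRODUCT_DOCUMENTATION gives 0.43 instead of 0.44), presumably because the reported values were computed before rounding.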

`en_core_web_sm_npl-class-ensemble-1.0`

Same as `en_core_web_sm_npl-class-ensemble-0.8` but trained on the full dataset to maximize performance. This is the model used to create the `npl_class` field.
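A minimal sketch of how such a model could be used to label an NPL string, assuming the binary has been downloaded and unpacked locally. spaCy text classifiers expose their scores in `doc.cats`; taking the highest-scoring label is an assumption about how `npl_class` is derived, and `top_class`, `predict_npl_class`, and `model_path` are hypothetical names.

```python
from typing import Dict


def top_class(cats: Dict[str, float]) -> str:
    """Return the label with the highest score in a spaCy-style `cats` mapping."""
    return max(cats, key=cats.get)


def predict_npl_class(text: str, model_path: str) -> str:
    """Load a spaCy model from `model_path` and return the top label for `text`."""
    # Imported lazily so that top_class works without spaCy installed.
    import spacy

    nlp = spacy.load(model_path)  # assumption: path to the unpacked model binary
    return top_class(nlp(text).cats)


# Illustrative scores only, not actual model output.
example_cats = {"PATENT": 0.1, "DATABASE": 0.7, "WEBPAGE": 0.2}
label = top_class(example_cats)
```

With the illustrative scores above, `label` is `"DATABASE"`.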

Releases: cverluise/PatCit