Information Extraction

2019 Language and Intelligence Challenge: Information Extraction

Prerequisites

Install the required packages:

pip install -r requirements.txt

Data

Sample schema (subject type 人物 "person", predicate 祖籍 "ancestral home", object type 地点 "place"):

{"object_type": "地点", "predicate": "祖籍", "subject_type": "人物"}

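Sample data. postag and text are the input, and spo_list is the expected output. The text reads roughly "一直陪我到现在 is an original song by the singer 马健涛", and the predicate 歌手 means "singer":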

{
    "postag": [
        {"word": "一直", "pos": "d"}, 
        {"word": "陪", "pos": "v"}, 
        {"word": "我", "pos": "r"}, 
        {"word": "到", "pos": "p"}, 
        {"word": "现在", "pos": "t"}, 
        {"word": "是", "pos": "v"}, 
        {"word": "歌手", "pos": "n"}, 
        {"word": "马健涛", "pos": "nr"}, 
        {"word": "原创", "pos": "v"}, 
        {"word": "的", "pos": "u"}, 
        {"word": "歌曲", "pos": "n"}
    ], 
    "text": "一直陪我到现在是歌手马健涛原创的歌曲", 
    "spo_list": [
        {"predicate": "歌手", "object_type": "人物", "subject_type": "歌曲", "object": "马健涛", "subject": "一直陪我到现在"}
    ]
}
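A minimal sketch of reading one record and listing its triples. It assumes each record is a JSON object shaped like the sample above; the helper name `iter_triples` is hypothetical and not part of the repository.

```python
import json

# One record, abbreviated from the sample above (postag omitted for brevity).
record_json = """
{
    "text": "一直陪我到现在是歌手马健涛原创的歌曲",
    "spo_list": [
        {"predicate": "歌手", "object_type": "人物", "subject_type": "歌曲",
         "object": "马健涛", "subject": "一直陪我到现在"}
    ]
}
"""


def iter_triples(record):
    """Yield (subject, predicate, object) tuples from one parsed record."""
    for spo in record.get("spo_list", []):
        yield spo["subject"], spo["predicate"], spo["object"]


record = json.loads(record_json)
triples = list(iter_triples(record))
print(triples)
```

Both the subject and the object are substrings of text, which is what makes a sequence labeling formulation possible.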

Baseline

baidu/information-extraction

Idea

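Train a multi-label classification model to predict which predicates a text expresses.
Train a sequence labeling model that takes the text and one predicate as input and outputs a label for each position in the text.
Extract SPO (subject, predicate, object) triples from the sequence labeling result.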
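The third step can be sketched as follows. The tagging scheme is not stated here, so this assumes a character-level BIO scheme with hypothetical tags B-SUB/I-SUB for the subject and B-OBJ/I-OBJ for the object; `extract_spans` and `extract_spo` are illustrative names, not functions from the repository.

```python
def extract_spans(chars, tags, label):
    """Collect text spans that start with B-<label> and continue with I-<label>."""
    spans, current = [], []
    for ch, tag in zip(chars, tags):
        if tag == "B-" + label:
            if current:
                spans.append("".join(current))
            current = [ch]
        elif tag == "I-" + label and current:
            current.append(ch)
        else:
            # Any other tag closes the span that is currently open, if any.
            if current:
                spans.append("".join(current))
            current = []
    if current:
        spans.append("".join(current))
    return spans


def extract_spo(text, tags, predicate):
    """Pair every subject span with every object span for one predicate."""
    subjects = extract_spans(text, tags, "SUB")
    objects = extract_spans(text, tags, "OBJ")
    return [(s, predicate, o) for s in subjects for o in objects]


text = "一直陪我到现在是歌手马健涛原创的歌曲"
# Hand-written tags for the predicate 歌手 (assumed labeling output).
tags = (["B-SUB"] + ["I-SUB"] * 6      # 一直陪我到现在
        + ["O"] * 3                    # 是歌手
        + ["B-OBJ"] + ["I-OBJ"] * 2    # 马健涛
        + ["O"] * 5)                   # 原创的歌曲
spo = extract_spo(text, tags, "歌手")
print(spo)
```

Pairing every subject with every object is the simplest choice; a text with several subjects and objects for the same predicate may need a stricter pairing rule.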

Implementation

See report/PRML-final-project-doc-2019.pdf for details.

Multi-label Classification

CNN, BiRNN, BiLSTM, BiLSTM with max pooling and RCNN
BERT
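Because one text can express several predicates at once, the classification target is a multi-hot vector, and each predicate is decided independently at prediction time. A minimal sketch in plain Python, independent of any of the models listed above; the two-predicate list is an illustrative subset, and the 0.5 threshold and all function names are assumptions rather than values taken from the project.

```python
import math

# Illustrative subset; the real schema defines the full predicate set.
PREDICATES = ["祖籍", "歌手"]
PREDICATE_TO_ID = {p: i for i, p in enumerate(PREDICATES)}


def encode_labels(spo_list):
    """Turn a record's spo_list into a multi-hot target vector."""
    vec = [0] * len(PREDICATES)
    for spo in spo_list:
        vec[PREDICATE_TO_ID[spo["predicate"]]] = 1
    return vec


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def decode_logits(logits, threshold=0.5):
    """Keep every predicate whose independent probability reaches the threshold."""
    return [p for p, z in zip(PREDICATES, logits) if sigmoid(z) >= threshold]


target = encode_labels([{"predicate": "歌手"}])
predicted = decode_logits([-2.0, 3.0])  # made-up model outputs
print(target, predicted)
```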

Sequence Labeling

Encoder: BiLSTM and Transformer
Decoder: CRF
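A CRF decoder chooses the highest-scoring tag sequence from per-position emission scores (produced by the encoder) and tag-to-tag transition scores, typically with the Viterbi algorithm. The sketch below is a plain-Python illustration with made-up scores and a hypothetical three-tag set; it is not the project's implementation and omits start and end transitions.

```python
def viterbi(emissions, transitions):
    """Return the best-scoring sequence of tag indices.

    emissions[t][j]: score of tag j at position t.
    transitions[i][j]: score of moving from tag i to tag j.
    """
    num_tags = len(transitions)
    score = list(emissions[0])
    backpointers = []
    for emit in emissions[1:]:
        new_score, pointers = [], []
        for j in range(num_tags):
            # Best previous tag for ending at tag j at this position.
            best_i = max(range(num_tags),
                         key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emit[j])
            pointers.append(best_i)
        score = new_score
        backpointers.append(pointers)
    # Follow the back-pointers from the best final tag.
    best = max(range(num_tags), key=lambda j: score[j])
    path = [best]
    for pointers in reversed(backpointers):
        best = pointers[best]
        path.append(best)
    path.reverse()
    return path


TAGS = ["O", "B", "I"]
emissions = [[1.0, 2.0, 0.0],
             [0.5, 0.0, 1.0],
             [2.0, 0.0, 0.5]]
transitions = [[0.0, 0.0, -10.0],   # O -> I is heavily penalized
               [0.0, -1.0, 1.0],
               [0.0, 0.0, 0.5]]
path = viterbi(emissions, transitions)
print([TAGS[i] for i in path])
```

The large negative O -> I score shows how learned transitions can rule out invalid tag sequences that independent per-position predictions would allow.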

Result

Multi-label Classification

Sequence Labeling

fitlog usage

Initialize fitlog in the classification folder, then start the log viewer on the logs directory:

cd classification/
fitlog init
fitlog log logs

Initialize fitlog in the labeling folder, then start the log viewer on the logs directory:

cd labeling/
fitlog init
fitlog log logs

Author

Zhongyu Chen
