2019 Language and Intelligence Challenge: Information Extraction
- Install the required packages:
pip install -r requirements.txt
- Sample schema:
{"object_type": "地点", "predicate": "祖籍", "subject_type": "人物"}
- Sample data, with `postag` and `text` as input and `spo_list` as output:
{
"postag": [
{"word": "一直", "pos": "d"},
{"word": "陪", "pos": "v"},
{"word": "我", "pos": "r"},
{"word": "到", "pos": "p"},
{"word": "现在", "pos": "t"},
{"word": "是", "pos": "v"},
{"word": "歌手", "pos": "n"},
{"word": "马健涛", "pos": "nr"},
{"word": "原创", "pos": "v"},
{"word": "的", "pos": "u"},
{"word": "歌曲", "pos": "n"}
],
"text": "一直陪我到现在是歌手马健涛原创的歌曲",
"spo_list": [
{"predicate": "歌手", "object_type": "人物", "subject_type": "歌曲", "object": "马健涛", "subject": "一直陪我到现在"}
]
}
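To make the record layout concrete, here is a minimal Python sketch that parses the sample above and reads its fields. It assumes only the standard `json` module; the variable names (`record`, `words`, `triples`) are illustrative and are not taken from this repository.

```python
import json

# The sample record shown above, embedded as a JSON string.
raw = """
{
  "postag": [
    {"word": "一直", "pos": "d"}, {"word": "陪", "pos": "v"},
    {"word": "我", "pos": "r"}, {"word": "到", "pos": "p"},
    {"word": "现在", "pos": "t"}, {"word": "是", "pos": "v"},
    {"word": "歌手", "pos": "n"}, {"word": "马健涛", "pos": "nr"},
    {"word": "原创", "pos": "v"}, {"word": "的", "pos": "u"},
    {"word": "歌曲", "pos": "n"}
  ],
  "text": "一直陪我到现在是歌手马健涛原创的歌曲",
  "spo_list": [
    {"predicate": "歌手", "object_type": "人物", "subject_type": "歌曲",
     "object": "马健涛", "subject": "一直陪我到现在"}
  ]
}
"""

record = json.loads(raw)

# Input side: the segmented words concatenate back to the raw text.
words = [item["word"] for item in record["postag"]]
text_matches = "".join(words) == record["text"]

# Output side: each entry of spo_list is one (subject, predicate, object) triple.
triples = [
    (spo["subject"], spo["predicate"], spo["object"])
    for spo in record["spo_list"]
]

print(text_matches)
print(triples)
```

In this sample the subject and the object are both substrings of `text`, which is what allows the task to be treated as labeling spans of the input.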
- Train a multi-label classification model that predicts which predicates a text expresses.
- Train a sequence labeling model that takes the text and one predicate as input and outputs a label sequence over the text.
- Extract SPO triples from the sequence labeling result.
See `report/PRML-final-project-doc-2019.pdf` for details.
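The three steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the repository's code: the tag names (`B-SUB`, `I-SUB`, `B-OBJ`, `I-OBJ`, `O`), the one-tag-per-character alignment, the 0.5 threshold, the function names, and the scores are all hypothetical. The report documents the scheme actually used.

```python
from typing import Dict, List, Optional


def select_predicates(scores: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """Step 1 (multi-label): keep every predicate whose score passes the threshold."""
    return [predicate for predicate, score in scores.items() if score >= threshold]


def decode_spans(text: str, tags: List[str]) -> Dict[str, List[str]]:
    """Collect the character spans marked by B-/I- tags, grouped by role."""
    spans: Dict[str, List[str]] = {"SUB": [], "OBJ": []}
    current_role: Optional[str] = None
    start = 0
    # A trailing "O" sentinel flushes a span that runs to the end of the text.
    for i, tag in enumerate(tags + ["O"]):
        continues = tag.startswith("I-") and tag[2:] == current_role
        if current_role is not None and not continues:
            spans.setdefault(current_role, []).append(text[start:i])
            current_role = None
        if tag.startswith("B-"):
            current_role, start = tag[2:], i
    return spans


def extract_spo(text: str, predicate: str, tags: List[str]) -> List[Dict[str, str]]:
    """Step 3: pair every subject span with every object span for one predicate."""
    spans = decode_spans(text, tags)
    return [
        {"predicate": predicate, "subject": subject, "object": obj}
        for subject in spans["SUB"]
        for obj in spans["OBJ"]
    ]


text = "一直陪我到现在是歌手马健涛原创的歌曲"

# Made-up classifier scores standing in for step 1.
predicates = select_predicates({"歌手": 0.93, "祖籍": 0.04})

# Made-up labeling output standing in for step 2: one tag per character.
tags = ["B-SUB"] + ["I-SUB"] * 6 + ["O"] * 3 + ["B-OBJ"] + ["I-OBJ"] * 2 + ["O"] * 5

spo_list = [spo for p in predicates for spo in extract_spo(text, p, tags)]
print(spo_list)
```

Running the labeling model once per predicted predicate lets the same text yield different subject and object spans for different relations.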
- CNN, BiRNN, BiLSTM, BiLSTM with max pooling and RCNN
- BERT
- Encoder: BiLSTM and Transformer
- Decoder: CRF
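A CRF decoder selects the highest-scoring tag sequence as a whole, typically with Viterbi decoding, instead of choosing the best tag at each position independently. The pure-Python sketch below illustrates the idea. The tag set and all scores are made up for illustration, and the repository's own CRF implementation may differ.

```python
from typing import List


def viterbi(emissions: List[List[float]], transitions: List[List[float]]) -> List[int]:
    """Return the highest-scoring sequence of tag indices.

    emissions[t][j]   : score of tag j at position t (from the encoder).
    transitions[i][j] : score of moving from tag i to tag j (CRF parameters).
    """
    num_tags = len(transitions)
    scores = list(emissions[0])  # best score of any path ending in each tag
    backpointers: List[List[int]] = []

    for emit in emissions[1:]:
        new_scores, pointers = [], []
        for j in range(num_tags):
            # Best previous tag for a path that ends in tag j at this position.
            best_prev = max(range(num_tags), key=lambda i: scores[i] + transitions[i][j])
            pointers.append(best_prev)
            new_scores.append(scores[best_prev] + transitions[best_prev][j] + emit[j])
        scores = new_scores
        backpointers.append(pointers)

    # Follow the back-pointers from the best final tag to recover the path.
    path = [max(range(num_tags), key=lambda j: scores[j])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


tag_names = ["O", "B-SUB", "I-SUB"]  # hypothetical tag set

# A large negative score makes the invalid move O -> I-SUB effectively impossible.
transitions = [
    [0.0, 0.0, -100.0],  # from O
    [0.0, 0.0, 0.0],     # from B-SUB
    [0.0, 0.0, 0.0],     # from I-SUB
]

# Position 0 slightly prefers O, position 1 strongly prefers I-SUB.
emissions = [
    [1.0, 0.9, 0.0],
    [0.0, 0.0, 2.0],
]

best_path = [tag_names[i] for i in viterbi(emissions, transitions)]
print(best_path)
```

Picking tags independently here would give `O` followed by `I-SUB`, which is not a valid span. The transition scores steer the decoder to `B-SUB` followed by `I-SUB`.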
- Initialize fitlog in the `classification` folder:
cd classification/
fitlog init
fitlog log logs
- Initialize fitlog in the `labeling` folder:
cd labeling/
fitlog init
fitlog log logs
Zhongyu Chen