🏆 KISTI Science and Technology Public AI Data Analysis and Utilization Competition (KISTI 과학기술 공공 AI 데이터 분석 활용 경진대회) - 3rd place solution

Han-YeJi/KISTI-sentence-tagging

Development of a semantic tagging model for sentences in domestic (Korean) theses

Introduction

  • The goal is to automate semantic tagging of domestic thesis sentences by predicting the rhetorical category of each sentence.
  • A hierarchical embedding structure and multiple loss functions are used to represent the meaning of the rhetorical categories.

Dataset description

The dataset contains 155,740 pairs of thesis sentences and tags. The semantic tags form a two-level hierarchy: a semantic structure classification and, beneath it, a detailed semantic classification.
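
A two-level tag hierarchy like this is often stored as a parent-to-children mapping plus name/index lookup tables. The sketch below is only an illustration under stated assumptions: the label names (`coarse_A`, `fine_A1`, ...) are hypothetical placeholders, and the `v2i`/`i2v` names merely echo the `label_v2i.pickle` and `label_i2v.pickle` files listed in the repository, whose actual contents are not shown here.

```python
# Hypothetical two-level hierarchy: coarse category -> detailed tags.
# These names are placeholders, not the real labels of the dataset.
hierarchy = {
    "coarse_A": ["fine_A1", "fine_A2"],
    "coarse_B": ["fine_B1"],
}


def build_label_maps(hierarchy):
    """Return (v2i, i2v): label name -> index and index -> label name.

    Coarse labels are numbered first, then detailed labels, both in
    insertion order of the hierarchy dict.
    """
    labels = list(hierarchy.keys())
    for children in hierarchy.values():
        labels.extend(children)
    v2i = {name: idx for idx, name in enumerate(labels)}
    i2v = {idx: name for name, idx in v2i.items()}
    return v2i, i2v


def parent_of(hierarchy):
    """Map every detailed label to its coarse parent."""
    return {
        child: parent
        for parent, children in hierarchy.items()
        for child in children
    }


v2i, i2v = build_label_maps(hierarchy)
parents = parent_of(hierarchy)
```

With lookups like these, a predicted detailed tag can be traced back to its coarse category through `parents`, which is one simple way to check that a prediction respects the hierarchy.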

Main strategy

  1. We build a text representation for each thesis sentence using KorSciBert and a GCN.
  2. We build label embeddings to extract a semantic representation of each label.
  3. We combine multiple loss functions so that the hierarchical properties of the labels are reflected through label semantic distance.
    • Classification loss : Labels are predicted from the text representation alone.
    • Joint embedding loss : The distance between the text semantics and the target label semantics is minimized within a shared embedding space.
    • Matching loss : The text semantics are pushed away from the semantics of incorrect labels.
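
The three loss terms above can be sketched in a few lines. This is a framework-free illustration, not the repository's implementation: the squared Euclidean distance, the hinge margin, the averaging over incorrect labels, and the weights `w_joint` and `w_match` are all assumptions chosen for clarity, and the actual code in `src/models` may differ.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def classification_loss(logits, target):
    """Cross-entropy on logits computed from the text representation only."""
    return -math.log(softmax(logits)[target])


def joint_embedding_loss(text_vec, label_vecs, target):
    """Pull the text vector toward its target label vector."""
    return sq_dist(text_vec, label_vecs[target])


def matching_loss(text_vec, label_vecs, target, margin=1.0):
    """Hinge loss: each incorrect label should be at least `margin`
    farther from the text than the target label is (assumed form)."""
    pos = sq_dist(text_vec, label_vecs[target])
    terms = [
        max(0.0, margin + pos - sq_dist(text_vec, vec))
        for idx, vec in enumerate(label_vecs)
        if idx != target
    ]
    return sum(terms) / len(terms)


def total_loss(logits, text_vec, label_vecs, target,
               w_joint=1.0, w_match=1.0, margin=1.0):
    """Weighted sum of the three terms; the weights are hypothetical."""
    return (
        classification_loss(logits, target)
        + w_joint * joint_embedding_loss(text_vec, label_vecs, target)
        + w_match * matching_loss(text_vec, label_vecs, target, margin)
    )


# Toy example: a 2-d text vector and three label embeddings.
text_vec = [1.0, 0.0]
label_vecs = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
logits = [2.0, 0.0, 0.0]
loss = total_loss(logits, text_vec, label_vecs, target=0)
```

In this toy setup the text vector coincides with the embedding of label 0, so the joint embedding and matching terms vanish and only the classification term contributes. Choosing a different target makes both distance-based terms positive.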

Directory Structure

/root/workspace
├── data
│    ├── csv
│    │    ├── train.csv
│    │    ├── dec.csv
│    │    ├── test.csv
│    │    └── label_desc.csv
│    ├── hierar
│    │    ├── hierar_prob.json
│    │    ├── hierar.txt
│    │    ├── label.dict
│    │    ├── label_i2v.pickle
│    │    └── label_v2i.pickle
│    └── make_df.py
│
├── src
│    ├── models
│    │    ├── pretrained_model
│    │    │    └── korscibert
│    │    │         ├── bert_config_kisti.json
│    │    │         ├── pytorch_model.bin
│    │    │         ├── tokenization_kisti.py
│    │    │         └── vocab_kisti.txt
│    │    │
│    │    ├── structure_model
│    │    │    ├── graphcnn.py
│    │    │    ├── structure_encoder.py
│    │    │    └── tree.py
│    │    │
│    │    ├── matching_network.py
│    │    ├── model.py
│    │    └── text_feature_propagation.py
│    │
│    ├── utils
│    │    ├── configure.py
│    │    ├── evaluation_modules.py
│    │    ├── hierarchy_tree_stastistic.py
│    │    ├── train_modules.py
│    │    └── utils.py
│    │
│    ├── config.json
│    ├── dataloader.py
│    ├── main.py
│    └── trainer.py
│
└── sen_cls.yaml

How to Use

  1. Create Environment & Import Library
    conda env create -f sen_cls.yaml
    conda activate sen_cls
    pip install torch==1.8.0+cu111  -f https://download.pytorch.org/whl/torch_stable.html
    
  2. Training
    python main.py --do_train=True --exp_num='exp'
    
  3. Test
    python main.py --do_test=True --exp_num='exp0' 
    
  4. Predict
    python main.py --do_predict=True --exp_num='0'  
    
