Usage

This folder contains examples for finetuning our pretrained model CINO with Korean dataset - YNAT.

Requirements

numpy : 1.21.2
python : 3.7.10
pytorch : 1.7.1
scikit-learn : 0.24.2
transformers : 3.1.0

Steps

In this tutorial, we will finetune CINO-large with Korean dataset - YNAT.

project-dir：working directory
data-dir：data directory, here we set as ${project-dir}/data/
model_pretrain_dir：pretrained model directory, here we set as ${project-dir}/model/
model_save_dir：the directory where the best model to be saved, here we set as ${project-dir}/saved_models/
best_model_save_name：the filename of the best model, here we set as best_cino.pth

Step 1：Model preparation

Download CINO model from Download section, and unzip it into ${project-dir}/model/ . The folder should contain 3 files, including pytorch_model.bin, sentencepiece.bpe.model, config.json.

Step 2：Data Preparation

Download data from Korean Text Classification (YNAT) section, put them into ${data-dir}, extract the titles and corresponding categories and store them in the format of dataset example, and then rename the train set and dev set as train.txt and dev.txt respectively. For the mapping between names of classes and the indices, please refer to ${class_names} in the next step.

Step 3：Run command

python ynat_finetune.py --params cino-params.json

params should be a JSON dictionary, in this tutorial, cino-params.json contains all parameters for finetuning, for example：

{
    "learning_rate":5e-6,
    "epoch":5,
    "gradient_acc":4,
    "batch_size":16,
    "max_len":512,
    "weight_decay":1e-4,
    "warmup_rate":0.1,
    "data_dir":"data/",
    "model_pretrain_dir":"model/", 
    "model_save_dir":"saved_models/",
    "best_model_save_name":"best_cino.pth",
    "class_names":["Politics", "Economy", "Society", "Culture", "World", "IT/Science", "Sport"]
}

After running this program, you could check the log messages and model testing results on the dev set in ${project-dir}/log/cino_ynat.log.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README_EN.md

README_EN.md

Usage

Requirements

Steps

Step 1：Model preparation

Step 2：Data Preparation

Step 3：Run command

Files

README_EN.md

Latest commit

History

README_EN.md

File metadata and controls

Usage

Requirements

Steps

Step 1：Model preparation

Step 2：Data Preparation

Step 3：Run command