This repository contains the official code and data for the ACL 2024 Findings paper *Bilingual Rhetorical Structure Parsing with Large Parallel Annotations*.
This repository focuses on data and experiments. For applying the trained parsers, visit the IsaNLP RST repository for models and usage instructions.
The data directory structure should be as follows:
```
data/
├── gum_rs3/
│   ├── en/
│   │   └── *.rs3
│   └── ru/
│       └── *_RU.rs3
├── rstdt_rs3/
│   ├── TEST/
│   │   └── wsj_*.rs3
│   └── TRAINING/
│       └── wsj_*.rs3
└── rurstb_rs3/
    ├── train.*_part_*.rs3
    ├── dev.*_part_*.rs3
    └── test.*_part_*.rs3
```
- `gum_rs3/ru/` — contains the RRG corpus in Russian; provided in `data/RRG.zip`.
- `gum_rs3/en/` — place the GUM RST `*.rs3` files here (GUM dataset link).
- `rstdt_rs3/` — place the RST-DT `*.rs3` files here (RST-DT dataset link).
- `rurstb_rs3/` — contains the RRT corpus (one document = one tree); provided in `data/rurstb_rs3.zip` (see the extraction sketch after this list).
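To unpack the bundled archives into the layout above, something like the following should work (a minimal sketch; the internal structure of the archives may differ, so adjust the target directories if needed):

```bash
# Extract the bundled corpora into the expected directories (see the tree above).
unzip data/RRG.zip -d data/gum_rs3/ru/          # RRG: Russian counterpart of GUM
unzip data/rurstb_rs3.zip -d data/rurstb_rs3/   # RRT (RuRSTB), one tree per document
```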
The train/dev/test splits for GUM/RRG are listed under `data/gum_file_lists` for GUM v9.1. If you are using a later or extended version, update these file lists accordingly.
Set `WANDB_KEY` in `dmrst_parser/keys.py` for online wandb support.
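For reference, a minimal `keys.py` could look like this (the variable name and file path come from the repository; the key value itself is a placeholder to replace with your own):

```python
# dmrst_parser/keys.py
# API key used for online Weights & Biases logging.
WANDB_KEY = "your-wandb-api-key"
```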
- Train:
  ```bash
  python dmrst_parser/multiple_runs.py --corpus "$CORPUS" --lang "$LANG" --model_type "$TYPE" --cuda_device 0 train
  ```
- Evaluate:
  ```bash
  python dmrst_parser/multiple_runs.py --corpus "$CORPUS" --lang "$LANG" --model_type "$TYPE" --cuda_device 0 evaluate
  ```
- Train (mixed):
  ```bash
  python dmrst_parser/multiple_runs.py --corpus 'GUM' --lang "$LANG" --model_type "$TYPE" train_mixed --mixed 100
  ```
- Evaluate (transfer):
  ```bash
  python utils/eval_dmrst_transfer.py --models_dir saves/path-with-models \
      --corpus 'GUM' --lang "$LANG2" --nfolds 5 evaluate
  ```
Parameter values:
- `LANG`: `en`, `ru`
- `CORPUS`: `RST-DT`, `GUM` (RRG with `lang=ru`), `RuRSTB` (RRT)
- `TYPE`: `default`, `+tony`, `+tony+bilstm_edus`
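For example, a concrete run might look like the following (the parameter choices are just one valid combination from the lists above, and `saves/path-with-models` stands for whatever directory your training run produced):

```bash
# Train and evaluate the +tony model on English GUM.
python dmrst_parser/multiple_runs.py --corpus 'GUM' --lang 'en' --model_type '+tony' --cuda_device 0 train
python dmrst_parser/multiple_runs.py --corpus 'GUM' --lang 'en' --model_type '+tony' --cuda_device 0 evaluate

# Train on mixed data, then check transfer to Russian (the RRG side of GUM).
python dmrst_parser/multiple_runs.py --corpus 'GUM' --lang 'en' --model_type '+tony' train_mixed --mixed 100
python utils/eval_dmrst_transfer.py --models_dir saves/path-with-models \
    --corpus 'GUM' --lang 'ru' --nfolds 5 evaluate
```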