This repository contains the code for our NAACL 2022 paper:
*Generative Cross-Domain Data Augmentation for Aspect and Opinion Co-Extraction*
The training data come from three domains: Restaurant (R), Laptop (L), and Device (D).
Following previous work, we remove sentences that have no aspects or opinions when Device is the source domain.
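For illustration, that filtering rule amounts to something like the sketch below; the BIO-style tag format is an assumption, not necessarily the repo's exact data layout:

```python
# Hypothetical sketch of the filter described above; the BIO tag scheme
# ("O" = outside any aspect/opinion span) is an assumed format.
def keep_sentence(tags):
    """Keep a sentence only if it contains at least one labeled span."""
    return any(tag != "O" for tag in tags)

assert keep_sentence(["O", "B-ASP", "I-ASP"])
assert not keep_sentence(["O", "O", "O"])
```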
The in-domain corpora (used for training BERT-E) come from Yelp and Amazon reviews.
Click here to get BERT-E (BERT-Extended); the extraction code is by0i. (Please specify the directory where BERT is stored in modelconfig.py; a hypothetical sketch is shown below.)
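A minimal sketch of what that entry in modelconfig.py might look like; the variable name and paths here are placeholders, not the file's actual contents:

```python
# modelconfig.py -- hypothetical sketch; adapt the names and paths to the
# real file. Point the BERT-E entry at the directory where you unpacked
# the downloaded checkpoint.
BERT_PATHS = {
    "bert-base": "/path/to/bert-base-uncased",
    "bert-e": "/path/to/bert-extended",  # downloaded BERT-E checkpoint
}
```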
Requirements:

```
transformers==4.2.2
pytorch==1.10.0
```
- First, run the following commands to obtain the pseudo-labeled target-domain data (a sketch of the idea follows):

```bash
cd aeoe
cd ae_oe_bert_crf
bash ./run_bert_e_sdl.sh
```
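Conceptually, this step uses a source-trained tagger to label unlabeled target-domain sentences. Below is a minimal sketch of that pseudo-labeling idea, assuming a plain token-classification head (the repo itself uses a BERT-CRF tagger) and an illustrative BIO label set; the model path and label names are assumptions:

```python
import torch
from transformers import AutoModelForTokenClassification, AutoTokenizer

LABELS = ["O", "B-ASP", "I-ASP", "B-OPN", "I-OPN"]  # assumed BIO scheme

tokenizer = AutoTokenizer.from_pretrained("/path/to/bert-e")
model = AutoModelForTokenClassification.from_pretrained(
    "/path/to/bert-e", num_labels=len(LABELS)
)
model.eval()

def pseudo_label(sentence):
    """Tag one target-domain sentence with the source-trained model."""
    words = sentence.split()
    enc = tokenizer(words, is_split_into_words=True,
                    return_tensors="pt", truncation=True)
    with torch.no_grad():
        pred = model(**enc).logits[0].argmax(-1).tolist()
    # Map subword predictions back to words (first subword wins).
    tags, seen = [], set()
    for idx, wid in enumerate(enc.word_ids()):
        if wid is not None and wid not in seen:
            seen.add(wid)
            tags.append(LABELS[pred[idx]])
    return list(zip(words, tags))
```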
- Then, run the following commands to obtain the masked data (see the masking sketch after these commands):

```bash
cd ..
bash ./process_data.sh
```
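The masking step prepares BART inputs by hiding the labeled spans. A sketch of the idea, assuming each aspect/opinion span is collapsed into a single `<mask>` token (the repo's actual masking scheme may differ):

```python
MASK = "<mask>"  # BART's mask token

def mask_spans(tokens, tags):
    """Collapse each labeled B-/I- span into a single mask token."""
    out, in_span = [], False
    for tok, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            out.append(MASK)          # start of an aspect/opinion span
            in_span = True
        elif tag.startswith("I-") and in_span:
            continue                  # still inside the masked span
        else:
            out.append(tok)
            in_span = False
    return " ".join(out)

print(mask_spans(["the", "battery", "is", "great"],
                 ["O", "B-ASP", "O", "B-OPN"]))
# -> "the <mask> is <mask>"
```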
- After that, we train BART for data generation (a minimal generation sketch follows):

```bash
cd ..
cd da
bash ./test.sh
bash ./post_process.sh
```
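Generation then runs the fine-tuned BART over the masked sentences to produce augmented target-style data. A minimal sketch with transformers; the model path and decoding settings are illustrative assumptions, not the scripts' actual configuration:

```python
from transformers import BartForConditionalGeneration, BartTokenizer

tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")
model = BartForConditionalGeneration.from_pretrained("/path/to/finetuned-bart")

masked = "the <mask> is <mask>"
inputs = tokenizer(masked, return_tensors="pt")
outputs = model.generate(
    **inputs,
    num_beams=5,
    max_length=64,
    num_return_sequences=3,  # several candidates per masked sentence
)
for seq in outputs:
    print(tokenizer.decode(seq, skip_special_tokens=True))
```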
- Finally, we filter the generated data and train on it for the downstream task (a simplified filtering sketch follows):

```bash
cd ..
cd aeoe
cd ae_oe_bert_crf
bash ./run_bert_e_da_filter.sh
bash ./run_co_guess.sh
bash ./run_bert_e_da_train.sh
```
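The filter/co-guess scripts keep only generated sentences whose labels can be trusted. One simplified reading of that idea is to keep a sentence only when two taggers agree on its tags; this illustrates the agreement principle suggested by the script names, not their exact logic:

```python
def agree(tags_a, tags_b):
    """Label sequences are consistent if they match position-wise."""
    return tags_a == tags_b

def filter_generated(sentences, tagger_a, tagger_b):
    """Keep generated sentences whose labels both taggers agree on."""
    kept = []
    for sent in sentences:
        tags_a, tags_b = tagger_a(sent), tagger_b(sent)
        if agree(tags_a, tags_b):
            kept.append((sent, tags_a))  # agreed tags become pseudo labels
    return kept
```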