Code and datasets for the EMNLP 2018 paper "Adaptive Semi-supervised Learning for Cross-domain Sentiment Classification". (pdf)
You can download the datasets (small-scale, large-scale, and amazon-benchmark) at [Download]. The zip file should be decompressed and put in the root directory.
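For example, assuming the downloaded archive is named datasets.zip (a placeholder; use whatever filename the download actually provides):

```
# Decompress the dataset archive into the repository root.
# "datasets.zip" is a placeholder name, not the guaranteed filename.
unzip datasets.zip -d .
```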
Download the pre-trained GloVe vectors [glove.840B.300d.zip]. Decompress the zip file and put the txt file in the root directory.
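The GloVe archive can be fetched directly from the Stanford NLP site, for example:

```
# Download and decompress the pre-trained GloVe vectors (the zip is roughly 2 GB).
wget http://nlp.stanford.edu/data/glove.840B.300d.zip
unzip glove.840B.300d.zip    # yields glove.840B.300d.txt
```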
Arguments and hyper-parameters, along with their default values, are defined in train_batch.py.
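Assuming the script parses them with argparse (an assumption about train_batch.py), you can list all options and their defaults with:

```
python train_batch.py --help
```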
Under code/, use the following command to train on any source-target pair from the small-scale dataset:
```
CUDA_VISIBLE_DEVICES="0" python train_batch.py \
--emb ../glove.840B.300d.txt \
--dataset $dataset \
--source $source \
--target $target
```
where --emb is the path to the pre-trained word embeddings. $dataset in ['small_1', 'small_2'] denotes experimental settings 1 and 2 respectively on the small-scale dataset. $source and $target are domains from the small-scale dataset, both in ['book', 'electronics', 'beauty', 'music']. All other hyper-parameters are left at their default values.
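For example, to adapt from book to music under setting 1:

```
CUDA_VISIBLE_DEVICES="0" python train_batch.py \
--emb ../glove.840B.300d.txt \
--dataset small_1 \
--source book \
--target music
```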
To train on any source-target pair from the large-scale dataset, use:
```
CUDA_VISIBLE_DEVICES="0" python train_batch.py \
--emb ../glove.840B.300d.txt \
--dataset large \
--source $source \
--target $target \
-b 250 \
--weight-entropy 0.2 \
--weight-discrepancy 500
```
where $source and $target are domains from the large-scale dataset, both in ['imdb', 'yelp2014', 'cell_phone', 'baby']. The batch size -b is set to 250, and the weights of the target entropy loss and the discrepancy loss are set to 0.2 and 500 respectively. All other hyper-parameters are left at their default values.
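To sweep every ordered source-target pair, a simple bash loop (not part of the original scripts) works:

```
# Run all ordered (source, target) pairs, skipping same-domain pairs.
for source in imdb yelp2014 cell_phone baby; do
  for target in imdb yelp2014 cell_phone baby; do
    [ "$source" = "$target" ] && continue
    CUDA_VISIBLE_DEVICES="0" python train_batch.py \
      --emb ../glove.840B.300d.txt \
      --dataset large \
      --source $source \
      --target $target \
      -b 250 \
      --weight-entropy 0.2 \
      --weight-discrepancy 500
  done
done
```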
To train on any source-target pair from the amazon benchmark, use:
```
CUDA_VISIBLE_DEVICES="0" python train_batch.py \
--emb ../glove.840B.300d.txt \
--dataset amazon \
--source $source \
--target $target \
--n-class 2
```
where $source and $target are domains from the amazon benchmark, both in ['book', 'dvd', 'electronics', 'kitchen']. --n-class, which denotes the number of output classes, is set to 2, as we only consider binary classification (positive or negative) on this dataset. All other hyper-parameters are left at their default values.
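For example, to adapt from dvd to kitchen:

```
CUDA_VISIBLE_DEVICES="0" python train_batch.py \
--emb ../glove.840B.300d.txt \
--dataset amazon \
--source dvd \
--target kitchen \
--n-class 2
```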
During training, the model is evaluated on the development set at the end of each epoch. Accuracy and macro-F1 on the test set are recorded at the epoch where the model achieves the best classification accuracy on the development set.
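In pseudocode, the selection rule looks like the following (a minimal sketch; train_one_epoch and evaluate are hypothetical stand-ins, not the actual functions in train_batch.py):

```
# Minimal sketch of the model-selection protocol described above.
# train_one_epoch() and evaluate() are hypothetical stand-ins for the
# training/evaluation code in train_batch.py.

def select_model(model, dev_set, test_set, n_epochs):
    best_dev_acc = 0.0
    test_acc, test_f1 = 0.0, 0.0
    for _ in range(n_epochs):
        train_one_epoch(model)                 # one pass over the training data
        dev_acc, _ = evaluate(model, dev_set)  # returns (accuracy, macro_f1)
        if dev_acc > best_dev_acc:             # new best on the development set
            best_dev_acc = dev_acc
            # record test metrics at this epoch only
            test_acc, test_f1 = evaluate(model, test_set)
    return test_acc, test_f1
```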
You can find the numerical results in Appendix Table 3 and Table 4. The current version of the code improves batch sampling for the large-scale dataset. Running this code yields an average improvement of 2% in macro-F1 across all source-target pairs on the large-scale dataset compared to the results in Table 4 (c). The results on the small-scale dataset and the amazon benchmark are not affected.
The code was only tested in the following environment:
- Python 2.7
- Keras 2.1.2
- tensorflow 1.4.1
- numpy 1.13.3
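One way to reproduce this environment (a sketch using conda; the environment name das is arbitrary):

```
conda create -n das python=2.7
source activate das
pip install keras==2.1.2 tensorflow==1.4.1 numpy==1.13.3
```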
If you use the code, please cite the following paper:
```
@InProceedings{he-EtAl:2018,
  author    = {He, Ruidan and Lee, Wee Sun and Ng, Hwee Tou and Dahlmeier, Daniel},
  title     = {Adaptive Semi-supervised Learning for Cross-domain Sentiment Classification},
  booktitle = {Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing},
  year      = {2018},
  publisher = {Association for Computational Linguistics}
}
```