Adversarial training has become a popular and powerful regularization method in the natural language domain. In this paper, we propose Regularized Adversarial Training (R-AT), which builds on adversarial training and dropout and forces the output probability distributions of different sub-models generated by dropout to be consistent with each other under the same adversarial input. Specifically, we generate adversarial samples by perturbing the word embeddings. For each adversarial sample fed to the model, R-AT minimizes both the adversarial risk and the bidirectional KL-divergence between the adversarial output distributions of two sub-models sampled by dropout. Through extensive experiments on 13 public natural language understanding datasets, we find that R-AT improves many models (e.g., RNN-based, CNN-based, and Transformer-based models). On the GLUE benchmark, when R-AT is applied only during fine-tuning, it improves the overall test score of the BERT-base model from 78.3 to 79.6 and of the RoBERTa-large model from 88.1 to 88.6. Theoretical analysis reveals that R-AT provides implicit gradient regularization during training. Furthermore, R-AT reduces the inconsistency between training and testing of models with dropout.
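Below is a minimal PyTorch sketch of one R-AT training step, illustrating the loss described above: an adversarial perturbation is applied to the word embeddings, the same adversarial sample is passed through the model twice with dropout active (sampling two sub-models), and the adversarial cross-entropy is combined with a bidirectional KL term between the two output distributions. The single-step (FGM-style) attack, the `inputs_embeds` interface, and names such as `epsilon` and `alpha` are illustrative assumptions, not the authors' exact implementation.

```python
import torch
import torch.nn.functional as F

def r_at_step(model, embed, input_ids, labels, optimizer, epsilon=1.0, alpha=1.0):
    """One R-AT update. Assumes `model(inputs_embeds=...)` returns logits."""
    model.train()  # keep dropout active so two forward passes sample two sub-models

    # 1. Build an adversarial perturbation on the word embeddings (FGM-style, assumed).
    embeds = embed(input_ids).detach().requires_grad_(True)
    clean_loss = F.cross_entropy(model(inputs_embeds=embeds), labels)
    grad, = torch.autograd.grad(clean_loss, embeds)
    delta = epsilon * grad / (grad.norm(dim=-1, keepdim=True) + 1e-12)
    adv_embeds = (embeds + delta).detach()

    # 2. Feed the same adversarial sample twice; dropout yields two sub-models.
    logits1 = model(inputs_embeds=adv_embeds)
    logits2 = model(inputs_embeds=adv_embeds)

    # 3. Adversarial risk + bidirectional KL between the two output distributions.
    adv_loss = 0.5 * (F.cross_entropy(logits1, labels) + F.cross_entropy(logits2, labels))
    logp1, logp2 = F.log_softmax(logits1, dim=-1), F.log_softmax(logits2, dim=-1)
    kl_loss = 0.5 * (F.kl_div(logp1, logp2, log_target=True, reduction="batchmean")
                     + F.kl_div(logp2, logp1, log_target=True, reduction="batchmean"))
    loss = adv_loss + alpha * kl_loss

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```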
@inproceedings{ni2022R-AT,
title={R-AT: Regularized Adversarial Training for Natural Language Understanding},
author={Ni, Shiwen and Li, Jiawen and Kao, Hung-Yu},
booktitle={Findings of the Association for Computational Linguistics: EMNLP 2022},
year={2022}
}