Note:
- The file TD_RvNN.vol_5000.txt in the dataset is obtained from https://github.com/majingCUHK/Rumor_RvNN.
If you use any source code or dataset included in this repo, please cite this paper:
@inproceedings{song2021adversary,
title={Adversary-Aware Rumor Detection},
author={Song, Yun-Zhu and Chen, Yi-Syuan and Chang, Yi-Ting and Weng, Shao-Yu and Shuai, Hong-Han},
booktitle={ACL-IJCNLP: Findings},
year={2021}
}
Many rumor detection models have been proposed to automatically detect rumors based on their contents and propagation paths. However, most previous works are not aware of malicious attacks, e.g., framing. Therefore, we propose a novel rumor detection framework, Adversary-Aware Rumor Detection, which consists of a Weighted-Edge Transformer-Graph Network and a Position-aware Adversarial Response Generator and is designed to make detection models less vulnerable to such attacks. To the best of our knowledge, this is the first work that can generate adversarial responses while taking the response position into account. Even without the adversarial learning process, our detection model (Weighted-Edge Transformer-Graph Network) is a strong baseline for the rumor detection task on Twitter15, Twitter16 and Pheme.
The detailed environment is listed in requirement.txt.
- We collect the user comments following Twitter's policy, and the processed dataset is available here. The dataset should be placed in ./dataset/
- To train the generator, a pretrained generation model is required, which can be downloaded here. The pretrained generation model should be placed in ./results/pretrain/
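Before training, it can help to confirm that both directories mentioned above are in place. The following is a minimal sketch, not part of the repository: the function name `missing_dirs` is hypothetical, and only the two paths are taken from the notes above.

```python
from pathlib import Path

# Directories named in the notes above, relative to the repository root.
REQUIRED_DIRS = ("dataset", "results/pretrain")


def missing_dirs(root="."):
    """Return the required directories that do not exist under `root`."""
    base = Path(root)
    return [d for d in REQUIRED_DIRS if not (base / d).is_dir()]


if __name__ == "__main__":
    missing = missing_dirs(".")
    if missing:
        print("Missing directories:", ", ".join(missing))
    else:
        print("Dataset and pretrained-model directories found.")
```

Run from the repository root, the script lists whichever of the two directories still has to be created and filled.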
The data preprocessing follows BiGAN. The raw datasets, except for the comments, can be downloaded from raw_pheme, provided by Zubiaga et al., 2016, and raw_twitter15_twitter16, provided by Ma et al., 2017.
|_src\
    |_run.sh -> script to run the code
    |_main.py
    |_models\
        |_trainer_gen.py -> wrapping different experiments
        |_trainer.py -> model trainer
        |_model.py -> main class for AARD model
        |_model_detector.py -> for supporting model.py
        |_model_decoder.py -> for supporting model.py
        |_predictor.py -> for decoding from generator
    |_data\ -> for splitting 5-fold and building datagraph
    |_eval\ -> define evaluation metric (Recall, Precision and F-score of each class)
    |_others\ -> define loss, logging info
|_dataset\
    |_Pheme\
    |_Phemetextgraph\ -> can be automatically generated by data/getgraph.py
    |_twitter15\
    |_twitter15textgraph\ -> can be automatically generated by data/getgraph.py
    |_twitter16\
    |_twitter16textgraph\ -> can be automatically generated by data/getgraph.py
|_results\
    |_pretrain\
        |_XSUM_BertExtAbs\
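The eval\ entry above mentions Recall, Precision and F-score of each class. As an illustration of what those per-class metrics compute, here is a minimal sketch. It is not the code in eval\; the function name `per_class_scores` and the label strings are assumptions made for the example.

```python
from collections import Counter


def per_class_scores(y_true, y_pred):
    """Return {label: (precision, recall, f1)} for every label that occurs."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct prediction for this class
        else:
            fp[pred] += 1    # predicted class gains a false positive
            fn[truth] += 1   # true class gains a false negative

    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        total = precision + recall
        f1 = 2 * precision * recall / total if total else 0.0
        scores[label] = (precision, recall, f1)
    return scores


if __name__ == "__main__":
    truth = ["rumor", "rumor", "non-rumor", "non-rumor"]
    preds = ["rumor", "non-rumor", "non-rumor", "non-rumor"]
    for label, (p, r, f) in per_class_scores(truth, preds).items():
        print(f"{label}: precision={p:.3f} recall={r:.3f} f1={f:.3f}")
```

Reporting the scores per class, rather than a single accuracy, shows whether a model is weak on one particular label.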
Train the detector with adversarial learning:
python main.py \
-train_detector \
-train_adv \
-fold '0,1,2,3,4' \
-dataset_dir '../dataset/Pheme' \
-savepath '../results/Pheme' \
-batch_size 48 \
-filter True \
-log_tensorboard \
-warmup_steps 100
Train the detector only:
python main.py \
-train_detector \
-fold '0,1,2,3,4' \
-dataset_dir '../dataset/Pheme' \
-savepath '../results/Pheme' \
-filter True \
-batch_size 48 \
-log_tensorboard \
-warmup_steps 100
Test the detector:
python main.py \
-test_detector \
-fold '0,1,2,3,4' \
-dataset_dir '../dataset/Pheme' \
-savepath '../results/Pheme' \
-filter True \
-batch_size 48 \
-log_tensorboard \
-warmup_steps 100
Test the detector and the generator:
python main.py \
-test_detector \
-test_gen \
-fold '0,1,2,3,4' \
-dataset_dir '../dataset/Pheme' \
-savepath '../results/Pheme' \
-filter True \
-batch_size 48 \
-log_tensorboard \
-warmup_steps 100
Run model testing under different data time stamps:
python main.py \
-early '0,6,12,18,24,30,36,42,48,54,60,120' \
-fold '0,1,2,3,4' \
-dataset_dir '../dataset/twitter15' \
-savepath '../results/twitter15/early_detection' \
-filter True \
-batch_size 48
python main.py \
-early '0,6,12,18,24,30,36,42,48,54,60,120' \
-fold '0,1,2,3,4' \
-dataset_dir '../dataset/twitter16' \
-savepath '../results/twitter16/early_detection' \
-filter True \
-batch_size 48
python main.py \
-early '0,60,120,240,480,720,1440,2880' \
-fold '0,1,2,3,4' \
-dataset_dir '../dataset/Pheme' \
-savepath '../results/Pheme/early_detection' \
-filter True \
-batch_size 48
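The early-detection commands above pass a comma-separated list of time stamps through -early. One plausible reading is that a thread is cut off at each time stamp before it is tested. The sketch below illustrates that idea only; it is not how main.py is known to work. The helpers `parse_cutoffs` and `truncate_thread`, the (delay, text) tuple layout, the time unit, and the handling of a cutoff of 0 are all assumptions.

```python
def parse_cutoffs(spec):
    """Turn a string such as '0,6,12' into a list of integer cutoffs."""
    return [int(part) for part in spec.split(",") if part.strip()]


def truncate_thread(responses, cutoff):
    """Keep the responses whose delay after the source post is <= cutoff.

    `responses` is a list of (delay, text) tuples; the delay unit is
    whatever unit the cutoffs use (an assumption of this sketch).
    """
    return [r for r in responses if r[0] <= cutoff]


if __name__ == "__main__":
    # Hypothetical thread: (delay, text) pairs.
    thread = [(0, "source"), (5, "reply a"), (30, "reply b"), (200, "reply c")]
    for cutoff in parse_cutoffs("0,6,30,120"):
        kept = truncate_thread(thread, cutoff)
        print(cutoff, len(kept))
```

Evaluating the same trained model on each truncated version shows how quickly detection quality rises as more responses become available.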
Train the models on different quantities of data, ranging from 5% to 100%, and evaluate them on the same test set:
python main.py \
-train_detector \
-quat '5,10,25,50,75,100' \
-fold '0,1,2,3,4' \
-dataset_dir '../dataset/Pheme' \
-savepath '../results/Pheme/data_scarcity' \
-filter True \
-batch_size 48 \
-log_tensorboard
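The data-scarcity setting above trains on 5% to 100% of the training data while the test set stays fixed. A minimal sketch of such subsampling follows. It is an illustration under assumptions, not the repository's implementation: the function `subsample`, the seed, and the integer ids are hypothetical.

```python
import random


def subsample(train_ids, percent, seed=0):
    """Return a reproducible subset holding about `percent` % of train_ids."""
    if not 0 < percent <= 100:
        raise ValueError("percent must be in (0, 100]")
    ids = list(train_ids)
    if not ids:
        return []
    # Keep at least one example, and never more than are available.
    k = min(len(ids), max(1, round(len(ids) * percent / 100)))
    rng = random.Random(seed)  # fixed seed so every run draws the same subset
    return sorted(rng.sample(ids, k))


if __name__ == "__main__":
    train_ids = list(range(200))  # hypothetical training example ids
    for percent in (5, 10, 25, 50, 75, 100):
        print(percent, len(subsample(train_ids, percent)))
```

Only the training ids are subsampled; the test ids are left untouched so that results at different percentages remain comparable.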