cawa

Credit Attribution with Attention

The folder Code contains the python code for Credit Attribution With Attention (CAWA) and other helper scripts for evaluation. The code uses Pytorch framework. The folder Data contains the preprocessed data for the five text datasets: Movies, Ohsumed, TMC2007, Patents and Delicious.

usage: python Code/cawa.py

Following arguments can be used

-d DATAPATH, --datapath DATAPATH: Path to the folder containing data files.

-c CLASSES, --classes CLASSES: Number of classes.

-s SEED, --seed SEED: Seed for random initializations.

-a ALPHA, --alpha ALPHA: Alpha (Default 0.2).

-k KERNEL_SIZE, --kernel_size KERNEL_SIZE: Kernel size for smoothing

-v STANDARD_DEVIATION, --standard_deviation STANDARD_DEVIATION: Standard deviation for the gaussian kernel, negative input means simple averaging

-l LEARNING, --learning LEARNING: Learning rate (Default 0.001).

-y NODES, --nodes NODES: Number of nodes in neural network (Default 256).

-e EPOCH, --epoch EPOCH: Num epochs (Default 100).

-b BATCH, --batch BATCH: Batch size (Default 256).

-p DROPOUT, --dropout DROPOUT: Dropout probability (Default 0.5).

-u UNUSED, --unused UNUSED: Use null class (Default 0).

-m CLIPPING, --clipping CLIPPING: Clipping value (Default 0.25).

-f CHECK, --check CHECK: Check flag (Default 10), write results to the file after every epochs.

-q SCRIPTS, --scripts SCRIPTS: Path to the folder containing python scripts.

-r RESULTS, --results RESULTS: Path to the results output file.

Example usage for the Movies dataset:

python Code/cawa.py --datapath Data/cmumovies --classes 6 --seed 0 --alpha 0.2 --kernel_size 3 --standard_deviation -1 --learning 0.001 --nodes 256 --epoch 100 --batch 256 --dropout 0.5 --unused 0 --clipping 0.25 --check 10 --scripts Code/scripts --results results.txt

The output will be as follows:

After every check_flag=10 epochs, the model will write the evaluation results for the credit attribution as well as multilabel classification for different values of beta to the results file. The evaluation will be performed for both the test and validation datasets.

Each line in the results file will have comma separated 24 fields as follows:

random seed
alpha
kernel_size
kernel_sd
learning_rate
hidden_dim
epoch
batch_size
dropout
use_null
clipping_value
beta
roc
roc_macro
micro_f1
samples_f1
macro_f1
weighted_f1
sov_strict_valid
sov_smooth_valid
accuracy_valid
sov_strict_test
sov_smooth_test
accuracy_test

The fields 13 to 18 correspond to the evaluation on multilabel classification on the test dataset. The fields 19 to 21 correspond to the evaluation of credit attribution on the validation set and the fields 22 to 24 correspond to the evaluation on test set. For multilabel classification, the metrics of interest are the fields 13, 14 and 16. For credit attribution, the metrics of interest are the fields 23 and 24.

The best hyperparameter values for different datasets are:

Dataset α β

Movies 0.2 0.1

Ohsumed 0.1 0.1

TMC2007 0.1 0.3

Patents 0.5 0.3

Delicious 0.1 0.2

Name		Name	Last commit message	Last commit date
Latest commit History 7 Commits
Code		Code
Data		Data
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

cawa

About

Releases

Packages

Languages

gurdaspuriya/cawa

Folders and files

Latest commit

History

Repository files navigation

cawa

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages