Correlation Networks for Extreme Multi-label Text Classification

Requirements
- python==3.6.3
- pytorch==1.2.0
- torchgpipe==0.0.5
- click==7.0
- ruamel.yaml==0.16.5
- numpy==1.16.2
- scipy==1.2.1
- scikit-learn==0.20.3
- gensim==3.7.2
- nltk==3.2.4
- tqdm==4.31.1
- joblib==0.13.2
- logzero==1.5.0
Pretrained word embeddings in gensim format
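For reference, here is a minimal sketch of loading embeddings stored in gensim format and assembling an embedding matrix for a vocabulary. The file path, vocabulary, and OOV handling below are illustrative assumptions, not taken from this repository.

```python
# Minimal sketch: load word embeddings saved in gensim format and build an
# embedding matrix for a fixed vocabulary. Paths and vocabulary are hypothetical.
import numpy as np
from gensim.models import KeyedVectors

vectors = KeyedVectors.load("data/glove.840B.300d.gensim", mmap="r")  # hypothetical path
vocab = ["label", "classification", "network"]                        # hypothetical vocabulary

emb_dim = vectors.vector_size
emb_matrix = np.zeros((len(vocab) + 1, emb_dim), dtype=np.float32)    # row 0 reserved for padding
for i, word in enumerate(vocab, start=1):
    if word in vectors:
        emb_matrix[i] = vectors[word]
    else:
        emb_matrix[i] = np.random.uniform(-0.25, 0.25, emb_dim)       # random init for OOV words

print(emb_matrix.shape)
```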
Preprocess

The EUR-Lex dataset is already tokenized, so run

./scripts/preprocess_eurlex.sh

The other datasets first need to be tokenized with NLTK, so run

./scripts/preprocess_others.sh
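As a rough illustration of the NLTK-based tokenization step (the actual preprocessing scripts may lowercase, filter, or truncate text differently), a hedged sketch:

```python
# Rough sketch of NLTK word tokenization for the raw-text datasets.
# The exact cleaning rules used by the preprocessing scripts may differ.
import nltk
from nltk.tokenize import word_tokenize

nltk.download("punkt", quiet=True)  # tokenizer models required by word_tokenize

def tokenize(text: str) -> list:
    # Lowercase and split into word tokens; a common choice for XMTC preprocessing.
    return [token.lower() for token in word_tokenize(text)]

print(tokenize("Correlation networks improve extreme multi-label text classification."))
```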
Train and evaluate

./scripts/run_models.sh
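Results in extreme multi-label classification are typically reported as precision at k (P@1/P@3/P@5). Below is a generic numpy sketch of that metric for orientation; it is not taken from this repository's evaluation code.

```python
# Generic precision@k for multi-label predictions (not this repo's evaluation code).
# scores: (n_samples, n_labels) predicted scores; targets: binary relevance matrix.
import numpy as np

def precision_at_k(scores: np.ndarray, targets: np.ndarray, k: int = 5) -> float:
    top_k = np.argsort(-scores, axis=1)[:, :k]          # indices of the k highest-scoring labels
    hits = np.take_along_axis(targets, top_k, axis=1)   # 1 where a top-k label is relevant
    return float(hits.sum(axis=1).mean() / k)

scores = np.array([[0.9, 0.2, 0.7], [0.1, 0.8, 0.3]])
targets = np.array([[1, 0, 1], [0, 1, 1]])
print(precision_at_k(scores, targets, k=2))  # 1.0 for this toy example
```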
The code for the baseline models is adapted from the following repositories: XML-CNN, BERT, MeSHProbeNet, and AttentionXML.