BERTDeCS

Large-scale semantic indexing of Spanish biomedical literature using contrastive transfer learning

Quick Start

Install the requirements of BERTDeCS and download the trained model:

git clone https://github.com/yourh/BERTDeCS.git
cd BERTDeCS
conda create -n BERTDeCS python=3.12
conda activate BERTDeCS
pip install -r requirements.txt
mkdir models
cd models
wget https://zenodo.org/records/14190447/files/BERTDeCS_A-DeCS_ES.pt
cd ..
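
Before moving on, it can help to confirm that the downloaded checkpoint loads. This is only a sanity check and assumes the .pt file is a regular PyTorch checkpoint written with torch.save; the key layout inside it is not documented here:

import torch

ckpt = torch.load("models/BERTDeCS_A-DeCS_ES.pt", map_location="cpu")
print(type(ckpt))
if isinstance(ckpt, dict):
    # show a few top-level keys to confirm the file is intact
    print(list(ckpt.keys())[:10])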

Preprocess the citations, which consist of journal names, titles, and abstracts:

python preprocess.py tokenize \
-j data/test_st1_journal.txt \
-t data/test_st1_title.txt \
-a data/test_st1_abstract.txt \
-o data/test_st1
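
preprocess.py takes the three files together, so they presumably contain one citation per line in the same order. A small consistency check under that assumption (the exact format expected by preprocess.py may differ):

# Assumes one citation per line and the same ordering across the three files.
paths = ["data/test_st1_journal.txt",
         "data/test_st1_title.txt",
         "data/test_st1_abstract.txt"]
counts = {p: sum(1 for _ in open(p, encoding="utf-8")) for p in paths}
print(counts)
assert len(set(counts.values())) == 1, "journal/title/abstract files should be line-aligned"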

Predict DeCS terms with BERTDeCS:

python main.py \
configures/data.yaml \
configures/BERTDeCS-A.yaml \
--valid-name dev_st1 \
--labels decs \
--eval "test_st1" \
-b 25 \
-a
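
The predictions are written under results/ as a NumPy .npz archive (the same file name used in the evaluation step below). A quick way to inspect it; the array names stored in the archive are not documented here, so they are simply listed:

import numpy as np

res = np.load("results/BERTDeCS_A-DeCS_ES-test_st1.npz", allow_pickle=True)
print(res.files)                      # names of the stored arrays
for name in res.files:
    print(name, res[name].shape)      # shape of each array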

Evaluate the prediction performance:

python evaluation.py \
-t data/test_st1_decs.txt \
-r results/BERTDeCS_A-DeCS_ES-test_st1.npz \
-n 10
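
The -n 10 flag suggests evaluation over the top 10 predicted terms. For reference, here is an illustrative precision@k / recall@k computation; it is not the exact set of metrics implemented in evaluation.py, and scores and targets are hypothetical dense arrays of shape (n_citations, n_labels):

import numpy as np

def precision_recall_at_k(scores, targets, k=10):
    topk = np.argsort(-scores, axis=1)[:, :k]          # indices of the k highest-scoring labels
    hits = np.take_along_axis(targets, topk, axis=1)   # 1 where a predicted label is a true label
    precision = hits.sum(axis=1) / k
    recall = hits.sum(axis=1) / np.maximum(targets.sum(axis=1), 1)
    return precision.mean(), recall.mean()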

Training

We trained BERTDeCS on four NVIDIA RTX 4090 GPUs with the following steps:

  1. Preprocess the pre-training and training data by
python preprocess.py tokenize \
-j data/{journal} \
-t data/{title} \
-a data/{abstract} \
-o data/{data_name}
  2. Run contrastive learning (an illustrative contrastive objective is sketched after these steps) by
torchrun --nproc-per-node 4 main.py \
configures/data.yaml \
configures/BERTDeCS-CL.yaml \
--train-name train_cl \
--train \
--dist -a
  3. Run pre-training by
torchrun --nproc-per-node 4 main.py \
configures/data.yaml \
configures/BERTDeCS-Af.yaml \
--train-name train_pubmed \
--valid-name dev_st1 \
--labels mesh_decs \
--train \
-p models/BERTDeCS_CL-DeCS_CL.pt \
-b 25 \
--dist -a
  4. Run fine-tuning by
torchrun --nproc-per-node 4 main.py \
configures/data.yaml \
configures/BERTDeCS-A.yaml \
--train-name train_es \
--valid-name dev_st1 \
--labels decs \
--train --eval "dev_st1,test_st1" \
-p models/BERTDeCS_Af-DeCS_PM300W.pt \
-b 25 \
--dist -a
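
The contrastive-learning stage in step 2 is configured by configures/BERTDeCS-CL.yaml. As a rough illustration only, and not the exact objective used by BERTDeCS, an InfoNCE-style contrastive loss over paired citation embeddings looks like this:

import torch
import torch.nn.functional as F

def info_nce(z1, z2, temperature=0.05):
    # z1, z2: (batch, dim) embeddings of two views of the same citations
    z1 = F.normalize(z1, dim=-1)
    z2 = F.normalize(z2, dim=-1)
    logits = z1 @ z2.t() / temperature                 # scaled cosine similarities
    labels = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(logits, labels)             # matching pairs are the positives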

Reference

Declaration

BERTDeCS is free for non-commercial use. For commercial use, please contact Dr. Ronghui You and Prof. Shanfeng Zhu (zhusf@fudan.edu.cn).
