Large-scale semantic indexing of Spanish biomedical literature using contrastive transfer learning
Install the requirements of BERTDeCS:
```shell
git clone https://github.com/yourh/BERTDeCS.git
cd BERTDeCS
conda create -n BERTDeCS python=3.12
conda activate BERTDeCS
pip install -r requirements.txt
mkdir models
cd models
wget https://zenodo.org/records/14190447/files/BERTDeCS_A-DeCS_ES.pt
cd ..
```
Preprocess the citations with journal names, titles, and abstracts:
```shell
python preprocess.py tokenize \
  -j data/test_st1_journal.txt \
  -t data/test_st1_title.txt \
  -a data/test_st1_abstract.txt \
  -o data/test_st1
```
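Based on the flags above, the three input files appear to hold parallel, line-aligned citation fields (this is an assumption, not a statement about the repository's `preprocess.py`). A minimal sketch of the merging idea, with hypothetical field values:

```python
# Sketch only: assumes the journal, title, and abstract files are parallel,
# one citation per line, and that the fields are joined into one text per
# citation before tokenization.
def merge_citation_fields(journals, titles, abstracts, sep=" "):
    """Combine parallel lists of journal names, titles, and abstracts."""
    assert len(journals) == len(titles) == len(abstracts)
    return [sep.join((j.strip(), t.strip(), a.strip()))
            for j, t, a in zip(journals, titles, abstracts)]

texts = merge_citation_fields(
    ["Rev Esp Salud Publica"],
    ["Indización semántica a gran escala"],
    ["Presentamos un método de aprendizaje por transferencia..."])
print(texts[0])
```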
Predict DeCS terms with BERTDeCS:
```shell
python main.py \
  configures/data.yaml \
  configures/BERTDeCS-A.yaml \
  --valid-name dev_st1 \
  --labels decs \
  --eval "test_st1" \
  -b 25 \
  -a
```
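The predictor writes its scores to an `.npz` file under `results/`, which the evaluation step below consumes. As a sketch, assuming that file holds a dense citation-by-label score matrix, the top-ranked DeCS terms per citation can be read off with NumPy; the matrix and label identifiers here are toy stand-ins:

```python
import numpy as np

# Toy stand-in for the (n_citations x n_labels) score matrix that the
# predictor is assumed to store in the results .npz file.
scores = np.array([[0.1, 0.9, 0.3, 0.7],
                   [0.8, 0.2, 0.6, 0.4]])
labels = np.array(["D001", "D002", "D003", "D004"])  # hypothetical label IDs

top_k = 2
order = np.argsort(-scores, axis=1)[:, :top_k]  # column indices, best first
top_terms = labels[order]
print(top_terms.tolist())  # [['D002', 'D004'], ['D001', 'D003']]
```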
Evaluate the prediction performance:
```shell
python evaluation.py \
  -t data/test_st1_decs.txt \
  -r results/BERTDeCS_A-DeCS_ES-test_st1.npz \
  -n 10
```
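The `-n 10` flag suggests the metrics are computed at a top-10 cutoff. As an illustration only (the metrics actually reported by `evaluation.py` may differ), precision@k over gold DeCS annotations looks like this:

```python
def precision_at_k(true_labels, ranked_predictions, k=10):
    """Mean precision@k: the fraction of the top-k predicted terms that
    appear in the gold DeCS annotation of each citation, averaged over
    all citations."""
    total = 0.0
    for gold, ranked in zip(true_labels, ranked_predictions):
        hits = sum(1 for term in ranked[:k] if term in gold)
        total += hits / k
    return total / len(true_labels)

# Hypothetical gold annotations and ranked predictions for one citation.
gold = [{"D001", "D003"}]
ranked = [["D003", "D002", "D001", "D004"]]
print(precision_at_k(gold, ranked, k=2))  # 0.5: one of the top 2 is correct
```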
We trained BERTDeCS on 4 × RTX 4090 GPUs using the following steps:
- Preprocess the pre-training and training data:

  ```shell
  python preprocess.py tokenize \
    -j data/{journal} \
    -t data/{title} \
    -a data/{abstract} \
    -o data/{data_name}
  ```
- Run contrastive learning:

  ```shell
  torchrun --nproc-per-node 4 main.py \
    configures/data.yaml \
    configures/BERTDeCS-CL.yaml \
    --train-name train_cl \
    --train \
    --dist -a
  ```
- Run pre-training:

  ```shell
  torchrun --nproc-per-node 4 main.py \
    configures/data.yaml \
    configures/BERTDeCS-Af.yaml \
    --train-name train_pubmed \
    --valid-name dev_st1 \
    --labels mesh_decs \
    --train \
    -p models/BERTDeCS_CL-DeCS_CL.pt \
    -b 25 \
    --dist -a
  ```
- Run fine-tuning:

  ```shell
  torchrun --nproc-per-node 4 main.py \
    configures/data.yaml \
    configures/BERTDeCS-A.yaml \
    --train-name train_es \
    --valid-name dev_st1 \
    --labels decs \
    --train --eval "dev_st1,test_st1" \
    -p models/BERTDeCS_Af-DeCS_PM300W.pt \
    -b 25 \
    --dist -a
  ```
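The contrastive-learning stage above (`configures/BERTDeCS-CL.yaml`) presumably pulls paired representations of the same citation together while pushing apart the other citations in the batch. A minimal NumPy sketch of an InfoNCE-style objective, one common choice for such a stage (the repository's actual loss may differ):

```python
import numpy as np

def info_nce(z1, z2, temperature=0.1):
    """InfoNCE loss over a batch of paired embeddings: z1[i] and z2[i]
    are views of the same citation (e.g., two languages); every other
    pair in the batch acts as a negative."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature            # (batch, batch) similarities
    log_softmax = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # positives sit on the diagonal: each row's own pair
    return -np.mean(np.diag(log_softmax))

rng = np.random.default_rng(0)
z = rng.normal(size=(4, 8))
loss = info_nce(z, z)  # identical views -> loss close to zero
print(loss)
```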
BERTDeCS is free for non-commercial use. For commercial use, please contact Dr. Ronghui You and Prof. Shanfeng Zhu (zhusf@fudan.edu.cn).