This repository contains the implementation of BioGPT: Generative Pre-trained Transformer for Biomedical Text Generation and Mining, by Renqian Luo, Liai Sun, Yingce Xia, Tao Qin, Sheng Zhang, Hoifung Poon and Tie-Yan Liu.
- PyTorch version == 1.12.0
- Python version == 3.10
- fairseq version == 0.12.0:

      git clone https://github.com/pytorch/fairseq
      cd fairseq
      git checkout v0.12.0
      pip install .
      python setup.py build_ext --inplace
      cd ..

- Moses

      git clone https://github.com/moses-smt/mosesdecoder.git
      export MOSES=${PWD}/mosesdecoder

- fastBPE

      git clone https://github.com/glample/fastBPE.git
      export FASTBPE=${PWD}/fastBPE
      cd fastBPE
      g++ -std=c++11 -pthread -O3 fastBPE/main.cc -IfastBPE -o fast

- sacremoses

      pip install sacremoses

- sklearn

      pip install scikit-learn
Remember to set the environment variables MOSES and FASTBPE to the paths of Moses and fastBPE respectively, as they will be required later.
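
One way to confirm both variables before moving on is sketched below. `check_tool_path` is a hypothetical helper introduced here for illustration and is not part of this repository. It only checks that each variable is set and names an existing directory, and does not verify that the tools themselves were built correctly.

```shell
# check_tool_path is a hypothetical helper, not part of the BioGPT repository.
# It succeeds only if the variable named by $1 is set and points to a directory.
check_tool_path() {
    var_name="$1"
    # Look up the value of the variable whose name is stored in var_name.
    eval "var_value=\${$var_name:-}"
    if [ -z "$var_value" ]; then
        echo "$var_name is not set" >&2
        return 1
    fi
    if [ ! -d "$var_value" ]; then
        echo "$var_name points to a missing directory: $var_value" >&2
        return 1
    fi
    echo "$var_name=$var_value"
}

# Report on both variables without aborting the current shell.
check_tool_path MOSES || echo "set MOSES to your mosesdecoder checkout" >&2
check_tool_path FASTBPE || echo "set FASTBPE to your fastBPE checkout" >&2
```

The helper returns a non-zero status instead of calling `exit`, so it can be pasted into an interactive session or sourced from a setup script without closing the shell.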
We provide our pre-trained BioGPT model checkpoints along with fine-tuned checkpoints for downstream tasks, available both via direct URL download and through the Hugging Face 🤗 Hub.
Model | Description | URL | 🤗 Hub |
---|---|---|---|
BioGPT | Pre-trained BioGPT model checkpoint | link | link |
BioGPT-Large | Pre-trained BioGPT-Large model checkpoint | link | link |
BioGPT-QA-PubMedQA-BioGPT | Fine-tuned BioGPT for question answering task on PubMedQA | link | |
BioGPT-QA-PubMedQA-BioGPT-Large | Fine-tuned BioGPT-Large for question answering task on PubMedQA |