Skip to content

microsoft/BioGPT

Repository files navigation

BioGPT

This repository contains the implementation of BioGPT: Generative Pre-trained Transformer for Biomedical Text Generation and Mining, by Renqian Luo, Liai Sun, Yingce Xia, Tao Qin, Sheng Zhang, Hoifung Poon and Tie-Yan Liu.

Requirements and Installation

  • PyTorch version == 1.12.0
  • Python version == 3.10
  • fairseq version == 0.12.0:
git clone https://github.com/pytorch/fairseq
cd fairseq
git checkout v0.12.0
pip install .
python setup.py build_ext --inplace
cd ..
  • Moses
git clone https://github.com/moses-smt/mosesdecoder.git
export MOSES=${PWD}/mosesdecoder
  • fastBPE
git clone https://github.com/glample/fastBPE.git
export FASTBPE=${PWD}/fastBPE
cd fastBPE
g++ -std=c++11 -pthread -O3 fastBPE/main.cc -IfastBPE -o fast
  • sacremoses
pip install sacremoses
  • sklearn
pip install scikit-learn

Remember to set the environment variables MOSES and FASTBPE to the path of Moses and fastBPE respetively, as they will be required later.

Getting Started

Pre-trained models

We provide our pre-trained BioGPT model checkpoints along with fine-tuned checkpoints for downstream tasks, available both through URL download as well as through the Hugging Face 🤗 Hub.

Model Description URL 🤗 Hub
BioGPT Pre-trained BioGPT model checkpoint link link
BioGPT-Large Pre-trained BioGPT-Large model checkpoint link link
BioGPT-QA-PubMedQA-BioGPT Fine-tuned BioGPT for question answering task on PubMedQA link
BioGPT-QA-PubMedQA-BioGPT-Large Fine-tuned BioGPT-Large for question answering task on PubMedQA