FP-BERT is a pre-training-based method for the QSAR problem. We pre-trained a Bidirectional Encoder Representations from Transformers (BERT) encoder in a self-supervised manner to obtain a semantic representation of compound fingerprints; we call this model Fingerprints-BERT (FP-BERT). The molecular representation encoded by FP-BERT is then fed into a convolutional neural network (CNN) to extract higher-level abstract features, and the predicted properties of the molecule are finally obtained through a fully connected layer for the specific classification or regression task.

In the "pre-trained" folder, the vocabulary file "mol2vec_vocabs.txt" contains 3,352 substructure identifiers and five special tokens: [PAD], [UNK], [CLS], [SEP], and [MASK]. The "EIECTRA-train.py" file pre-trains the BERT model by learning the molecular embedding. The pre-training corpus is "e15_smile_train.txt", which is shared at "https://figshare.com/articles/dataset/Compound_dataset_for_pre-training/19092248". To use this corpus, change the statement "file_path = mol2vec_corpus_e15_small.txt" in "EIECTRA-train.py" to "file_path = e15_smile_train.txt". The intermediate output of the learned molecular embedding, "fingerprints_smile_output256.tar.gz", is shared at "https://figshare.com/articles/software/fingerprints_smile_output256_tar_gz/19609440". The "my_tokenizers2.py" file mainly defines the tokenizer class.

In the "original-dataset" folder, we share the original datasets for the downstream classification and regression tasks.

The "fp2bert" folder contains the "preprocessing" folder and the "code" folder. In the "preprocessing" folder, we share the ".ipynb" files for fine-tuning the BERT model according to the specific downstream datasets; the ".npy" files of the intermediate BERT models for the downstream tasks are shared at "https://figshare.com/articles/dataset/FP2BERT_embedding/19573084".
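
To make the role of the vocabulary and the five special tokens more concrete, the following is a minimal sketch of how substructure identifiers could be mapped to integer ids before being fed to a BERT-style encoder. It is not the tokenizer defined in "my_tokenizers2.py": the function names `build_vocab` and `encode`, the placement of the special tokens at the start of the vocabulary, the `[CLS] ... [SEP]` layout, and the toy identifiers are all assumptions made for illustration. Under this layout, the full vocabulary would hold 3,352 + 5 = 3,357 entries.

```python
# Illustrative only: names, id layout, and identifiers are assumptions,
# not taken from my_tokenizers2.py or mol2vec_vocabs.txt.
SPECIAL_TOKENS = ["[PAD]", "[UNK]", "[CLS]", "[SEP]", "[MASK]"]


def build_vocab(identifiers):
    """Map special tokens, then substructure identifiers, to integer ids."""
    vocab = {}
    for token in SPECIAL_TOKENS + list(identifiers):
        if token not in vocab:  # keep the first id if an entry repeats
            vocab[token] = len(vocab)
    return vocab


def encode(identifiers, vocab, max_len):
    """Encode one molecule as [CLS] ids ... [SEP], padded to max_len."""
    body = [vocab.get(tok, vocab["[UNK]"]) for tok in identifiers]
    body = body[: max_len - 2]  # leave room for [CLS] and [SEP]
    ids = [vocab["[CLS]"]] + body + [vocab["[SEP]"]]
    return ids + [vocab["[PAD]"]] * (max_len - len(ids))


# Made-up identifiers standing in for fingerprint substructure hashes.
toy_identifiers = ["2246728737", "864662311", "2245384272"]
vocab = build_vocab(toy_identifiers)

# "999" is not in the toy vocabulary, so it falls back to [UNK].
ids = encode(["2246728737", "999", "864662311"], vocab, max_len=8)
print(ids)
```

Unknown identifiers fall back to [UNK] rather than raising an error, which is the usual reason a vocabulary reserves that token; [MASK] is reserved for the self-supervised pre-training objective.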
In the "code" folder, we share the ".ipynb" code files for downstream tasks.
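
As a rough picture of what a downstream head of this kind computes, the sketch below runs a forward pass through a 1D convolution, global max pooling, and a single fully connected unit over per-token encoder outputs. It is written in plain Python only to stay self-contained; the notebooks in this repository presumably rely on a deep-learning framework, and the layer sizes, kernel width, number of filters, pooling choice, and random weights here are all hypothetical rather than the values used by the authors.

```python
import math
import random


def conv1d_relu(seq, kernels, biases):
    """Valid 1D convolution along the token axis, followed by ReLU.

    seq: list of L vectors of size D (e.g. per-token encoder outputs).
    kernels: list of filters, each a list of K vectors of size D.
    Returns L - K + 1 rows, each holding one activation per filter.
    """
    width = len(kernels[0])
    rows = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        row = []
        for kernel, bias in zip(kernels, biases):
            total = bias + sum(
                w * x
                for kernel_vec, token_vec in zip(kernel, window)
                for w, x in zip(kernel_vec, token_vec)
            )
            row.append(max(0.0, total))
        rows.append(row)
    return rows


def global_max_pool(rows):
    """Keep the strongest activation of each filter over all positions."""
    return [max(column) for column in zip(*rows)]


def linear(vec, weights, bias):
    """Single fully connected output unit."""
    return sum(w * x for w, x in zip(weights, vec)) + bias


rng = random.Random(0)


def rand_vec(n):
    return [rng.uniform(-1.0, 1.0) for _ in range(n)]


# Hypothetical sizes: 6 tokens, embedding size 4, 3 filters of width 2.
encoded = [rand_vec(4) for _ in range(6)]  # stand-in for encoder output
kernels = [[rand_vec(4) for _ in range(2)] for _ in range(3)]
biases = [0.0, 0.0, 0.0]

features = conv1d_relu(encoded, kernels, biases)
pooled = global_max_pool(features)

# Regression uses the raw output; binary classification adds a sigmoid.
regression_output = linear(pooled, rand_vec(3), 0.0)
class_probability = 1.0 / (1.0 + math.exp(-regression_output))
```

The same pooled feature vector serves both task types; only the final activation and the loss differ between regression and classification.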
fanganpai/fp2bert