This codebase covers single-task fine-tuning and inference of T5-style models: the original T5, FlanT5, ByT5, and other T5-based models such as molT5 or nach0.
The requirement.txt file lists all required Python libraries and is provided so that a conda environment can be created from it.
The training entry point is the bash script run_t5_single-task_train.sh, which calls the Python script t5_single-task_train.py. You can run it by simply executing bash run_t5_single-task_train.sh.
Some hyperparameters are hard-coded in the Python file under TrainingArguments. Most of these parameters correspond directly to the standard TrainingArguments from HuggingFace; see the following links for further guidance: https://huggingface.co/docs/transformers/en/main_classes/trainer https://huggingface.co/docs/transformers/v4.41.1/en/main_classes/trainer#transformers.Seq2SeqTrainingArguments
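For orientation, here is a minimal sketch of what such hard-coded arguments typically look like. The specific values and the output_dir below are illustrative assumptions, not the settings used in t5_single-task_train.py:

```python
from transformers import Seq2SeqTrainingArguments

# Illustrative values only; the actual hyperparameters are hard-coded in
# t5_single-task_train.py and may differ.
training_args = Seq2SeqTrainingArguments(
    output_dir="checkpoints/t5_single_task",  # hypothetical output path
    num_train_epochs=20,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    learning_rate=1e-4,
    evaluation_strategy="steps",
    eval_steps=1000,
    save_strategy="steps",
    save_steps=1000,
    logging_steps=100,
    predict_with_generate=True,  # generate full sequences during evaluation
)
```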
The inference entry point is the bash script run_t5_infer.sh, which calls the Python script t5_infer.py. You can run it by simply executing bash run_t5_infer.sh.
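A minimal sketch of the kind of inference this script performs is shown below; the checkpoint path, generation settings, and input text are illustrative assumptions, not what t5_infer.py actually uses:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Hypothetical checkpoint directory produced by the training script.
model_dir = "checkpoints/t5_single_task"
tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoModelForSeq2SeqLM.from_pretrained(model_dir)
model.eval()

# Encode one input sequence and generate the predicted output sequence.
inputs = tokenizer("example input sequence", return_tensors="pt")
with torch.no_grad():
    generated = model.generate(**inputs, max_new_tokens=256, num_beams=5)
print(tokenizer.decode(generated[0], skip_special_tokens=True))
```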
An example dataset is provided under data/. The format is quite simple (see the loading sketch after this list):
• It is a CSV file with two columns, where “,” is the delimiter
• First column is the “Input” column (the original sequence before any preprocessing)
• Second column is the “Output” column (also the original gold sequence).
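As a quick check of the format, a small sketch for reading such a file is shown below. The file name is hypothetical, and a header row with the column names “Input” and “Output” is assumed; if the file has no header, pass header=None and names=["Input", "Output"] instead:

```python
import pandas as pd

# Load the two-column CSV described above (hypothetical file name under data/).
df = pd.read_csv("data/example.csv")

print(df.columns.tolist())                       # expected: ['Input', 'Output']
print(df.iloc[0]["Input"], "->", df.iloc[0]["Output"])  # first input/gold pair
```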
Jiayun Pang and Ivan Vulić. Specialising and Analysing Instruction-Tuned and Byte-Level Language Models for Organic Reaction Prediction. arXiv preprint arXiv:2405.10625 (2024). https://arxiv.org/abs/2405.10625