Polymer Information Extraction

IMPORTANT NOTE: The code and data shared here are available for academic, non-commercial use only.

This repo contains the code for the paper 'A general-purpose material property data extraction pipeline from large polymer corpora using natural language processing' [1].

Requirements and Setup

Python 3.7
PyTorch (version 1.10.0)
Transformers (version 4.17.0)

Install all required Python packages from the provided environment.yml file with conda env create -f environment.yml
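After creating the environment, it can be useful to confirm that the installed packages match the versions listed above. The following is a minimal sketch, not part of this repo: the function names installed_version and check_environment are hypothetical, and it assumes the packages are installed under the distribution names torch and transformers.

```python
import sys

# Versions listed in this README, keyed by distribution name (assumed names).
REQUIRED = {"torch": "1.10.0", "transformers": "4.17.0"}


def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        from importlib import metadata  # standard library from Python 3.8
    except ImportError:
        # Python 3.7 fallback; pkg_resources ships with setuptools.
        import pkg_resources
        try:
            return pkg_resources.get_distribution(package).version
        except pkg_resources.DistributionNotFound:
            return None
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def check_environment(required=REQUIRED):
    """Map each package name to a (wanted, found, matches) tuple."""
    report = {}
    for package, wanted in required.items():
        found = installed_version(package)
        # startswith tolerates local build suffixes such as "1.10.0+cu113".
        matches = found is not None and found.startswith(wanted)
        report[package] = (wanted, found, matches)
    return report


if __name__ == "__main__":
    print("Python", sys.version.split()[0])
    for package, (wanted, found, ok) in check_environment().items():
        status = "OK" if ok else "MISMATCH"
        print(f"{package}: wanted {wanted}, found {found}, {status}")
```

A missing package is reported with found set to None rather than raising an error, so the check can run before the environment is complete.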

Running the code

Example scripts and parameters for training the NER model are provided in the file run_ner.sh.

To fine-tune the masked language model, run the following command:

python run_mlm.py \
    --model_name_or_path bert-base \
    --train_file /path/to/train/file \
    --do_train \
    --do_eval \
    --output_dir /output
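The command above expects a training file at the path given to --train_file. As a rough sketch of how such a file might be prepared, the helper below writes one abstract per line to a plain-text file. This assumes that run_mlm.py accepts plain-text input in the way the Hugging Face example script of the same name does; the helper name write_mlm_train_file and the one-abstract-per-line layout are illustrative assumptions, not something this repo specifies.

```python
import re
from pathlib import Path


def write_mlm_train_file(abstracts, path):
    """Write one whitespace-normalized abstract per line; return the count written."""
    written = 0
    with Path(path).open("w", encoding="utf-8") as handle:
        for abstract in abstracts:
            # Collapse newlines and runs of spaces so each abstract stays on one line.
            line = re.sub(r"\s+", " ", abstract).strip()
            if line:  # skip entries that are empty after normalization
                handle.write(line + "\n")
                written += 1
    return written
```

The path of the resulting file would then be passed as the --train_file argument.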

Run python data_extraction.py to combine the NER predictions using heuristic rules.
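To give a sense of what combining NER predictions with a heuristic rule can look like, here is a minimal sketch of one possible rule: attach each property value to the nearest property name and the nearest polymer mention. It is not the logic implemented in data_extraction.py. The Entity class, the function names, and the label names POLYMER, PROP_NAME and PROP_VALUE are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Entity:
    text: str
    label: str  # assumed label names: "POLYMER", "PROP_NAME", "PROP_VALUE"
    start: int  # token index where the entity begins


def nearest(anchor, candidates):
    """Return the candidate whose start index is closest to the anchor, or None."""
    if not candidates:
        return None
    return min(candidates, key=lambda c: abs(c.start - anchor.start))


def combine_entities(entities):
    """Pair every property value with its nearest property name and polymer."""
    by_label = {}
    for entity in entities:
        by_label.setdefault(entity.label, []).append(entity)

    records = []
    for value in by_label.get("PROP_VALUE", []):
        name = nearest(value, by_label.get("PROP_NAME", []))
        polymer = nearest(value, by_label.get("POLYMER", []))
        # Drop values that cannot be attributed to both a property and a polymer.
        if name is not None and polymer is not None:
            records.append(
                {"polymer": polymer.text, "property": name.text, "value": value.text}
            )
    return records
```

A distance-based rule like this is simple but can mispair entities when several polymers and properties appear close together in one sentence, which is one reason a practical pipeline would likely layer further rules on top.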

The NER model used for sequence labeling can be found here

The MaterialsBERT language model that is used as the encoder for the above NER model can be found here
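A sequence labeling model assigns one label to each token, so its output usually has to be grouped into entity spans before it can be used downstream. The sketch below shows one simple way to do that, under the assumption that the model emits one plain label per token with "O" marking tokens outside any entity. The released model's actual label scheme may differ, and the function name group_token_labels is hypothetical.

```python
def group_token_labels(tokens, labels, outside="O"):
    """Merge consecutive tokens that share a label into (text, label, start) spans."""
    if len(tokens) != len(labels):
        raise ValueError("tokens and labels must have the same length")

    spans = []
    current_tokens, current_label, current_start = [], None, None
    for index, (token, label) in enumerate(zip(tokens, labels)):
        if label != current_label:
            # Label changed: close the open span before starting a new one.
            if current_label not in (None, outside):
                spans.append((" ".join(current_tokens), current_label, current_start))
            current_tokens, current_label, current_start = [], label, index
        current_tokens.append(token)

    # Close the final span if the sequence ends inside an entity.
    if current_label not in (None, outside):
        spans.append((" ".join(current_tokens), current_label, current_start))
    return spans
```

Note that this grouping cannot separate two adjacent entities of the same type; a scheme with explicit begin markers would be needed for that.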

Please cite our paper if you use the code or data in this repo:

@article{materialsbert,
  title={A general-purpose material property data extraction pipeline from large polymer corpora using natural language processing},
  author={Shetty, Pranav and Rajan, Arunkumar Chitteth and Kuenneth, Chris and Gupta, Sonakshi and Panchumarti, Lakshmi Prerana and Holm, Lauren and Zhang, Chao and Ramprasad, Rampi},
  journal={npj Computational Materials},
  volume={9},
  number={1},
  pages={52},
  year={2023},
  publisher={Nature Publishing Group UK London}
}

References

[1] Shetty, P., Rajan, A., Kuenneth, C., Gupta, S., Panchumarti, L., Holm, L., Zhang, C. & Ramprasad, R. A general-purpose material property data extraction pipeline from large polymer corpora using natural language processing. npj Computational Materials 9, 52 (2023)
