qsarify is a library of tools for the analysis of QSAR/QSPR datasets and models. This library is intended to be used to produce models which relate a set of calculated chemical descriptors to a given numeric endpoint. Many great tools will take the geometry or string data of a given chemical and compute descriptors, which are numeric measures of the properties of these, but you can generate some of these with another one of my scripts, Free Descriptors.
- Python 3
- numpy
- pandas
- scikit-learn
- matplotlib
pip install qsarify
- Data preprocessing tools:
data_tools
- Dimensionality reduction via clustering:
clustering
- Feature selection:
- Single threaded:
feature_selection_single
- Multi-threaded:
feature_selection_multi
- Single threaded:
- Model Export and Visualization:
model_export
- Cross Valiidation:
cross_validation
The best way to learn how to use this library is to look at the example notebook in the examples
folder. This notebook will walk you through the workflow of using this library to build a QSAR model.
- Massively parallel feature selection methods:
- CUDA acceleration
- MPI acceleration
- Include Shannon Entropy as a dimensionality reduction metric in clustering
- Embedded kernel methods
- More visualization tools
- More cross validation tools
- Feature selection tools for categorical data
If you would like to contribute to this project, please feel free to fork this repository and submit a pull request. Otherwise, you may also submit an issue. I will try to respond to issues as quickly as possible.
This project is licensed under the GNU GPLv3 license. See the LICENSE file for more details.
- Authors acknowledge the support from the National Science Foundation for the grant award NSF/IUCRC/CB2-2113804 "Center for Bioplastic and Biocomposites (CB2)", and from the United States Navy’s Office of Naval Research under grant number N00014-22-1-2129.
- This work was also supported in part by the National Science Foundation NSF-MRI award OAC-2019077.
- The authors are grateful for financial and administrative support provided by the Department of Coatings and Polymer Materials (NDSU).
- Finally, This work used resources of the Center for Computationally Assisted Science and Technology (CCAST) at North Dakota State University, which were made possible in part by NSF MRI Award No. 2019077.
If you use this library in your work, please cite it as follows:
Szwiec, Stephen. (2023). qsarify: A high performance library for QSAR model development.
BibTex:
@misc{szwiec2023qsarify,
authors = {Szwiec, Stephen and Rasulev, Bakhtiyor},
title = {qsarify: A high performance library for QSAR model development},
year = {2023},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/stephenszwiec/qsarify}},
}