This repository contains datasets used in CTUAvastLab. Currently it contains only the mutagenesis dataset.
The dataset comprises 230 molecules trialed for mutagenicity on Salmonella typhimurium. A subset of 188 molecules is learnable using linear regression. This subset was later termed the "regression friendly" dataset. The remaining subset of 42 molecules is named the "regression unfriendly" dataset (description taken from relational.fit.cvut.cz/).
Currently, this repository contains only Mutagenesis_188.
The original data is hosted as an SQL database at relational.fit.cvut.cz/. For the original source, see the separate file.
mutagenesis/data.json contains the data from the Mutagenesis_188 dataset as JSON: a list of 188 structures, each representing one molecule.
mutagenesis/meta.json contains metadata about the dataset, also as JSON.
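
The two files can be read with any JSON parser. Below is a minimal Python sketch, assuming the repository is checked out locally and the script runs from its root. The helper name `load_mutagenesis` is hypothetical and not part of this repository, and the sketch makes no assumption about the fields inside each molecule or inside the metadata.

```python
import json
from pathlib import Path


def load_mutagenesis(root="mutagenesis"):
    """Load data.json and meta.json from the given directory.

    `root` defaults to the mutagenesis/ directory described above;
    this helper is illustrative and is not shipped with the repository.
    """
    root = Path(root)
    # data.json: a JSON list with one entry per molecule
    with open(root / "data.json", encoding="utf-8") as f:
        molecules = json.load(f)
    # meta.json: metadata about the dataset
    with open(root / "meta.json", encoding="utf-8") as f:
        meta = json.load(f)
    return molecules, meta
```

Calling `molecules, meta = load_mutagenesis()` from the repository root should give a list whose length matches the 188 structures mentioned above. Inspecting `molecules[0]` and `meta` is the easiest way to see the actual layout of each entry.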