
Glossary

G-Huang edited this page Oct 31, 2019 · 1 revision

SMILES: A simple ASCII string-based method for representing molecules and reactions (see http://www.daylight.com/dayhtml/doc/theory/theory.smiles.html). Note that a single molecule can be represented by multiple SMILES strings; ethanol, for example, can be written as CCO, OCC or C(O)C.

SMARTS: A simple ASCII string-based method for representing molecular substructures; an extension of SMILES (see http://www.daylight.com/dayhtml/doc/theory/theory.smarts.html).

SMILES file: A text file containing SMILES strings, one per line; the SMILES is the first column of each line. Traditionally, the second column is the identifier (ID) of the molecule. Additional columns may exist. Columns are separated by a space or a tab, with the space being the standard for LillyMol. Typical file extension: ‘.smi’
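The column layout described above can be read with plain string splitting. The following is a minimal sketch, not LillyMol code: the names `SmilesRecord` and `parse_smiles_line` are hypothetical, and the sample molecules are made up for illustration. It assumes whitespace-separated columns and does not validate the SMILES itself.

```python
from typing import List, NamedTuple


class SmilesRecord(NamedTuple):
    """One line of a SMILES file (hypothetical helper type)."""
    smiles: str
    identifier: str
    extra: List[str]


def parse_smiles_line(line: str) -> SmilesRecord:
    """Split one line into SMILES (column 1), ID (column 2) and any extra columns."""
    # str.split() with no argument splits on runs of spaces or tabs.
    fields = line.split()
    if not fields:
        raise ValueError("empty line")
    identifier = fields[1] if len(fields) > 1 else ""
    return SmilesRecord(fields[0], identifier, fields[2:])


# Illustrative content of a small '.smi' file.
sample = "CCO ethanol\nc1ccccc1 benzene 78.11\n"
records = [parse_smiles_line(l) for l in sample.splitlines() if l.strip()]
```

The same approach applies to SMARTS files, since they share the column layout.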

SMARTS file: A text file containing SMARTS strings, one per line; the SMARTS is the first column of each line. Traditionally, the second column is the identifier (ID) of the substructure. Additional columns may exist. Columns are separated by a space or a tab, with the space being the standard for LillyMol. Typical file extension: ‘.smt’

SDF or SD file: A simple, ASCII connection table-based method for representing molecules and substructures (see https://en.wikipedia.org/wiki/Chemical_table_file). Typical file extension: ‘.sdf’

RDF or RD file: A simple, ASCII connection table-based method for representing chemical reactions (see http://c4.cabrillo.edu/404/ctfile.pdf). Typical file extension: ‘.rdf’

Chemical substructure: A contiguous chemical fragment; it need not be a valid molecule on its own.

Substructure search: The process of searching for the presence of a chemical substructure in molecules.

Canonical SMILES: The single, unique SMILES string that a given canonicalization algorithm produces for a specific chemical structure, regardless of how the structure was originally written.

Structure clean-up: A common chemoinformatics process that ‘cleans’ a structure representation by removing salts, fragments, etc., and checks it for simple errors (e.g. syntax, valence).

GFP (or gfp): Generalized FingerPrint format commonly used by LillyMol tools. It is a TDT-like format, inspired by the Thor Data Tree (TDT) format introduced by Daylight Chemical Information Systems (see http://www.daylight.com/meetings/summerschool01/course/basics/tdt.html).
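To give a feel for what "TDT-like" means, here is a rough sketch of a reader for records written in the Daylight TDT style, where each line is `TAG<data>` and a line containing only `|` ends a record. This is an assumption about the general shape only: the exact tags and fingerprint encodings used in GFP files are not described on this page, the tags in the sample (`$SMI`, `PCN`) are illustrative, and `parse_tdt_like` is a hypothetical name rather than a LillyMol function.

```python
import re
from typing import Dict, List

# TAG<data>: the tag is everything before the first '<'; the data runs to the final '>'.
_TAG_LINE = re.compile(r"^([^<]+)<(.*)>$")


def parse_tdt_like(text: str) -> List[Dict[str, str]]:
    """Collect TAG<data> lines into one dict per record; '|' closes a record."""
    records: List[Dict[str, str]] = []
    current: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line == "|":
            if current:
                records.append(current)
                current = {}
            continue
        match = _TAG_LINE.match(line)
        if match is None:
            raise ValueError(f"not a TAG<data> line: {line!r}")
        current[match.group(1)] = match.group(2)
    if current:  # tolerate a missing final terminator
        records.append(current)
    return records


# Illustrative input; real GFP files also carry fingerprint tags not shown here.
sample_tdt = "$SMI<CCO>\nPCN<ethanol>\n|\n$SMI<c1ccccc1>\nPCN<benzene>\n|\n"
tdt_records = parse_tdt_like(sample_tdt)
```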

Reaction SMILES: A simple ASCII string-based method for representing chemical reactions using SMILES strings.

Reaction SMILES file: A text file containing reaction SMILES strings, one per line; the reaction SMILES is the first column of each line. Traditionally, the second column is the identifier (ID) of the reaction. Additional columns may exist. Columns are separated by a space or a tab, with the space being the standard for LillyMol. Typical file extension: ‘.rsmi’
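In the Daylight convention, a reaction SMILES has the form `reactants>agents>products`, with individual molecules on each side separated by `.`. The sketch below splits one such string into its three parts; `split_reaction_smiles` is a hypothetical helper, the esterification example is illustrative, and no chemistry is validated.

```python
from typing import List, Tuple


def split_reaction_smiles(rxn: str) -> Tuple[List[str], List[str], List[str]]:
    """Return (reactants, agents, products) as lists of component SMILES."""
    parts = rxn.split(">")
    if len(parts) != 3:
        raise ValueError("expected exactly two '>' separators")
    # An empty agents section ('A>>B') yields an empty list.
    reactants, agents, products = (
        [component for component in part.split(".") if component]
        for part in parts
    )
    return reactants, agents, products


# One line of a '.rsmi' file: reaction SMILES in column 1, ID in column 2.
line = "CC(=O)O.OCC>[H+]>CC(=O)OCC.O esterification_1"
rxn_smiles, rxn_id = line.split()[:2]
reactants, agents, products = split_reaction_smiles(rxn_smiles)
```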

Reaction signature: The unique SMILES-like string representing the extended reaction core of a chemical reaction.

References/Resources

https://en.wikipedia.org/wiki/Chemical_file_format

http://c4.cabrillo.edu/404/ctfile.pdf
