Add SMILES information for the molecules #33

Closed
lipelopesoliveira opened this issue Mar 2, 2023 · 9 comments · Fixed by #30
Labels: 📶 enhancement (New feature or request)


lipelopesoliveira commented Mar 2, 2023

Motivation

Currently, the building blocks and chemical groups are stored in the xyz format. The same information could also be stored in SMILES format.

For the tripodal benzene:
Organic core: C1=CC=CC=C1
Building Block: [Q]C1=C([Q])C([R1])=C([Q])C([Q])=C1[R1]

What should be done?

Change how the information is stored internally to a JSON format that holds the atomic positions as well as other information about the building blocks. Consider using the Chemical JSON format.

The code that reads the BuildingBlock and creates the Reticulum must use these files and also inherit the SMILES code information.
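
As a rough illustration of the idea above, the sketch below bundles atomic positions and SMILES metadata into one JSON-ready record. The function name and all field names (`atoms`, `symbols`, `coords`) are hypothetical and do not follow the actual Chemical JSON schema; the coordinates are placeholders, not data from the repository.

```python
import json


def make_building_block_record(name, code, smiles, symbols, positions):
    """Bundle atomic positions and SMILES metadata into one JSON-ready dict.

    Hypothetical layout, chosen only for illustration.
    """
    if len(symbols) != len(positions):
        raise ValueError("each atom needs exactly one (x, y, z) position")
    return {
        "name": name,
        "code": code,
        "smiles": smiles,
        "atoms": {
            "symbols": list(symbols),
            "coords": [[float(c) for c in xyz] for xyz in positions],
        },
    }


# Example record using the hydrazine linker notation from this thread.
record = make_building_block_record(
    name="hydrazine",
    code="HDZN",
    smiles="[Q][Q]",
    symbols=["Q", "Q"],
    positions=[(0.0, 0.0, 0.0), (1.45, 0.0, 0.0)],  # placeholder coordinates
)

# Round trip through JSON text, as a file on disk would do.
text = json.dumps(record, indent=2)
restored = json.loads(text)
```

Keeping the SMILES string next to the coordinates means a reader of the file gets both in a single load, which is the behaviour the issue asks for.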

Steps to complete this task:

What will be obtained at the end of this task?

Is there a deadline for completion?

Anything else?

lipelopesoliveira added the 📶 enhancement (New feature or request) label Mar 2, 2023
lipelopesoliveira self-assigned this Mar 2, 2023

lipelopesoliveira commented Mar 3, 2023

The SMILES code for each building block should be saved in the ChemAxon Extended SMILES format (XSMILES, also known as CXSMILES; supported by rdkit). This format allows the building-block images to be generated with the Q, Rx and X labels:

[Image: output]

In this format, the building block above has the code:

[*]C1=C([*])C([*])=C([*])C([*])=C1[*] |$Q;;;R1;;R2;;Q;;R1;;R2$|
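
To make the label block concrete, here is a minimal sketch that splits such a string into its plain SMILES part and its per-atom labels. The helper names `split_xsmiles` and `labelled_atoms` are hypothetical, and the sketch only handles the `|$...$|` atom-label block, not other extended SMILES features.

```python
def split_xsmiles(xsmiles: str):
    """Split 'SMILES |$a;b;c$|' into the SMILES part and a list of labels."""
    smiles, _, extension = xsmiles.partition(' |')
    extension = extension.rstrip('|')
    labels = []
    # Only the atom-label block, delimited by '$', is understood here.
    if len(extension) >= 2 and extension.startswith('$') and extension.endswith('$'):
        labels = extension[1:-1].split(';')
    return smiles, labels


def labelled_atoms(labels):
    """Return {atom_index: label} for the atoms that carry a label."""
    return {index: label for index, label in enumerate(labels) if label}


smiles, labels = split_xsmiles(
    '[*]C1=C([*])C([*])=C([*])C([*])=C1[*] |$Q;;;R1;;R2;;Q;;R1;;R2$|'
)
positions = labelled_atoms(labels)
```

Each `;`-separated field corresponds to one atom in the order it appears in the SMILES string, so the empty fields belong to the ring carbons and the filled ones to the `[*]` placeholders.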

A way to automate the generation of these codes will be needed, since writing them by hand would be extremely laborious.


lipelopesoliveira commented Mar 3, 2023

Given the SMILES code of the molecule with the special atoms, the code below generates the xsmiles with the labels:

smiles_string = '[Q]C1=C([Q])C([R1])=C([Q])C([Q])=C1[R1]'

def smiles_to_xsmiles(smiles_string:str) -> str:
    '''
    Converts a SMILES string to an extended SMILES string with labels

    Parameters
    ----------
    smiles_string : str
        SMILES string to be converted
    
    Returns
    -------
    xsmiles : str
        Extended SMILES string with labels
    '''
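    # Placeholder symbols that are replaced by '*' and keep their name as label.
    SPECIAL_ATOMS = ['Q', 'R', 'X']
    # Elements that receive an empty label slot. 'S' is included because the
    # sulfur-containing entries in this thread carry one slot per S atom.
    # Elements missing from this list (e.g. B, F, I) get no label slot.
    REGULAR_ATOMS = ['C', 'N', 'H', 'O', 'S']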

    xsmiles = ''

    labels = []
    
    for i, letter in enumerate(smiles_string):

        if letter in SPECIAL_ATOMS:
            xsmiles += '*'
            labels += [letter]
        
        elif letter.isnumeric():
            if smiles_string[i-1] == 'R':
                labels[-1] = labels[-1] + letter
            else:
                xsmiles += letter
        
        elif letter in REGULAR_ATOMS:
            xsmiles += letter
            labels += ['']
        
        else:
            xsmiles += letter

    return xsmiles + ' |$' + ';'.join(labels) + '$|'

>>> smiles_to_xsmiles(smiles_string)
'[*]C1=C([*])C([*])=C([*])C([*])=C1[*] |$Q;;;Q;;R1;;Q;;Q;;R1$|'
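
The function above walks the string one character at a time, so an element missing from REGULAR_ATOMS gets no label slot and a two-digit label such as R10 would be split. A possible alternative, sketched here under the assumption that only organic-subset atoms and bracket atoms occur, tokenizes the string with a regular expression first. The name `smiles_to_xsmiles_tokens` is hypothetical and this is not the code used in the repository.

```python
import re

# One token per match: placeholder atoms, other bracket atoms,
# organic-subset atoms, or any other single character.
TOKEN = re.compile(
    r"\[(?P<special>[QRX]\d*)\]"              # [Q], [R], [R1], [R10], [X]
    r"|(?P<bracket>\[[^\]]+\])"               # [H], [N+], [O-], ...
    r"|(?P<atom>Cl|Br|[BCNOSPFI]|[bcnosp])"   # organic-subset atoms
    r"|(?P<other>.)"                          # bonds, branches, ring digits
)


def smiles_to_xsmiles_tokens(smiles_string: str) -> str:
    """Regex-based variant of smiles_to_xsmiles (illustrative sketch)."""
    parts, labels = [], []
    for match in TOKEN.finditer(smiles_string):
        if match.group("special"):
            # Placeholder atom: emit a dummy atom and remember its name.
            parts.append("[*]")
            labels.append(match.group("special"))
        elif match.group("bracket") or match.group("atom"):
            # Ordinary atom: copy it and reserve an empty label slot.
            parts.append(match.group(0))
            labels.append("")
        else:
            # Anything else is copied unchanged and has no label slot.
            parts.append(match.group(0))
    return "".join(parts) + " |$" + ";".join(labels) + "$|"


result = smiles_to_xsmiles_tokens('[Q]C1=C([Q])C([R1])=C([Q])C([Q])=C1[R1]')
```

For the benzene example, this variant gives the same string as the original function; the difference only shows up for inputs such as `[R]Br` or `[R10]C`.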

@lipelopesoliveira
Owner Author

from rdkit import Chem
from rdkit.Chem import Draw

SMILES_LIST = ['[*]C1=C([*])C([*])=C([*])C([*])=C1[*] |$Q;;;R1;;R2;;Q;;R1;;R2$|', 
               '[*]C1=C([*])C([*])=C(C2=C([*])C([*])=C([*])C([*])=C2[*])C([*])=C1[*] |$Q;;;R1;;R4;;;;R4;;R1;;Q;;R2;;R3;;R3;;R2$|', 
               'c1c(O)cccn1', 
               'c1c(F)c(C)ccn1', 
               'c1cc(Cl)c(F)cn1']

NAMES_LIST = ['Benzene', "1,1'-biphenyl", 'c1c(O)cccn1', 'c1c(F)c(C)ccn1', 'c1cc(Cl)c(F)cn1']

mols = [Chem.MolFromSmiles(smi) for smi in SMILES_LIST]

Draw.MolsToGridImage(mols, 
                     molsPerRow=3, 
                     legends=NAMES_LIST, 
                     subImgSize=(300,200),
                     useSVG=True)

This code generates a single image containing all the molecules.


lipelopesoliveira commented Mar 3, 2023

smiles_C2 = [{'name': 'benzene',
  'smiles': '[Q]C1=C([R2])C([R1])=C([Q])C([R2])=C1[R1]',
  'code': 'BENZ',
  'xsmiles': '[*]C1=C([*])C([*])=C([*])C([*])=C1[*]',
  'xsmiles_label': '|$Q;;;R2;;R1;;Q;;R2;;R1$|'},
 {'name': 'naphthalene',
  'smiles': '[Q]C1=C([R3])C([R2])=C2C(C([R2])=C([R3])C([Q])=C2[R1])=C1[R1]',
  'code': 'NAPT',
  'xsmiles': '[*]C1=C([*])C([*])=C2C(C([*])=C([*])C([*])=C2[*])=C1[*]',
  'xsmiles_label': '|$Q;;;R3;;R2;;;;R2;;R3;;Q;;R1;;R1$|'},
 {'name': "1,1'-biphenyl",
  'smiles': '[Q]C1=C([R1])C([R3])=C(C2=C([R4])C([R2])=C([Q])C([R1])=C2[R3])C([R4])=C1[R2]',
  'code': 'BPNY',
  'xsmiles': '[*]C1=C([*])C([*])=C(C2=C([*])C([*])=C([*])C([*])=C2[*])C([*])=C1[*]',
  'xsmiles_label': '|$Q;;;R1;;R3;;;;R4;;R2;;Q;;R1;;R3;;R4;;R2$|'},
 {'name': 'anthracene',
  'smiles': '[R1]C1=C2C(C([R2])=C([R4])C([Q])=C2[R3])=C([R1])C3=C([R3])C([Q])=C([R4])C([R2])=C31',
  'code': 'ANTR',
  'xsmiles': '[*]C1=C2C(C([*])=C([*])C([*])=C2[*])=C([*])C3=C([*])C([*])=C([*])C([*])=C31',
  'xsmiles_label': '|$R1;;;;;R2;;R4;;Q;;R3;;R1;;;R3;;Q;;R4;;R2;$|'},
 {'name': '1,7-dihydro-s-indacene',
  'smiles': '[R1]C1=C2C(C([R2])=C([Q])C2[R3])=C([R1])C3=C1C([R2])C([Q])=C3[R3]',
  'code': 'DHSI',
  'xsmiles': '[*]C1=C2C(C([*])=C([*])C2[*])=C([*])C3=C1C([*])C([*])=C3[*]',
  'xsmiles_label': '|$R1;;;;;R2;;Q;;R3;;R1;;;;R2;;Q;;R3$|'},
 {'name': 'thieno[3,2-b]thiophene',
  'smiles': '[Q]C1=C([R])C2=C(S1)C([R])=C([Q])S2',
  'code': 'TTPH',
  'xsmiles': '[*]C1=C([*])C2=C(S1)C([*])=C([*])S2',
  'xsmiles_label': '|$Q;;;R;;;;;R;;Q;$|'},
 {'name': "3,3'-bipyridine",
  'smiles': '[Q]C1=C([R1])C([R3])=C(C2=C([R2])N=C([Q])C([R1])=C2[R3])C([R2])=N1',
  'code': '3BPD',
  'xsmiles': '[*]C1=C([*])C([*])=C(C2=C([*])N=C([*])C([*])=C2[*])C([*])=N1',
  'xsmiles_label': '|$Q;;;R1;;R3;;;;R2;;;Q;;R1;;R3;;R2;$|'},
 {'name': "2,2'-bithiophene",
  'smiles': '[Q]C1=C([R2])C([R1])=C(C2=C([R1])C([R2])=C([Q])S2)S1',
  'code': 'BTPH',
  'xsmiles': '[*]C1=C([*])C([*])=C(C2=C([*])C([*])=C([*])S2)S1',
  'xsmiles_label': '|$Q;;;R2;;R1;;;;R1;;R2;;Q;;$|'},
 {'name': "1,1':4',1''-terphenyl",
  'smiles': '[Q]C(C([R5])=C1[R6])=C([R4])C([R3])=C1C(C([R1])=C2[R2])=C([R2])C([R1])=C2C3=C([R3])C([R4])=C([Q])C([R5])=C3[R6]',
  'code': 'TPNY',
  'xsmiles': '[*]C(C([*])=C1[*])=C([*])C([*])=C1C(C([*])=C2[*])=C([*])C([*])=C2C3=C([*])C([*])=C([*])C([*])=C3[*]',
  'xsmiles_label': '|$Q;;;R5;;R6;;R4;;R3;;;;R1;;R2;;R2;;R1;;;;R3;;R4;;Q;;R5;;R6$|'},
 {'name': '1,2-diphenylethyne',
  'smiles': '[Q]C(C([R4])=C1[R1])=C([R3])C([R2])=C1C#CC2=C([R1])C([R4])=C([Q])C([R3])=C2[R2]',
  'code': 'DPEY',
  'xsmiles': '[*]C(C([*])=C1[*])=C([*])C([*])=C1C#CC2=C([*])C([*])=C([*])C([*])=C2[*]',
  'xsmiles_label': '|$Q;;;R4;;R1;;R3;;R2;;;;;;R1;;R4;;Q;;R3;;R2$|'},
 {'name': "2,2'-bipyridine",
  'smiles': '[Q]C1=C([R1])C([R3])=C(C2=NC([R2])=C([Q])C([R1])=C2[R3])N=C1[R2]',
  'code': '2BPD',
  'xsmiles': '[*]C1=C([*])C([*])=C(C2=NC([*])=C([*])C([*])=C2[*])N=C1[*]',
  'xsmiles_label': '|$Q;;;R1;;R3;;;;;R2;;Q;;R1;;R3;;;R2$|'},
 {'name': 'pyrene',
  'smiles': '[Q]C1=C([R3])C2=C(C(C([R2])=C3[R1])=C1[R4])C(C3=C([R3])C([Q])=C4[R4])=C4C([R2])=C2[R1]',
  'code': 'PYRN',
  'xsmiles': '[*]C1=C([*])C2=C(C(C([*])=C3[*])=C1[*])C(C3=C([*])C([*])=C4[*])=C4C([*])=C2[*]',
  'xsmiles_label': '|$Q;;;R3;;;;;R2;;R1;;R4;;;;R3;;Q;;R4;;;R2;;R1$|'},
 {'name': 'pyrene-1,3,6,8(2H,7H)-tetraone',
  'smiles': 'O=C([Q]C1=O)C2=C([R1])C([R2])=C3C4=C2C1=C([R2])C([R1])=C4C([Q]C3=O)=O',
  'code': 'PYTO',
  'xsmiles': 'O=C([*]C1=O)C2=C([*])C([*])=C3C4=C2C1=C([*])C([*])=C4C([*]C3=O)=O',
  'xsmiles_label': '|$;;Q;;;;;R1;;R2;;;;;;R2;;R1;;;Q;;;$|'},
 {'name': '1,4-bis(phenylethynyl)benzene',
  'smiles': '[Q]C(C([R4])=C1[R1])=C([R3])C([R2])=C1C#CC(C([R6])=C2[R5])=C([R5])C([R6])=C2C#CC3=C([R1])C([R4])=C([Q])C([R3])=C3[R2]',
  'code': 'BPYB',
  'xsmiles': '[*]C(C([*])=C1[*])=C([*])C([*])=C1C#CC(C([*])=C2[*])=C([*])C([*])=C2C#CC3=C([*])C([*])=C([*])C([*])=C3[*]',
  'xsmiles_label': '|$Q;;;R4;;R1;;R3;;R2;;;;;;R6;;R5;;R5;;R6;;;;;;R1;;R4;;Q;;R3;;R2$|'},
 {'name': '(E)-1,2-diphenylethene',
  'smiles': '[Q]C(C([R4])=C1[R2])=C([R1])C([R3])=C1/C=C/C2=C([R2])C([R4])=C([Q])C([R1])=C2[R3]',
  'code': 'DPEL',
  'xsmiles': '[*]C(C([*])=C1[*])=C([*])C([*])=C1/C=C/C2=C([*])C([*])=C([*])C([*])=C2[*]',
  'xsmiles_label': '|$Q;;;R4;;R2;;R1;;R3;;;;;;R2;;R4;;Q;;R1;;R3$|'},
 {'name': '(E)-1,2-diphenyldiazene',
  'smiles': '[Q]C(C([R4])=C1[R2])=C([R1])C([R3])=C1/N=N/C2=C([R2])C([R4])=C([Q])C([R1])=C2[R3]',
  'code': 'DPDA',
  'xsmiles': '[*]C(C([*])=C1[*])=C([*])C([*])=C1/N=N/C2=C([*])C([*])=C([*])C([*])=C2[*]',
  'xsmiles_label': '|$Q;;;R4;;R2;;R1;;R3;;;;;;R2;;R4;;Q;;R1;;R3$|'},
 {'name': "benzo[1,2-b:4,5-b']dithiophene",
  'smiles': '[Q]C(S1)=C([R2])C2=C1C([R1])=C(C([R2])=C([Q])S3)C3=C2[R1]',
  'code': 'BDTP',
  'xsmiles': '[*]C(S1)=C([*])C2=C1C([*])=C(C([*])=C([*])S3)C3=C2[*]',
  'xsmiles_label': '|$Q;;;;R2;;;;R1;;;R2;;Q;;;;R1$|'},
 {'name': "benzo[1,2-d:4,5-d']bis(thiazole)",
  'smiles': '[Q]C(S1)=NC2=C1C([R1])=C(N=C([Q])S3)C3=C2[R1]',
  'code': 'BBTZ',
  'xsmiles': '[*]C(S1)=NC2=C1C([*])=C(N=C([*])S3)C3=C2[*]',
  'xsmiles_label': '|$Q;;;;;;;R1;;;;Q;;;;R1$|'},
 {'name': '1,4-diphenylbuta-1,3-diyne',
  'smiles': '[Q]C(C([R4])=C1[R1])=C([R2])C([R3])=C1C#CC#CC2=C([R1])C([R3])=C([Q])C([R2])=C2[R3]',
  'code': 'DPBY',
  'xsmiles': '[*]C(C([*])=C1[*])=C([*])C([*])=C1C#CC#CC2=C([*])C([*])=C([*])C([*])=C2[*]',
  'xsmiles_label': '|$Q;;;R4;;R1;;R2;;R3;;;;;;;;R1;;R3;;Q;;R2;;R3$|'},
 {'name': 's-indacene-1,3,5,7(2H,6H)-tetraone',
  'smiles': 'O=C([Q]C1=O)C2=C1C([R])=C(C([Q]C3=O)=O)C3=C2[R]',
  'code': 'INTO',
  'xsmiles': 'O=C([*]C1=O)C2=C1C([*])=C(C([*]C3=O)=O)C3=C2[*]',
  'xsmiles_label': '|$;;Q;;;;;;R;;;Q;;;;;;R$|'},
 {'name': "benzo[1,2-b:4,5-b']difuran",
  'smiles': '[Q]C(O1)=C([R2])C2=C1C([R1])=C(C([R2])=C([Q])O3)C3=C2[R1]',
  'code': 'BDFN',
  'xsmiles': '[*]C(O1)=C([*])C2=C1C([*])=C(C([*])=C([*])O3)C3=C2[*]',
  'xsmiles_label': '|$Q;;;;R2;;;;R1;;;R2;;Q;;;;R1$|'},
 {'name': '1,5-dihydropyrrolo[2,3-f]indole',
  'smiles': '[Q]C(N1[H])=C([R2])C2=C1C([R1])=C(C([R2])=C([Q])N3[H])C3=C2[R1]',
  'code': 'DHPI',
  'xsmiles': '[*]C(N1[H])=C([*])C2=C1C([*])=C(C([*])=C([*])N3[H])C3=C2[*]',
  'xsmiles_label': '|$Q;;;;;R2;;;;R1;;;R2;;Q;;;;;R1$|'},
 {'name': '1,7-dihydro-s-indacene',
  'smiles': '[R1]C1=C2C(C([R2])=C([Q])C2[R3])=C([R1])C3=C1C([R2])C([Q])=C3[R3]',
  'code': 'DHSI',
  'xsmiles': '[*]C1=C2C(C([*])=C([*])C2[*])=C([*])C3=C1C([*])C([*])=C3[*]',
  'xsmiles_label': '|$R1;;;;;R2;;Q;;R3;;R1;;;;R2;;Q;;R3$|'},
 {'name': 'hydrazine',
  'smiles': '[Q][Q]',
  'code': 'HDZN',
  'xsmiles': '[*][*]',
  'xsmiles_label': '|$Q;Q$|'},
 {'name': "naphtho[1,2-b:5,6-b']dithiophene",
  'smiles': '[Q]C(S1)=C([R1])C2=C1C(C([R3])=C([R2])C3=C4SC([Q])=C3[R1])=C4C([R3])=C2[R2]',
  'code': 'NDTP',
  'xsmiles': '[*]C(S1)=C([*])C2=C1C(C([*])=C([*])C3=C4SC([*])=C3[*])=C4C([*])=C2[*]',
  'xsmiles_label': '|$Q;;;;R1;;;;;R3;;R2;;;;;Q;;R1;;;R3;;R2$|'},
 {'name': "3a,7a-dihydroanthra[2,1,9-def:6,5,10-d'e'f']diisochromene-1,3,8,10-tetraone",
  'smiles': 'O=C([Q]C1=O)C2=C(C1C([R1])=C3[R3])C4=C3C(C([R4])=C([R2])C5C([Q]C6=O)=O)=C(C5=C6C([R1])=C7[R3])C7=C4C([R4])=C2[R2]',
  'code': 'PTCD',
  'xsmiles': 'O=C([*]C1=O)C2=C(C1C([*])=C3[*])C4=C3C(C([*])=C([*])C5C([*]C6=O)=O)=C(C5=C6C([*])=C7[*])C7=C4C([*])=C2[*]',
  'xsmiles_label': '|$;;Q;;;;;;;R1;;R3;;;;;R4;;R2;;;Q;;;;;;;;R1;;R3;;;;R4;;R2$|'},
 {'name': "(E)-4,4'-dimethyl-[6,6'-bithieno[3,2-b]pyrrolylidene]-5,5'(4H,4'H)-dione",
  'smiles': 'O=C1N(C)C2=C(SC([Q])=C2[R1])/C1=C3C(SC([Q])=C4[R1])=C4N(C)C/3=O',
  'code': 'TIDA',
  'xsmiles': 'O=C1N(C)C2=C(SC([*])=C2[*])/C1=C3C(SC([*])=C4[*])=C4N(C)C/3=O',
  'xsmiles_label': '|$;;;;;;;;Q;;R1;;;;;;Q;;R1;;;;;$|'},
 {'name': "indeno[2,1-a]indene",
  'code': 'INDE',
  'smiles': '[R1]C(C([Q])=C1[R2])=C([R3])C2=C1C([R4])=C3C2=C([R4])C4=C([R2])C([Q])=C([R1])C([R3])=C43',
  'xsmiles': '[*]C(C([*])=C1[*])=C([*])C2=C1C([*])=C3C2=C([*])C4=C([*])C([*])=C([*])C([*])=C43',
  'xsmiles_label': '|$R1;;;Q;;R2;;R3;;;;R4;;;;R4;;;R2;;Q;;R1;;R3;$|'},
 {'name': "indeno[1,2-b]fluorene",
  'code': 'INFL',
  'smiles': '[Q]C1=C([R2])C2=C(C([R3])=C1[R1])C3=C([R5])C4=C([R4])C5=C(C([R3])=C([R1])C([Q])=C5[R2])C4=C([R5])C3=C2[R4]',
  'xsmiles': '[*]C1=C([*])C2=C(C([*])=C1[*])C3=C([*])C4=C([*])C5=C(C([*])=C([*])C([*])=C5[*])C4=C([*])C3=C2[*]',
  'xsmiles_label': '|$Q;;;R2;;;;R3;;R1;;;R5;;;R4;;;;R3;;R1;;Q;;R2;;;R5;;;R4$|'}
]
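
Since a list like this is tedious to proofread by hand, a small consistency check may help. The sketch below is a hypothetical helper, not part of the repository. It reports codes that occur more than once and entries whose number of label fields differs from a rough atom count. The atom pattern assumes only bracket atoms and organic-subset elements, and a mismatch is only reported, since whether a shorter label list is acceptable depends on the parser.

```python
import re
from collections import Counter

# Rough atom pattern: bracket atoms, two-letter halogens, then
# single-letter organic-subset atoms (upper case or aromatic lower case).
ATOM = re.compile(r"\[[^\]]+\]|Cl|Br|[BCNOSPFI]|[bcnosp]")


def check_records(records):
    """Return a list of human-readable problems found in the records."""
    problems = []
    code_counts = Counter(rec["code"] for rec in records)
    for code, count in code_counts.items():
        if count > 1:
            problems.append(f"code {code!r} appears {count} times")
    for rec in records:
        n_atoms = len(ATOM.findall(rec["xsmiles"]))
        n_labels = len(rec["xsmiles_label"].strip("|$").split(";"))
        if n_labels != n_atoms:
            problems.append(
                f"{rec['code']}: {n_labels} label fields for {n_atoms} atoms"
            )
    return problems


# Two entries copied from the list above; DHSI is listed twice there.
dhsi = {
    "code": "DHSI",
    "xsmiles": "[*]C1=C2C(C([*])=C([*])C2[*])=C([*])C3=C1C([*])C([*])=C3[*]",
    "xsmiles_label": "|$R1;;;;;R2;;Q;;R3;;R1;;;;R2;;Q;;R3$|",
}
hdzn = {"code": "HDZN", "xsmiles": "[*][*]", "xsmiles_label": "|$Q;Q$|"}

problems = check_records([dhsi, hdzn, dhsi])
```

Running the same check over the full list would flag every repeated code in one pass.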

@lipelopesoliveira
Owner Author

The code:

from rdkit import Chem
from rdkit.Chem import Draw


# BB_C2 refers to the list of C2 building blocks posted above as smiles_C2.
BB_C2 = smiles_C2

NAME = [i['name'] for i in BB_C2]
SMILES = [i['xsmiles'] + ' ' + i['xsmiles_label'] for i in BB_C2]
CODE = [i['code'] for i in BB_C2]


mols = [Chem.MolFromSmiles(smi) for smi in SMILES]

Draw.MolsToGridImage(mols, 
                     molsPerRow=4, 
                     legends=CODE, 
                     subImgSize=(500,250),
                     useSVG=True)

Generates the image:

[Image: output]


lipelopesoliveira commented Mar 3, 2023

C3 building blocks:

smiles_C3 = [
  {'name': 'benzene',
  'code': 'BENZ',
  'smiles': '[Q]C1=C([R1])C([Q])=C([R1])C([Q])=C1[R1]',
  'xsmiles': '[*]C1=C([*])C([*])=C([*])C([*])=C1[*]',
  'xsmiles_label': '|$Q;;;R1;;Q;;R1;;Q;;R1$|'},
 {'name': "5'-phenyl-1,1':3',1''-terphenyl",
  'code': 'TPBZ',
  'smiles': '[Q]C(C([R4])=C1[R5])=C([R3])C([R2])=C1C2=C([R1])C(C3=C([R2])C([R3])=C([Q])C([R4])=C3[R5])=C([R1])C(C4=C([R2])C([R3])=C([Q])C([R4])=C4[R5])=C2[R1]',
  'xsmiles': '[*]C(C([*])=C1[*])=C([*])C([*])=C1C2=C([*])C(C3=C([*])C([*])=C([*])C([*])=C3[*])=C([*])C(C4=C([*])C([*])=C([*])C([*])=C4[*])=C2[*]',
  'xsmiles_label': '|$Q;;;R4;;R5;;R3;;R2;;;;R1;;;;R2;;R3;;Q;;R4;;R5;;R1;;;;R2;;R3;;Q;;R4;;R5;;R1$|'},
 {'name': 'triphenylamine',
  'code': 'TPAM',
  'smiles': '[Q]C(C([R2])=C1)=C([R1])C=C1N(C2=CC([R1])=C([Q])C([R2])=C2)C3=CC([R1])=C([Q])C([R2])=C3',
  'xsmiles': '[*]C(C([*])=C1)=C([*])C=C1N(C2=CC([*])=C([*])C([*])=C2)C3=CC([*])=C([*])C([*])=C3',
  'xsmiles_label': '|$Q;;;R2;;;R1;;;;;;;R1;;Q;;R2;;;;;R1;;Q;;R2;$|'},
 {'name': "10,15-dihydro-5H-diindolo[3,2-a:3',2'-c]carbazole",
  'code': 'DICZ',
  'smiles': '[Q]C1=C([R1])C2=C(C3=C(N2)C(C4=C([R3])C([R2])=C([Q])C([R1])=C4N5)=C5C6=C3NC7=C([R1])C([Q])=C([R2])C([R3])=C76)C([R3])=C1[R2]',
  'xsmiles': '[*]C1=C([*])C2=C(C3=C(N2)C(C4=C([*])C([*])=C([*])C([*])=C4N5)=C5C6=C3NC7=C([*])C([*])=C([*])C([*])=C76)C([*])=C1[*]',
  'xsmiles_label': '|$Q;;;R1;;;;;;;;;R3;;R2;;Q;;R1;;;;;;;;;R1;;Q;;R2;;R3;;;R3;;R2$|'},
 {'name': 'triphenylene',
  'code': 'TPNY',
  'smiles': '[Q]C1=C([R1])C2=C(C([R2])=C1)C3=C(C([R2])=CC([Q])=C3[R1])C4=C2C([R2])=CC([Q])=C4[R1]',
  'xsmiles': '[*]C1=C([*])C2=C(C([*])=C1)C3=C(C([*])=CC([*])=C3[*])C4=C2C([*])=CC([*])=C4[*]',
  'xsmiles_label': '|$Q;;;R1;;;;R2;;;;;R2;;;Q;;R1;;;;R2;;;Q;;R1$|'},
 {'name': '2,4,6-triphenoxy-1,3,5-triazine',
  'code': 'TPOB',
  'smiles': '[Q]C(C([R2])=C1)=C([R1])C=C1OC2=NC(OC3=CC([R2])=C([Q])C([R1])=C3)=NC(OC4=CC([R2])=C([Q])C([R1])=C4)=N2',
  'xsmiles': '[*]C(C([*])=C1)=C([*])C=C1OC2=NC(OC3=CC([*])=C([*])C([*])=C3)=NC(OC4=CC([*])=C([*])C([*])=C4)=N2',
  'xsmiles_label': '|$Q;;;R2;;;R1;;;;;;;;;;;R2;;Q;;R1;;;;;;;;R2;;Q;;R1;;$|'},
 {'name': '1,3,5-triphenoxybenzene',
  'code': 'TPTA',
  'smiles': '[Q]C(C([R2])=C1)=C([R1])C=C1OC2=C([R3])C(OC3=CC([R2])=C([Q])C([R1])=C3)=C([R3])C(OC4=CC([R2])=C([Q])C([R1])=C4)=C2[R3]',
  'xsmiles': '[*]C(C([*])=C1)=C([*])C=C1OC2=C([*])C(OC3=CC([*])=C([*])C([*])=C3)=C([*])C(OC4=CC([*])=C([*])C([*])=C4)=C2[*]',
  'xsmiles_label': '|$Q;;;R2;;;R1;;;;;;R3;;;;;;R2;;Q;;R1;;;R3;;;;;;R2;;Q;;R1;;;R3$|'},
 {'name': '2,4,6-triphenyl-1,3,5-triazine',
  'code': 'TPTZ',
  'smiles': '[Q]C(C([R3])=C1[R4])=C([R2])C([R1])=C1C2=NC(C3=C([R1])C([R2])=C([Q])C([R3])=C3[R4])=NC(C4=C([R1])C([R2])=C([Q])C([R3])=C4[R4])=N2',
  'xsmiles': '[*]C(C([*])=C1[*])=C([*])C([*])=C1C2=NC(C3=C([*])C([*])=C([*])C([*])=C3[*])=NC(C4=C([*])C([*])=C([*])C([*])=C4[*])=N2',
  'xsmiles_label': '|$Q;;;R3;;R4;;R2;;R1;;;;;;;R1;;R2;;Q;;R3;;R4;;;;;R1;;R2;;Q;;R3;;R4;$|'},
 {'name': 'None',
  'code': 'DBA1',
  'smiles': '[Q]C1=C([R3])C(C#CC2=C3C([R3])=C([Q])C([R2])=C2[R1])=C(C#CC(C([R3])=C([Q])C([R2])=C4[R1])=C4C#C3)C([R1])=C1[R2]',
  'xsmiles': '[*]C1=C([*])C(C#CC2=C3C([*])=C([*])C([*])=C2[*])=C(C#CC(C([*])=C([*])C([*])=C4[*])=C4C#C3)C([*])=C1[*]',
  'xsmiles_label': '|$Q;;;R3;;;;;;;R3;;Q;;R2;;R1;;;;;;R3;;Q;;R2;;R1;;;;;R1;;R2$|'},
 {'name': 'None',
  'code': 'DBA2',
  'smiles': '[Q]C1=C([R3])C(C#CC#CC2=C3C([R3])=C([Q])C([R2])=C2[R1])=C(C#CC#CC4=C([R3])C([Q])=C([R2])C([R1])=C4C#CC#C3)C([R1])=C1[R2]',
  'xsmiles': '[*]C1=C([*])C(C#CC#CC2=C3C([*])=C([*])C([*])=C2[*])=C(C#CC#CC4=C([*])C([*])=C([*])C([*])=C4C#CC#C3)C([*])=C1[*]',
  'xsmiles_label': '|$Q;;;R3;;;;;;;;;R3;;Q;;R2;;R1;;;;;;;;R3;;Q;;R2;;R1;;;;;;;R1;;R2$|'},
 {'name': '13,14-dihydro-1,2,3(3,6)-triphenanthrenacyclopropaphane',
  'code': 'STAR',
  'smiles': '[Q]C1=C([R1])C2=C(C3=C1C([R2])=C([R3])C(C4=C([R5])C(C(C([R6])C(C5=C([R5])C(C(C([R6])=C6C([R3])=C7[R2])=C7C([Q])=C8[R1])=C8C([R5])=C5[R4])C([R3])=C9[R2])=C9C([Q])=C%10[R1])=C%10C([R5])=C4[R4])=C3[R6])C([R5])=C6C([R4])=C2[R5]',
  'xsmiles': '[*]C1=C([*])C2=C(C3=C1C([*])=C([*])C(C4=C([*])C(C(C([*])C(C5=C([*])C(C(C([*])=C6C([*])=C7[*])=C7C([*])=C8[*])=C8C([*])=C5[*])C([*])=C9[*])=C9C([*])=C%10[*])=C%10C([*])=C4[*])=C3[*])C([*])=C6C([*])=C2[*]',
  'xsmiles_label': '|$Q;;;R1;;;;;;R2;;R3;;;;R5;;;;R6;;;;R5;;;;R6;;;R3;;R2;;;Q;;R1;;;R5;;R4;;R3;;R2;;;Q;;R1;;;R5;;R4;;R6;;R5;;;R4;;R5$|'},
 {'name': 'None',
  'code': 'STAR1',
  'smiles': '[Q]C1=C([R1])C2=C(C3=C1C([R2])=C([R3])C(C#CC4=C([R5])C(C(C([R6])C(C#C5)C([R3])=C6[R2])=C6C([Q])=C7[R1])=C7C([R5])=C4[R4])=C3[R6])C([R5])=C(C#CC(C([R3])=C8[R2])=C([R5])C9=C8C([Q])=C([R1])C%10=C9C([R5])=C5C([R4])=C%10[R5])C([R4])=C2[R5]',
  'xsmiles': '[*]C1=C([*])C2=C(C3=C1C([*])=C([*])C(C#CC4=C([*])C(C(C([*])C(C#C5)C([*])=C6[*])=C6C([*])=C7[*])=C7C([*])=C4[*])=C3[*])C([*])=C(C#CC(C([*])=C8[*])=C([*])C9=C8C([*])=C([*])C%10=C9C([*])=C5C([*])=C%10[*])C([*])=C2[*]',
  'xsmiles_label': '|$Q;;;R1;;;;;;R2;;R3;;;;;;R5;;;;R6;;;;;R3;;R2;;;Q;;R1;;;R5;;R4;;R6;;R5;;;;;;R3;;R2;;R5;;;;Q;;R1;;;;R5;;;R4;;R5;;R4;;R5$|'},
  {'name': "benzo[1,2-b:3,4-b':5,6-b'']trithiophene",
   'code': 'BTTP',
   'smiles': '[Q]C1=C([R1])C2=C(S1)C(C([R1])=C([Q])S3)=C3C4=C2SC([Q])=C4[R1]',
   'xsmiles': '[*]C1=C([*])C2=C(S1)C(C([*])=C([*])S3)=C3C4=C2SC([*])=C4[*]',
   'xsmiles_label': '|$Q;;;R1;;;;;;R1;;Q;;;;;;;Q;;R1$|'},
  {'name': "5''-([1,1'-biphenyl]-4-yl)-1,1':4',1'':3'',1''':4''',1''''-quinquephenyl",
   'code': 'TBBZ',
   'smiles': '[Q]C(C([R2])=C1[R4])=C([R1])C([R3])=C1C(C([R6])=C2[R8])=C([R5])C([R7])=C2C3=C([R9])C(C4=C([R7])C([R5])=C(C5=C([R3])C([R1])=C([Q])C([R2])=C5[R4])C([R6])=C4[R8])=C([R9])C(C6=C([R7])C([R5])=C(C7=C([R3])C([R1])=C([Q])C([R2])=C7[R4])C([R6])=C6[R8])=C3[R9]',
   'xsmiles': '[*]C(C([*])=C1[*])=C([*])C([*])=C1C(C([*])=C2[*])=C([*])C([*])=C2C3=C([*])C(C4=C([*])C([*])=C(C5=C([*])C([*])=C([*])C([*])=C5[*])C([*])=C4[*])=C([*])C(C6=C([*])C([*])=C(C7=C([*])C([*])=C([*])C([*])=C7[*])C([*])=C6[*])=C3[*]',
   'xsmiles_label': '|$Q;;;R2;;R4;;R1;;R3;;;;R6;;R8;;R5;;R7;;;;R9;;;;R7;;R5;;;;R3;;R1;;Q;;R2;;R4;;R6;;R8;;R9;;;;R7;;R5;;;;R3;;R1;;Q;;R2;;R4;;R6;;R8;;R9$|'},
  {'name': "1,3,5-triazine",
   'code': 'TRZN',
   'smiles': '[Q]C1=NC([Q])=NC([Q])=N1',
   'xsmiles': '[*]C1=NC([*])=NC([*])=N1',
   'xsmiles_label': '|$Q;;;;Q;;;Q;$|'}
  ]


lipelopesoliveira commented Mar 3, 2023

R groups:

smiles_R = [
  {'name': 'hydrogen',
  'code': 'H',
  'smiles': '[R][H]',
  'xsmiles': '[*][H]',
  'xsmiles_label': '|$R;$|'},
 {'name': 'hydroxyl',
  'code': 'OH',
  'smiles': '[R]O',
  'xsmiles': '[*]O',
  'xsmiles_label': '|$R;$|'},
 {'name': 'methyl',
  'code': 'CH3',
  'smiles': '[R]C',
  'xsmiles': '[*]C',
  'xsmiles_label': '|$R;$|'},
 {'name': 'tert-butyl',
  'code': 'tBu',
  'smiles': '[R]C(C)(C)C',
  'xsmiles': '[*]C(C)(C)C',
  'xsmiles_label': '|$R;;;;$|'},
 {'name': 'methoxy',
  'code': 'OMe',
  'smiles': '[R]OC',
  'xsmiles': '[*]OC',
  'xsmiles_label': '|$R;;$|'},
 {'name': 'ethoxy',
  'code': 'OEt',
  'smiles': '[R]OCC',
  'xsmiles': '[*]OCC',
  'xsmiles_label': '|$R;;;$|'},
 {'name': 'amine',
  'code': 'NH2',
  'smiles': '[R]N',
  'xsmiles': '[*]N',
  'xsmiles_label': '|$R;$|'},
 {'name': 'nitro',
  'code': 'NO2',
  'smiles': '[R][N+]([O-])=O',
  'xsmiles': '[*][N+]([O-])=O',
  'xsmiles_label': '|$R;;;$|'},
 {'name': 'cyano',
  'code': 'CN',
  'smiles': '[R]C#N',
  'xsmiles': '[*]C#N',
  'xsmiles_label': '|$R;;$|'},
 {'name': 'formyl',
  'code': 'CHO',
  'smiles': '[R]C([H])=O',
  'xsmiles': '[*]C([H])=O',
  'xsmiles_label': '|$R;;;$|'},
 {'name': 'carboxy',
  'code': 'COOH',
  'smiles': '[R]C(O)=O',
  'xsmiles': '[*]C(O)=O',
  'xsmiles_label': '|$R;;;$|'},
 {'name': 'acetoxy',
  'code': 'OCOCH3',
  'smiles': '[R]OC(C)=O',
  'xsmiles': '[*]OC(C)=O',
  'xsmiles_label': '|$R;;;;$|'},
 {'name': 'thiol',
  'code': 'SH',
  'smiles': '[R]S',
  'xsmiles': '[*]S',
  'xsmiles_label': '|$R;$|'},
 {'name': 'keto',
  'code': 'O',
  'smiles': '[R]=O',
  'xsmiles': '[*]=O',
  'xsmiles_label': '|$R;$|'},
 {'name': 'nitroso',
  'code': 'NO',
  'smiles': '[R]N=O',
  'xsmiles': '[*]N=O',
  'xsmiles_label': '|$R;;$|'},
 {'name': 'fluorine',
  'code': 'F',
  'smiles': '[R]F',
  'xsmiles': '[*]F',
  'xsmiles_label': '|$R$|'},
 {'name': 'chlorine',
  'code': 'Cl',
  'smiles': '[R]Cl',
  'xsmiles': '[*]Cl',
  'xsmiles_label': '|$R;$|'},
 {'name': 'bromine',
  'code': 'Br',
  'smiles': '[R]Br',
  'xsmiles': '[*]Br',
  'xsmiles_label': '|$R$|'},
 {'name': 'iodine',
  'code': 'I',
  'smiles': '[R]I',
  'xsmiles': '[*]I',
  'xsmiles_label': '|$R$|'},
 {'name': 'sulfinic acid',
  'code': 'SO2H',
  'smiles': '[R]S(O)=O',
  'xsmiles': '[*]S(O)=O',
  'xsmiles_label': '|$R;;;$|'},
 {'name': 'sulfonic acid',
  'code': 'SO3H',
  'smiles': '[R]S(=O)(O)=O',
  'xsmiles': '[*]S(=O)(O)=O',
  'xsmiles_label': '|$R;;;;$|'},
 {'name': 'thial',
  'code': 'CHS',
  'smiles': '[R]C([H])=S',
  'xsmiles': '[*]C([H])=S',
  'xsmiles_label': '|$R;;;$|'},
 {'name': 'epoxide',
  'code': 'EPO',
  'smiles': '[R]C1CO1',
  'xsmiles': '[*]C1CO1',
  'xsmiles_label': '|$R;;;$|'},
 {'name': 'methyl epoxide',
  'code': 'MEPO',
  'smiles': '[R]CC1CO1',
  'xsmiles': '[*]CC1CO1',
  'xsmiles_label': '|$R;;;;$|'},
 {'name': 'ethyl epoxide',
  'code': 'EEPO',
  'smiles': '[R]CCC1CO1',
  'xsmiles': '[*]CCC1CO1',
  'xsmiles_label': '|$R;;;;;$|'},
 {'name': 'ethoxymethyl epoxide',
  'code': 'EMEPO',
  'smiles': '[R]COCC1CO1',
  'xsmiles': '[*]COCC1CO1',
  'xsmiles_label': '|$R;;;;;;$|'},
 {'name': 'oxyethyl epoxide',
  'code': 'OEEPO',
  'smiles': '[R]OCC1CO1',
  'xsmiles': '[*]OCC1CO1',
  'xsmiles_label': '|$R;;;;;$|'},
  {'name': 'benzene',
   'code': 'Ph',
   'smiles': '[R]C1=CC=CC=C1',
   'xsmiles': '[*]C1=CC=CC=C1',
   'xsmiles_label': '|$R;;;;;;$|'}
  ]


lipelopesoliveira commented Mar 3, 2023

Q groups:

smiles_Q = [
 {'name': 'amine',
  'code': 'NH2',
  'smiles': '[Q]N',
  'xsmiles': '[*]N',
  'xsmiles_label': '|$Q;$|'},
 {'name': 'aldehyde',
  'code': 'CHO',
  'smiles': '[Q]C([H])=O',
  'xsmiles': '[*]C([H])=O',
  'xsmiles_label': '|$Q;;;$|'},
 {'name': 'boronic acid',
  'code': 'BOH2',
  'smiles': '[Q]B(O)O',
  'xsmiles': '[*]B(O)O',
  'xsmiles_label': '|$Q;;$|'},
 {'name': 'acetohydrazide',
  'code': 'CONHNH2',
  'smiles': '[Q]C(NN)=O',
  'xsmiles': '[*]C(NN)=O',
  'xsmiles_label': '|$Q;;;;$|'},
 {'name': 'methylhydrazine',
  'code': 'NHNH2',
  'smiles': '[Q]NN',
  'xsmiles': '[*]NN',
  'xsmiles_label': '|$Q;;$|'},
 {'name': 'nitrile',
  'code': 'CN',
  'smiles': '[Q]C#N',
  'xsmiles': '[*]C#N',
  'xsmiles_label': '|$Q;;$|'},
 {'name': 'bromine',
  'code': 'Br',
  'smiles': '[Q]Br',
  'xsmiles': '[*]Br',
  'xsmiles_label': '|$Q;$|'},
 {'name': 'chlorine',
  'code': 'Cl',
  'smiles': '[Q]Cl',
  'xsmiles': '[*]Cl',
  'xsmiles_label': '|$Q;$|'},
 {'name': 'oxygen',
  'code': 'O',
  'smiles': '[Q]O',
  'xsmiles': '[*]O',
  'xsmiles_label': '|$Q;$|'},
 {'name': 'dihydroxy',
   'code': 'OH2',
   'smiles': '[Q]O[B]O1',
   'xsmiles': '[*]O[B]O1',
   'xsmiles_label': '|$Q;;B;$|'},
]
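
Several codes (for example NH2, CHO, CN, Br and Cl) appear in both the R and the Q lists, so a lookup keyed by code alone would be ambiguous. One possible arrangement, sketched here with a hypothetical `build_index` helper, keys each record by the pair (kind, code) and refuses duplicates within one kind.

```python
def build_index(**groups):
    """Map (kind, code) -> record, refusing duplicate codes within one kind."""
    index = {}
    for kind, records in groups.items():
        for rec in records:
            key = (kind, rec["code"])
            if key in index:
                raise ValueError(f"duplicate code {rec['code']!r} in {kind!r}")
            index[key] = rec
    return index


# Shortened copies of two entries from the lists in this thread.
r_groups = [{"name": "amine", "code": "NH2", "smiles": "[R]N"}]
q_groups = [{"name": "amine", "code": "NH2", "smiles": "[Q]N"}]

index = build_index(R=r_groups, Q=q_groups)
amine_as_connector = index[("Q", "NH2")]
```

The same code can then be reused for a functional group and a connection group without one overwriting the other.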

@lipelopesoliveira
Owner Author

[Images: B2, B3, Q, R]
