Is there a way to generate different molecules from the same scaffold? #74

whecrane · 2024-05-29T06:16:44Z

Hello,
Thank you for your nice work. I ran MoLeR successfully using your example. However, when I call sample and also decode with scaffolds, the sampled molecules do not resemble my scaffold. Is there a way to get different molecules from the same scaffold?

Best

with load_model_from_directory(model_dir) as model:
    embeddings = model.encode(example_smiles)
    print(f"Embedding shape: {embeddings[0].shape}")
    decoded = model.decode(embeddings)
    decoded_scaffolds = model.decode(embeddings, scaffolds=["CC1C(NCC=O)=O)=O"])
    sample = model.sample(10)
    print(f"Encoded: {example_smiles}")
    print(f"Decoded with scaffolds: {decoded_scaffolds}")
    print(f"Sample: {sample}")

kmaziarz · 2024-05-29T10:46:31Z

If you call sample, then you will get samples from the prior without considering any scaffold. If you want random molecules conditioned on a scaffold, you'd have to prepare embeddings that are reasonably close to having the scaffold and then decode those embeddings with the scaffold constraint via decode.

To get embeddings to decode, you could e.g. embed one molecule that has the scaffold and perturb its embeddings randomly, or you could even embed many molecules that have the scaffold and fit a mixture model to those embeddings and then sample from it. Finally, you could even take fully random embeddings and decode them with the scaffold constraint, but that may lead to low-quality results as the model may be confused if there is a large mismatch between what the embedding would decode to without the constraint vs with.
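The first two options above can be sketched with NumPy alone. This is only an illustration under assumptions: the helper names `perturb_embedding` and `sample_from_diagonal_gaussian` are hypothetical and not part of MoLeR, the noise scale is arbitrary, and a single diagonal Gaussian stands in for the mixture model as a deliberately simpler substitute.

```python
import numpy as np


def perturb_embedding(embedding, num_samples, scale=0.5, rng=None):
    """Return `num_samples` noisy copies of one embedding vector.

    Hypothetical helper (not a MoLeR function); `scale` is an arbitrary
    noise standard deviation that likely needs tuning.
    """
    rng = np.random.default_rng() if rng is None else rng
    embedding = np.asarray(embedding)
    # One independent noise vector per requested sample.
    noise = rng.normal(0.0, scale, size=(num_samples, embedding.shape[-1]))
    # Broadcasting adds the same base embedding to every noise row.
    return (embedding + noise).astype(embedding.dtype)


def sample_from_diagonal_gaussian(embeddings, num_samples, rng=None):
    """Fit a per-dimension Gaussian to many embeddings and sample from it.

    Simplified stand-in for the mixture model mentioned above: one
    component with a diagonal covariance instead of a full mixture.
    """
    rng = np.random.default_rng() if rng is None else rng
    embeddings = np.asarray(embeddings)
    mean = embeddings.mean(axis=0)
    std = embeddings.std(axis=0)
    samples = rng.normal(mean, std, size=(num_samples, embeddings.shape[-1]))
    return samples.astype(embeddings.dtype)


# Assumed usage with a loaded MoLeR model (not executed here):
#   candidates = perturb_embedding(embeddings[0], num_samples=5)
#   model.decode(candidates, scaffolds=["CCC"] * len(candidates))
```

Either helper returns an array with one row per candidate, so the scaffolds list passed to decode needs one entry per row.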

whecrane · 2024-05-30T01:38:39Z

Thank you for your advice. I think perturbing the embedding randomly will work. Thanks

whecrane · 2024-05-31T06:37:36Z

Hi,
I followed your advice and added some noise. When I pass the 'scaffolds' parameter, the decoded molecule is always the same. Here is my code:
with load_model_from_directory(model_dir) as model:
    embeddings = model.encode(example_smiles)
    print(f"Embedding shape: {embeddings[0].shape}")
    noise = np.random.normal(0, 0.5, embeddings[0].shape)
    noise = noise.astype(embeddings[0].dtype)
    noise_expand = np.expand_dims(noise, axis=0)
    noise_embedding = embeddings[0] + noise_expand
    decoded = model.decode(noise_embedding, scaffolds=["CCC"])
    print(f"Decoded: {decoded}")
I would like to know how to use scaffolds correctly.

Best

kmaziarz · 2024-06-03T12:30:18Z

What do you mean by "always the same": the same between executions of your script? I imagine the script may be deterministic because the MoLeR code sets random seeds for various libraries such as numpy. When I draw several random vectors, I get varying results:

>>> noise = np.random.normal(0, 0.5, (5, embeddings[0].shape[-1]))
>>> noise = noise.astype(embeddings[0].dtype)
>>> noise_embedding = embeddings[0] + noise
>>> print(noise_embedding.shape)
(5, 512)
>>> model.decode(noise_embedding, scaffolds=["CCC"] * len(noise_embedding))
['CCC(C1=CC=CC=C1)C1=CC=CC=C1', 'CC(C)C1=CC=CC=C1', 'CCCC1=CC=CC=C1', 'CCC(C1=CC=CC=C1)C1=CC=CC=C1', 'CC(C)C1=CC=CC=C1']
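If the repeated output does come from a fixed global seed, one possible workaround is to draw the noise from a separate NumPy generator, which does not read the global `np.random` state. The sketch below only demonstrates that behaviour; `draw_noise` is a hypothetical helper, and whether seeding is the actual cause in a given script is an assumption.

```python
import numpy as np


def draw_noise(shape, scale=0.5, seed=None):
    """Draw Gaussian noise from a dedicated generator.

    Hypothetical helper: with seed=None the generator is seeded from OS
    entropy, so it is unaffected by any earlier np.random.seed(...) call.
    """
    rng = np.random.default_rng(seed)
    return rng.normal(0.0, scale, size=shape)


# A fixed seed reproduces exactly the same noise on every call.
same_a = draw_noise((5, 512), seed=0)
same_b = draw_noise((5, 512), seed=0)

# No seed: each call draws fresh noise, even across script executions.
fresh_a = draw_noise((5, 512))
fresh_b = draw_noise((5, 512))

print(np.array_equal(same_a, same_b))
print(np.array_equal(fresh_a, fresh_b))
```

The seeded pair should compare equal while the unseeded pair should differ, which mirrors the difference between a script that always decodes the same molecule and one that varies between runs.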

whecrane · 2024-06-04T02:24:56Z

Thank you very much for your explanation, it works perfectly.
