Decode more than one molecule from one latent #73

cankobanz · 2024-02-21T12:44:28Z

Hello, thank you for sharing your work.

I have a question about the decoding process. How can I decode more than one molecule from a single latent vector, in particular the neighboring molecules that have a high log-likelihood relative to the one returned?

Currently, it seems that for the same latent input, the decoded output remains unchanged, and the sampling process doesn't seem to support starting from a specified latent position:

molecule-generation/molecule_generation/wrapper.py

Lines 156 to 165 in 48d532f

    
    def sample(self, num_samples: int) -> List[str]:
        """Sample SMILES strings from the model.

        Args:
            num_samples: Number of samples to return.

        Returns:
            List of SMILES strings.
        """
        return self.decode(self.sample_latents(num_samples))

I was considering customizing the sample method of the GeneratorWrapper to initiate from a specific latent point instead of starting from zeros. However, the provided checkpoint (GNN_Edge_MLP_MoLeR__2022-02-24_07-16-23_best.pkl) is configured for the VaeWrapper, not the GeneratorWrapper.
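
To make the idea concrete, here is a minimal sketch of drawing candidate latents centered on a chosen point rather than on the origin. It is plain NumPy and not part of the library: the function name `sample_latents_around` and the `scale` parameter are hypothetical, and isotropic Gaussian noise is an assumption on my part.

```python
import numpy as np


def sample_latents_around(center, num_samples, scale=0.1, seed=None):
    """Draw `num_samples` latents from an isotropic Gaussian centered at `center`.

    Hypothetical helper, not part of molecule_generation.
    `center` is a 1-D latent vector; `scale` is the noise standard deviation.
    """
    rng = np.random.default_rng(seed)
    center = np.asarray(center, dtype=np.float32)
    # One noise vector per requested sample, same dimensionality as the latent.
    noise = rng.standard_normal((num_samples, center.shape[0])).astype(np.float32)
    return center[None, :] + scale * noise
```

The resulting array could presumably be passed to `decode` in place of `self.sample_latents(num_samples)`, but that only gives neighbors in latent space, not alternative decodings of the same latent.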

Note: I have seen your suggestion in issue #40 to add small noise to the latent. However, I am mainly interested in a more refined solution, specifically one that adjusts the num_samples parameter here:

molecule-generation/molecule_generation/utils/moler_decoding_utils.py

Lines 64 to 72 in 48d532f

    
    num_samples = min(num_samples, num_choices)  # Handle cases where we only have few candidates
    if sampling_mode == DecoderSamplingMode.GREEDY:
        # Note that this will return the top num_samples indices, but not in order:
        picked_indices = np.argpartition(logprobs, -num_samples)[-num_samples:]
    elif sampling_mode == DecoderSamplingMode.SAMPLING:
        p = np.exp(logprobs)  # Convert to probabilities
        # We can only sample values with non-zero probabilities
        num_choices = np.sum(p > 0)
        num_samples = min(num_samples, num_choices)
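
As I understand the excerpt, a larger num_samples would keep several candidate choices per decoding step instead of one. The standalone sketch below reproduces that picking logic outside the library so the two modes can be compared. The function `pick_indices` and its `greedy` flag are hypothetical, the sampling branch (drawing without replacement via `Generator.choice`) is my assumption about how the truncated excerpt continues, and unlike the excerpt the greedy branch sorts its result.

```python
import numpy as np


def pick_indices(logprobs, num_samples, greedy=True, seed=None):
    """Pick up to `num_samples` candidate indices from a vector of log-probabilities.

    Hypothetical helper mirroring the excerpt above; not the library's function.
    """
    logprobs = np.asarray(logprobs, dtype=np.float64)
    num_samples = min(num_samples, logprobs.shape[0])
    if num_samples <= 0:
        return np.array([], dtype=np.int64)
    if greedy:
        # argpartition returns the top-k indices in arbitrary order,
        # so sort them by descending log-probability afterwards.
        top = np.argpartition(logprobs, -num_samples)[-num_samples:]
        return top[np.argsort(-logprobs[top])]
    p = np.exp(logprobs)
    # Only entries with non-zero probability can be drawn.
    num_nonzero = int(np.sum(p > 0))
    if num_nonzero == 0:
        return np.array([], dtype=np.int64)
    num_samples = min(num_samples, num_nonzero)
    rng = np.random.default_rng(seed)
    # Draw distinct indices, weighted by the normalized probabilities.
    return rng.choice(len(p), size=num_samples, replace=False, p=p / p.sum())
```

With num_samples greater than 1, either mode yields several distinct choices for the same latent, which is the behavior I would like to expose at the wrapper level.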

Thank you in advance for your assistance.
