When running, I have an error #9

Open
gaojunhui68 opened this issue Nov 29, 2017 · 16 comments

Comments

@gaojunhui68

When running, I get an error:

Traceback (most recent call last):
  File "example.py", line 9, in <module>
    model.train(ckpt_dir='ckpt')
  File "D:\mypy\ORGAN-master\organ\__init__.py", line 763, in train
    gen_samples, self.train_samples, self.ord_dict, results)
  File "D:\mypy\ORGAN-master\organ\mol_metrics.py", line 185, in compute_results
    results[objective] = np.mean(reward(verified_samples, train_data))
  File "D:\mypy\ORGAN-master\organ\__init__.py", line 743, in batch_reward
    for sample in samples]
  File "D:\mypy\ORGAN-master\organ\__init__.py", line 743, in <listcomp>
    for sample in samples]
  File "D:\mypy\ORGAN-master\organ\mol_metrics.py", line 117, in decode
    ''.join([ord_dict[o] for o in ords]))
  File "D:\mypy\ORGAN-master\organ\mol_metrics.py", line 117, in <listcomp>
    ''.join([ord_dict[o] for o in ords]))
KeyError: 'O'

@gaojunhui68
Author

Traceback (most recent call last):
  File "example.py", line 18, in <module>
    model.train() # Proceeds with the training
  File "D:\mypy\ORGAN-master\organ\__init__.py", line 763, in train
    gen_samples, self.train_samples, self.ord_dict, results)
  File "D:\mypy\ORGAN-master\organ\music_metrics.py", line 252, in compute_results
    results[key] = np.mean(reward(samples, train_samples))
  File "D:\mypy\ORGAN-master\organ\__init__.py", line 743, in batch_reward
    for sample in samples]
  File "D:\mypy\ORGAN-master\organ\__init__.py", line 743, in <listcomp>
    for sample in samples]
  File "D:\mypy\ORGAN-master\organ\music_metrics.py", line 60, in decode
    return ' '.join(unpad([ord_dict[o] for o in ords]))
  File "D:\mypy\ORGAN-master\organ\music_metrics.py", line 60, in <listcomp>
    return ' '.join(unpad([ord_dict[o] for o in ords]))
KeyError: 'b'

@couteiral
Collaborator

Hi @gaojunhui68,

The two errors come from different sources: the first arises in the molecular metrics, the second in the music metrics.

In both cases, it looks like you are using different training sets in the run and in the checkpoint, so the engine is unable to decode (the internal dictionary does not recognize the features). If you give me more details (i.e., the actual file that you run), I'll be able to tell you more.

Regards,
Carlos

@gaojunhui68
Author

Hi @couteiral,

Yes, the first is using the molecular metrics, and the second is using the music metrics.

For the first, the code of example.py is below:

import organ
from organ import ORGAN
model = ORGAN('test', 'mol_metrics', params={'PRETRAIN_DIS_EPOCHS': 1})
model.load_training_set('data/toy.csv')
model.set_training_program(['novelty'], [1])
model.load_metrics()
model.train(ckpt_dir='ckpt')

For the second, the code of example.py is below:

from organ import ORGAN

model = ORGAN('test', 'music_metrics') # Loads a ORGANIC with name 'test', using music metrics
model.load_training_set('data/music_small.txt') # Loads the training set
model.set_training_program(['tonality'], [50]) # Sets the training program as 50 epochs with the tonality metric
model.load_metrics() # Loads all the metrics
model.train() # Proceeds with the training

In both cases, the error occurs in train, after pretraining.

Please help me.
Thanks,
Junhui Gao

@couteiral
Collaborator

Hi @gaojunhui68,

First, the music metrics seem to be bugged. I am afraid I didn't work on them myself, but I'll get in touch with someone involved, and get back to you.

Regarding the molecular metrics, you get a KeyError, which is the error that a Python dictionary raises when a key not in the dictionary is requested. The following is happening: when you try to decode the embedding coordinates to SMILES strings, you are passing the wrong value to the dictionary, and the code crashes.

In particular, you are passing 'O' to ord_dict, the dictionary that maps the embedding back to SMILES strings, so something is wrong there. However, I just ran exactly the same code from the current repo and could not reproduce your problem.
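
For illustration, here is a minimal sketch of that failure mode. The vocabulary is made up, and `decode` is a simplified stand-in rather than the function in mol_metrics.py; the real dictionaries are built by ORGAN from the training set. It only shows that a dictionary keyed by integer indices raises KeyError as soon as it is handed characters.

```python
# Toy vocabulary (assumption): '_' stands in for the padding symbol.
chars = ['C', 'N', 'O', 'F', '#', '(', ')', '_']
char_dict = {c: i for i, c in enumerate(chars)}   # symbol -> index
ord_dict = {i: c for i, c in enumerate(chars)}    # index  -> symbol


def decode(ords, ord_dict):
    """Map a sequence of integer indices back to a string, dropping padding."""
    return ''.join(ord_dict[o] for o in ords).rstrip('_')


# Encode a SMILES string and pad it with two padding symbols.
encoded = [char_dict[c] for c in 'OC(F)C#N'] + [char_dict['_']] * 2
decoded = decode(encoded, ord_dict)   # works: the keys are integers

try:
    # Iterating over a str yields 'O', 'C', ...; none of them is a key.
    decode(decoded, ord_dict)
    error_key = None
except KeyError as exc:
    error_key = exc.args[0]

print(decoded, error_key)
```

If the traceback in your run has the same shape, the value reaching decode is probably a string rather than a list of indices, so printing the argument just before the failing line would be a cheap first check.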

Could you share your pretraining files, so I can have a look at them? Also, are you sure that there is nothing wrong with your 'toy.csv' training set?

Cheers,
Carlos

@n-yoshikawa

Hi @couteiral,

I had the same error as @gaojunhui68. What I did was:

git clone https://github.com/gablg1/ORGAN.git
pip install -r requirements.txt
python example.py

The error message was

Traceback (most recent call last):
  File "example.py", line 8, in <module>
    model.train(ckpt_dir='ckpt')
  File "/home/yoshikawa/ORGAN/organ/__init__.py", line 763, in train
    gen_samples, self.train_samples, self.ord_dict, results)
  File "/home/yoshikawa/ORGAN/organ/mol_metrics.py", line 183, in compute_results
    results[objective] = np.mean(reward(verified_samples, train_data))
  File "/home/yoshikawa/ORGAN/organ/__init__.py", line 743, in batch_reward
    for sample in samples]
  File "/home/yoshikawa/ORGAN/organ/mol_metrics.py", line 115, in decode
    ''.join([ord_dict[o] for o in ords]))
KeyError: 'O'

I used pyenv. Neither anaconda3-5.0.0 nor anaconda2-5.0.0 worked.

@toushi68

toushi68 commented Apr 9, 2018

Hi @couteiral,

Any update on the previous posts? The error message I got is KeyError: 'N'

I was also running example.py. It happened at "model.train(ckpt_dir='ckpt')", after pre-training finished. It looks like it happened at the same spot as the KeyError: 'O'.

Traceback (most recent call last):
  File "", line 1, in <module>
  File "/home/trial0/ORGAN/organ/__init__.py", line 763, in train
    gen_samples, self.train_samples, self.ord_dict, results)
  File "/home/trial0/ORGAN/organ/mol_metrics.py", line 183, in compute_results
    results[objective] = np.mean(reward(verified_samples, train_data))
  File "/home/trial0/ORGAN/organ/__init__.py", line 743, in batch_reward
    for sample in samples]
  File "/home/trial0/ORGAN/organ/__init__.py", line 743, in <listcomp>
    for sample in samples]
  File "/home/trial0/ORGAN/organ/mol_metrics.py", line 115, in decode
    ''.join([ord_dict[o] for o in ords]))
  File "/home/trial0/ORGAN/organ/mol_metrics.py", line 115, in <listcomp>
    ''.join([ord_dict[o] for o in ords]))
KeyError: 'N'

Is there any way to pinpoint which SMILES string caused the problem?
Please let me know if you need any further details.

Thanks in advance!
Toushi
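
One way to check the data side of this question is to scan the SMILES strings for symbols that the vocabulary does not know. The sketch below is only an assumption about how such a check could look: `find_unknown_symbols`, the vocabulary and the sample strings are all hypothetical, and it works character by character, so it would need adapting if the real encoder treats multi-character atoms such as Cl or Br as single tokens.

```python
def find_unknown_symbols(smiles_list, vocabulary):
    """Return (index, smiles, unknown_chars) for entries that are empty
    or contain characters outside the vocabulary."""
    known = set(vocabulary)
    problems = []
    for idx, smi in enumerate(smiles_list):
        unknown = sorted(set(smi) - known)
        if unknown or not smi:
            problems.append((idx, smi, unknown))
    return problems


# Hypothetical vocabulary and data, for illustration only.
vocabulary = ['C', 'N', 'O', 'F', '#', '=', '(', ')', '1']
samples = ['N#CC(O)F', 'CC.O', '', 'C1CC1']

report = find_unknown_symbols(samples, vocabulary)
for idx, smi, unknown in report:
    print(idx, repr(smi), unknown)
```

An empty report would suggest that the training strings themselves are not the source of the KeyError, and that the value passed to decode deserves a closer look instead.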

@toushi68

It looks like this issue is related to the data set. In the toy set, I found more than 30 entries with empty SMILES, i.e. they have NumAtom and Name, but the smiles column is empty. From there I further refined the data set with rdkit. Across these trials I got different KeyErrors: 'C', '[', 'O'. This means something is still wrong with the data set, or a filter is needed before processing the data, as "ORGANIC" does.
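
A minimal sketch of such a pre-filter, using only the standard csv module. The column names and the in-memory CSV are made up for illustration (the real header of toy.csv may differ), and `drop_empty_smiles` is a hypothetical helper, not part of ORGAN.

```python
import csv
import io


def drop_empty_smiles(rows, smiles_column='smiles'):
    """Keep only rows whose SMILES field is present and non-blank."""
    return [row for row in rows if (row.get(smiles_column) or '').strip()]


# Made-up stand-in for a CSV with NumAtom, Name and smiles columns.
raw = io.StringIO(
    "NumAtom,Name,smiles\n"
    "3,ethanol,CCO\n"
    "5,unknown,\n"
    "2,hydrogen cyanide,C#N\n"
)
rows = list(csv.DictReader(raw))
clean = drop_empty_smiles(rows)
kept = [row['smiles'] for row in clean]
print(len(rows), '->', len(clean), kept)
```

A stricter version could additionally parse each remaining string with RDKit (Chem.MolFromSmiles returns None for strings it cannot parse) and drop those rows as well.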

@yippp

yippp commented Apr 19, 2018

Hi @couteiral,
I found that the error is due to the function mm.decode(ords, ord_dict). The ords argument may be a string or a list, which is why the error occurs.
I think there should be two different decode() functions.

Could you run the code with your music_small.txt dataset again to help fix it?
Thanks.
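
As an alternative to two separate functions, a single decode could check the type of its argument. This is only a sketch under assumptions: `decode_if_needed`, the toy `ord_dict` and the '_' padding symbol are hypothetical, and the real music and molecular decoders join and unpad their output differently.

```python
def decode_if_needed(sample, ord_dict, pad_char='_'):
    """Decode integer-encoded samples; pass already-decoded strings through."""
    if isinstance(sample, str):
        return sample
    # int() also accepts NumPy integer types, so array rows work too.
    return ''.join(ord_dict[int(o)] for o in sample).rstrip(pad_char)


ord_dict = {0: 'C', 1: 'N', 2: '#', 3: '_'}   # toy mapping, not ORGAN's

from_ints = decode_if_needed([1, 2, 0, 3, 3], ord_dict)
from_str = decode_if_needed('N#C', ord_dict)
print(from_ints, from_str)
```

The type check hides the symptom rather than the cause, so it is probably still worth finding out where the already-decoded strings enter the pipeline.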

@toushi68

toushi68 commented May 1, 2018

Hi @couteiral,

Here is some simple debugging.
If I insert a print in decode, like this:

def decode(ords, ord_dict):
    print(ords)  # check
    return unpad(''.join([ord_dict[o] for o in ords]))

Here are the last few lines printed out before it crashes:
.........
[11 1 2 1 1 2 1 1 2 1 4 11 7 21 21 21]
[ 1 2 11 1 1 9 11 10 2 11 21 21 21 21 21 21]
[ 1 2 1 1 9 8 10 8 21 21 21 21 21 21 21 21]
[11 2 1 1 1 3 4 3 9 7 8 10 6 4 21 21]
[ 8 1 7 3 4 5 5 3 5 6 4 21 21 21 21 21]
N#CC(O)F
Traceback (most recent call last):
  File "/home/trial3/ORGAN/organ/__init__.py", line 763, in train
    gen_samples, self.train_samples, self.ord_dict, results)
  File "/home/trial3/ORGAN/organ/mol_metrics.py", line 185, in compute_results
    results[objective] = np.mean(reward(verified_samples, train_data))
  File "/home/trial3/ORGAN/organ/__init__.py", line 743, in batch_reward
    for sample in samples]
  File "/home/trial3/ORGAN/organ/mol_metrics.py", line 117, in decode
    return unpad(''.join([ord_dict[o] for o in ords]))
KeyError: 'N'

It looks like this is an already decoded SMILES string, which should not be sent to decode again.
Any idea what's going on?
Thanks!
Toushi68

@k105la

k105la commented May 22, 2018

Hello @couteiral

I ran my code line by line to see where the problem occurs. The error comes up when loading the dataset with model.load_training_set('data/toy.csv').

Traceback (most recent call last):
  File "", line 1, in <module>
  File "/Users/akilhylton/Desktop/ORGAN/organ/__init__.py", line 242, in load_training_set
    self.char_dict) for sam in to_use]
  File "/Users/akilhylton/Desktop/ORGAN/organ/__init__.py", line 242, in <listcomp>
    self.char_dict) for sam in to_use]
  File "/Users/akilhylton/Desktop/ORGAN/organ/mol_metrics.py", line 383, in encode
    return [char_dict[c] for c in pad(new_smi, max_len)]
  File "/Users/akilhylton/Desktop/ORGAN/organ/mol_metrics.py", line 383, in <listcomp>
    return [char_dict[c] for c in pad(new_smi, max_len)]
KeyError: '.'

Any solutions?

@Kajiyu

Kajiyu commented Jun 18, 2018

I have the same error as @ahylton19.

Traceback (most recent call last):
  File "example.py", line 5, in <module>
    model.load_training_set('data/toy.csv')
  File "/home/yuma_kajihara/projects/ORGAN/organ/__init__.py", line 242, in load_training_set
    self.char_dict) for sam in to_use]
  File "/home/yuma_kajihara/projects/ORGAN/organ/__init__.py", line 242, in <listcomp>
    self.char_dict) for sam in to_use]
  File "/home/yuma_kajihara/projects/ORGAN/organ/mol_metrics.py", line 384, in encode
    return [char_dict[c] for c in pad(new_smi, max_len)]
  File "/home/yuma_kajihara/projects/ORGAN/organ/mol_metrics.py", line 384, in <listcomp>
    return [char_dict[c] for c in pad(new_smi, max_len)]
KeyError: '.'

@xuzhang5788

I have the same error as @Kajiyu.
Traceback (most recent call last):
  File "example.py", line 5, in <module>
    model.load_training_set('data/toy.csv')
  File "/media/projects/ORGAN/organ/__init__.py", line 242, in load_training_set
    self.char_dict) for sam in to_use]
  File "/media/projects/ORGAN/organ/__init__.py", line 242, in <listcomp>
    self.char_dict) for sam in to_use]
  File "/media/projects/ORGAN/organ/mol_metrics.py", line 383, in encode
    return [char_dict[c] for c in pad(new_smi, max_len)]
  File "/media/projects/ORGAN/organ/mol_metrics.py", line 383, in <listcomp>
    return [char_dict[c] for c in pad(new_smi, max_len)]
KeyError: '.'

Any solutions?

@beangoben
Collaborator

We will be updating the repo soon. We expect these changes to be incorporated by Monday, Tuesday at the latest. Stay tuned; they will fix these issues.

@kristery

kristery commented Dec 4, 2018

I have the same error as @xuzhang5788.
It seems they are not going to update it?

@kristery

kristery commented Dec 4, 2018

I checked the table of SMILES symbols: '.' is listed there alongside the bond symbols (it separates disconnected fragments), and mol_metrics.py doesn't include it. To fix the error, modify line 315 to

chars = chars + ['-', '=', '#', '.']

I guess it works as long as you add '.' to chars.
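
To show why the extra symbol matters, here is a small sketch with a toy alphabet. `build_dicts`, `encode` and the `base` list are hypothetical stand-ins, not the code in mol_metrics.py; the point is only that a character-level lookup fails on any symbol missing from the list and works once it is added.

```python
def build_dicts(chars):
    """Build the symbol -> index and index -> symbol lookup tables."""
    char_dict = {c: i for i, c in enumerate(chars)}
    ord_dict = {i: c for c, i in char_dict.items()}
    return char_dict, ord_dict


def encode(smiles, char_dict):
    """Character-level encoding; raises KeyError on an unknown symbol."""
    return [char_dict[c] for c in smiles]


base = ['C', 'N', 'O', '(', ')', '-', '=', '#']   # toy alphabet, not ORGAN's
smiles = 'CC(=O)O.CN'                             # two fragments joined by '.'

char_dict, _ = build_dicts(base)
try:
    encode(smiles, char_dict)
    missing = None
except KeyError as exc:
    missing = exc.args[0]

# Same alphabet with '.' appended, as in the one-line fix above.
char_dict, ord_dict = build_dicts(base + ['.'])
encoded = encode(smiles, char_dict)
roundtrip = ''.join(ord_dict[i] for i in encoded)
print(missing, roundtrip)
```

Whether multi-fragment entries should stay in the training set at all is a separate question; dropping them before loading would be another way to avoid the error.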

@xueyuanyuan0410

When running, I get an error:
Traceback (most recent call last):
  File "example.py", line 8, in <module>
    model.train(ckpt_dir='ckpt')
  File "/home/zy/ORGAN/ORGAN-master/organ/__init__.py", line 745, in train
    self.pretrain()
  File "/home/zy/ORGAN/ORGAN-master/organ/__init__.py", line 670, in pretrain
    _, g_loss, g_pred = self.generator.pretrain_step(self.sess,batch)
  File "/home/zy/ORGAN/ORGAN-master/organ/generator.py", line 210, in pretrain_step
    outputs = session.run([self.pretrain_updates,self.pretrain_loss,self.g_predictions],
  File "/home/zy/.conda/envs/zy_2/lib/python3.8/site-packages/tensorflow/python/client/session.py", line 967, in run
    result = self._run(None, fetches, feed_dict, options_ptr,
  File "/home/zy/.conda/envs/zy_2/lib/python3.8/site-packages/tensorflow/python/client/session.py", line 1166, in _run
    np_val = np.asarray(subfeed_val, dtype=subfeed_dtype)#np_val = np.asarray(subfeed_val, dtype=subfeed_dtype)
  File "/home/zy/.local/lib/python3.8/site-packages/numpy/core/_asarray.py", line 83, in asarray
    return array(a, dtype, copy=False, order=order)
TypeError: int() argument must be a string, a bytes-like object or a number, not 'NoneType'
Any solutions?
Thank you for your answer.
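
The TypeError says that np.asarray was asked to convert a value containing None to an integer type, so one thing worth checking is whether the batch handed to pretrain_step holds any None entries. The helper below is a hypothetical diagnostic, not part of ORGAN, and the batch is made up; whether None values in the batch are the actual cause here is not confirmed.

```python
def find_none_positions(batch):
    """Return (row, column) positions of None entries in a list-of-lists batch.

    A row that is itself None is reported as (row, None).
    """
    positions = []
    for r, row in enumerate(batch):
        if row is None:
            positions.append((r, None))
            continue
        positions.extend((r, c) for c, value in enumerate(row) if value is None)
    return positions


# Made-up batch of encoded sequences; None marks a symbol that failed to encode.
batch = [[1, 2, 3, 0], [4, None, 2, 0], None, [1, 1, None, None]]
bad = find_none_positions(batch)
print(bad)
```

Calling this on the batch just before it is fed to the session would show which samples need a closer look.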
