
Train unsupported characters as [unk] token #21

Closed
WongVi opened this issue Aug 12, 2022 · 5 comments

Comments

@WongVi

WongVi commented Aug 12, 2022

@baudm Please help with some questions to solve

  1. Could you please let me know how I can train characters outside the character set as an unknown character, so that the network will be able to predict an [unk] token for unsupported characters for demo purposes?

  2. When I train your model on Japanese, training is killed automatically after some iterations. How can I train for the desired number of epochs without it being killed early?

@baudm
Owner

baudm commented Aug 12, 2022

@WongVi

1. Could you please let me know how I can train characters outside the character set as an unknown character, so that the network will be able to predict an [unk] token for unsupported characters for demo purposes?

See #9.

2. When I train your model on Japanese, training is killed automatically after some iterations. How can I train for the desired number of epochs without it being killed early?

If your process is being killed, your machine might be running out of memory.

@WongVi
Author

WongVi commented Aug 12, 2022

@baudm I checked that issue, but there is no explanation about the unknown token there.
I looked at your code and found that you simply discard unsupported characters instead of mapping them to a token. How can I handle them without discarding them?

@baudm
Owner

baudm commented Aug 12, 2022

@WongVi

  1. Designate an unknown token, e.g. [U], and add it to the Tokenizer as one of the specials_first:
    specials_first = (self.EOS,)
  2. Modify the following line to substitute '[U]' instead of the empty string:
    label = re.sub(self.unsupported, '', label)
  3. Modify _tok2ids() so that '[U]' is converted to its corresponding token ID (a combined sketch is shown after this list):
    def _tok2ids(self, tokens: str) -> List[int]:
        return [self._stoi[s] for s in tokens]
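
Putting the three steps together, here is a minimal self-contained sketch (simplified from the Tokenizer and CharsetAdapter in strhub/data/utils.py; the [U] token name and the unk_id attribute are illustrative choices, not something already in the repo):

```python
import re
from typing import List

UNK = '[U]'  # illustrative name for the unknown token


class CharsetAdapter:
    """Step 2: map out-of-charset characters to UNK instead of dropping them."""

    def __init__(self, target_charset: str) -> None:
        self.unsupported = f'[^{re.escape(target_charset)}]'

    def __call__(self, label: str) -> str:
        # Replace every unsupported character with the unknown token.
        return re.sub(self.unsupported, UNK, label)


class Tokenizer:
    EOS = '[E]'
    BOS = '[B]'
    PAD = '[P]'

    def __init__(self, charset: str) -> None:
        # Step 1: register UNK as a special token next to EOS.
        specials_first = (self.EOS, UNK)
        specials_last = (self.BOS, self.PAD)
        self._itos = specials_first + tuple(charset) + specials_last
        self._stoi = {s: i for i, s in enumerate(self._itos)}
        self.unk_id = self._stoi[UNK]

    def _tok2ids(self, tokens: str) -> List[int]:
        # Step 3: treat the multi-character '[U]' marker as a single token.
        ids: List[int] = []
        for chunk in re.split(r'(\[U\])', tokens):
            if chunk == UNK:
                ids.append(self.unk_id)
            else:
                ids.extend(self._stoi[c] for c in chunk)
        return ids


if __name__ == '__main__':
    charset = 'abcdefghijklmnopqrstuvwxyz0123456789'
    adapter = CharsetAdapter(charset)
    tokenizer = Tokenizer(charset)
    label = adapter('abc漢x')          # -> 'abc[U]x'
    print(tokenizer._tok2ids(label))   # the unsupported character maps to unk_id
```

Note that adding a special token changes the vocabulary size (len of the tokenizer), so the model's output layer grows by one and the model has to be trained with the enlarged charset.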

@WongVi
Author

WongVi commented Aug 12, 2022

@baudm Thank you, I will check it and update soon.

@WongVi
Author

WongVi commented Aug 17, 2022

@baudm I have checked it and it works very well.
I have one more question:

  1. Is it possible to save the trained weights during testing with updated test parameters?
    I mean, save the pretrained weights with some hyperparameters changed, without retraining.

WongVi closed this as completed Aug 17, 2022