After running translate.py, there are many '@' in result file #269

645709712 · 2020-02-19T03:16:02Z

I am doing th-en translation, and after running translate.py I got results like this:

As you can see, there are many '@' characters in the translated sentences. I think this is largely related to the BPE algorithm (the valid/test results contain no '@').
So, what should I do to solve or improve this problem?
Thank you.

645709712 · 2020-02-19T11:57:36Z

W@@ ait for more money and then fill it up .
Not sure . Un@@ comfortable .
F@@ ail .
Wr@@ ong push .
It 's been up@@ load .

....
I don't think the words were restored after being split, so what is wrong here?
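For context, here is a simplified sketch of where the markers come from. BPE splits a word into subword pieces by repeatedly merging the adjacent pair that appears earliest in a learned merge list, and every piece except the last is tagged with "@@". The model is trained on these pieces and generates them, so its raw output still contains the markers until they are removed in post-processing. The apply_bpe helper below is hypothetical (it is not subword-nmt's API), it skips the end-of-word handling that real implementations use, and the merge list is made up for illustration:

```python
def apply_bpe(word, merges):
    """Segment one word with an ordered list of BPE merges (toy version)."""
    if not word:
        return []
    ranks = {pair: i for i, pair in enumerate(merges)}
    symbols = list(word)  # start from single characters
    while len(symbols) > 1:
        pairs = [(symbols[i], symbols[i + 1]) for i in range(len(symbols) - 1)]
        # Pick the adjacent pair that was learned earliest (lowest rank).
        best = min(pairs, key=lambda p: ranks.get(p, float("inf")))
        if best not in ranks:
            break  # no learned merge applies any more
        merged, i = [], 0
        while i < len(symbols):
            if i < len(symbols) - 1 and (symbols[i], symbols[i + 1]) == best:
                merged.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                merged.append(symbols[i])
                i += 1
        symbols = merged
    # Every piece except the last gets the "@@" continuation marker.
    return [s + "@@" for s in symbols[:-1]] + symbols[-1:]


# Hypothetical merge list, chosen so that "Wait" splits like the output above.
merges = [("a", "i"), ("ai", "t")]
print(" ".join(apply_bpe("Wait", merges)))  # prints "W@@ ait"
```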

Raldir · 2020-03-09T20:33:49Z

Simply run (s + ' ').replace('@@ ', '').rstrip() on each output string s (note the space after '@@').
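A minimal sketch of that one-liner applied to the sample lines posted above. It assumes the subword-nmt default marker "@@ " (two at-signs followed by a space). Appending a space first lets a marker at the very end of a line ("up@@") be caught as well, and rstrip() removes the extra space afterwards. Replacing only "@@" would leave "W ait" instead of "Wait". The name remove_bpe is hypothetical.

```python
def remove_bpe(s: str) -> str:
    # The appended space makes a line-final "@@" match "@@ " too;
    # rstrip() then drops the extra trailing whitespace.
    return (s + " ").replace("@@ ", "").rstrip()


samples = [
    "W@@ ait for more money and then fill it up .",
    "Not sure . Un@@ comfortable .",
    "F@@ ail .",
    "Wr@@ ong push .",
    "It 's been up@@ load .",
]
for line in samples:
    print(remove_bpe(line))
```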

skifvideo · 2020-08-17T17:30:51Z

I think replace('@@ ', '') is the correct approach. Even so, the ' .' at the end of each line looks ugly.
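The leftover ' .' most likely comes from word-level tokenization of the training data rather than from BPE, so removing '@@' will not fix it. The usual fix is a proper detokenizer such as the Moses detokenizer. Below is a rough regex-only sketch, assuming Moses-style English tokenization (which the "It 's" in the samples suggests). The helper attach_punctuation is hypothetical and only handles a few common cases:

```python
import re


def attach_punctuation(s: str) -> str:
    # Remove the space the tokenizer put before common punctuation marks.
    s = re.sub(r" ([.,!?;:])", r"\1", s)
    # Re-attach English clitics that were split off ("It 's" -> "It's").
    s = re.sub(r" ('s|'re|'ve|'ll|'d|'m|n't)\b", r"\1", s)
    return s


print(attach_punctuation("It 's been upload ."))
```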

RachitBansal · 2020-09-24T08:25:17Z

What is the conclusion here, @645709712?

Jeevesh8 · 2020-09-24T11:51:19Z

Use this function, like so:

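import os
import subprocess


def restore_segmentation(path):
    """
    Take a file segmented with BPE and restore it to its original segmentation
    by deleting the "@@ " continuation markers in place (needs GNU sed for -i -r).
    """
    assert os.path.isfile(path)
    restore_cmd = ["sed", "-i", "-r", "s/(@@ )|(@@ ?$)//g", path]
    # Passing an argument list (no shell=True) means spaces or quotes in the
    # path need no escaping.
    subprocess.run(restore_cmd, check=True)


# output_path: directory containing the translate.py output files.
for f in os.listdir(output_path):
    restore_segmentation(os.path.join(output_path, f))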
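If GNU sed is not available (for example, the BSD sed on macOS treats -i differently), the same cleanup can be done in pure Python. The following is a minimal sketch, assuming UTF-8 output files small enough to read into memory. It reuses the sed pattern through Python's re module, with re.MULTILINE so that $ matches at the end of every line:

```python
import re

# Same pattern as the sed command: "@@ " anywhere, or "@@" (optionally
# followed by a space) at the end of a line.
BPE_MARKER = re.compile(r"(@@ )|(@@ ?$)", flags=re.MULTILINE)


def restore_segmentation(path: str) -> None:
    """Remove BPE continuation markers from a text file, rewriting it in place."""
    with open(path, encoding="utf-8") as fh:
        text = fh.read()
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(BPE_MARKER.sub("", text))
```

It can be called exactly like the sed-based version, for example in the same loop over os.listdir(output_path).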
