This repository has been archived by the owner on Oct 31, 2023. It is now read-only.

After running translate.py, there are many '@' in result file #269

Open
645709712 opened this issue Feb 19, 2020 · 5 comments

Comments

@645709712

I am doing th-en translation, and after running translate.py I got results like this:
[image]
As you can see, there are many '@' characters in the translated sentences. I think this is largely related to the BPE algorithm. (There are no '@' characters in the valid/test results.)
What should I do to solve or improve this problem?
Thank you.

@645709712

W@@ ait for more money and then fill it up .
Not sure . Un@@ comfortable .
F@@ ail .
Wr@@ ong push .
It 's been up@@ load .

...
It looks like the words were not joined back together after being split. What is going wrong here?

@Raldir

Raldir commented Mar 9, 2020

Simply run (s + ' ').replace('@@', '').rstrip() on the output string s.

@skifvideo

I think replace('@@ ', '') (with the trailing space) is the correct way. After all, the ' .' at the end looks ugly.
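To make the difference concrete, here is a minimal sketch of the string-level fix, assuming the output uses the usual '@@' continuation marker followed by a space, as in the sample lines above. The function name restore_bpe is hypothetical and not part of the repository.

```python
def restore_bpe(s: str) -> str:
    """Join subword pieces that were split with the '@@' continuation marker."""
    # Appending a space lets a dangling '@@' at the very end of the string
    # match the same "@@ " pattern; rstrip() then removes the extra space.
    return (s + " ").replace("@@ ", "").rstrip()


print(restore_bpe("W@@ ait for more money and then fill it up ."))
# Wait for more money and then fill it up .
print(restore_bpe("It 's been up@@ load ."))
# It 's been upload .
```

Replacing only '@@' (without the space) would leave "W ait", because the space between the two pieces would stay in place. The space before the final period comes from tokenization, not BPE, so this function does not touch it.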

@RachitBansal

What is the conclusion here, @645709712?

@Jeevesh8

Use this function, like so:

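import os
import subprocess

def restore_segmentation(path):
    """
    Take a file segmented with BPE and restore it to its original segmentation.
    """
    assert os.path.isfile(path)
    # Strip "@@ " continuation markers, plus any "@@" left at the end of a line.
    # -i edits the file in place; -r enables extended regexes (GNU sed).
    # Passing the arguments as a list avoids shell quoting issues with spaces in the path.
    subprocess.run(["sed", "-i", "-r", "s/(@@ )|(@@ ?$)//g", path], check=True)

# output_path is the directory that holds the translated files.
for f in os.listdir(output_path):
    restore_segmentation(os.path.join(output_path, f))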
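If GNU sed is not available (for example on Windows, or with the BSD sed shipped on macOS, where -i and -r behave differently), roughly the same cleanup can be sketched in pure Python. This is only an illustration under the assumption that the files are UTF-8 text; restore_segmentation_py is a hypothetical name, and the regex is copied from the sed command above.

```python
import re
from pathlib import Path

# Same pattern as the sed command: "@@ " anywhere in a line, or "@@"
# (optionally followed by a space) at the end of a line.
BPE_MARKER = re.compile(r"(@@ )|(@@ ?$)", flags=re.MULTILINE)


def restore_segmentation_py(path):
    """Rewrite the file at `path` in place with BPE markers removed."""
    p = Path(path)
    text = p.read_text(encoding="utf-8")
    p.write_text(BPE_MARKER.sub("", text), encoding="utf-8")
```

It can be called in the same loop as above in place of restore_segmentation. Note that it reads each file fully into memory, which is fine for typical translation outputs but may not suit very large files.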
