From where can one get ted_talks_langs.txt and to what do we have to pass it as an input? #1

Open
ekdnam opened this issue Jul 26, 2021 · 1 comment


ekdnam commented Jul 26, 2021

In scripts/get-ted-talks-data.sh, the following line appears as a comment:

# we use ./scripts/ted_talks_langs.txt to extract all langs-en pairs to ./data/ted_data

Where can we get that file, what is it used for, and which module takes it as input?

ekdnam added a commit to ekdnam/improving-zeroshot-nmt that referenced this issue Jul 30, 2021
surafelml (Owner) commented

My apologies for the delay! Please see the latest version:

# use the pre-specified src-trg lang-pairs from ./scripts/ted_talks_langs.txt to extract parallel data from ./ted_data
# for pairs other than it/ro-en, update ./scripts/ted_talks_langs.txt
python $EXPDIR/scripts/ted_reader.py --lang-file $EXPDIR/scripts/ted_talks_langs.txt --data-path ./ted-data/
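
For readers wondering how a script might consume such a language file, here is a minimal sketch. The actual format of ted_talks_langs.txt and the internals of ted_reader.py are not shown in this thread, so everything below is an assumption: the file is taken to hold one whitespace-separated source/target pair per line (for example `it en`), with `#` starting a comment, and `read_lang_pairs` and `main` are hypothetical names. Only the `--lang-file` and `--data-path` flags come from the command above.

```python
import argparse
import tempfile
from pathlib import Path


def read_lang_pairs(lang_file):
    """Return (src, trg) tuples from a language-pair file.

    ASSUMED format (not confirmed by the repository): one pair per line,
    whitespace-separated, e.g. "it en". Blank lines and '#' comments are skipped.
    """
    pairs = []
    for raw in Path(lang_file).read_text(encoding="utf-8").splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not line:
            continue
        fields = line.split()
        if len(fields) != 2:
            raise ValueError(f"expected 'src trg', got {raw!r}")
        pairs.append((fields[0], fields[1]))
    return pairs


def main(argv=None):
    """Hypothetical entry point that mirrors the two flags used in the thread."""
    parser = argparse.ArgumentParser(description="List language pairs to extract.")
    parser.add_argument("--lang-file", required=True, help="file listing src/trg pairs")
    parser.add_argument("--data-path", required=True, help="directory for extracted data")
    args = parser.parse_args(argv)

    pairs = read_lang_pairs(args.lang_file)
    for src, trg in pairs:
        # The real extraction step is not shown in the thread; this only reports intent.
        print(f"{src}-{trg}: would extract parallel data into {args.data_path}")
    return pairs


# Demo on a throwaway file, so the sketch runs without the real repository.
with tempfile.TemporaryDirectory() as tmp:
    lang_file = Path(tmp) / "ted_talks_langs.txt"
    lang_file.write_text("# source target\nit en\nro en\n", encoding="utf-8")
    demo_pairs = main(["--lang-file", str(lang_file), "--data-path", "./ted-data/"])
```

Under this assumed format, adding a pair other than it/ro-en would mean adding one more line to the file before rerunning the command.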
