This repository has been archived by the owner on Jul 7, 2023. It is now read-only.

Add English-Spanish translation problem #1626

Merged (1 commit, Jul 18, 2019)


@voluntadpear (Contributor)

Hi!

I was able to successfully add and use an English-Spanish translation problem, using Common Crawl, EuroParl v7, UN Corpus and ParaCrawl as my datasets.

With this new problem specification, I trained both EN-ES and ES-EN Transformer models (using the transformer_big_single_gpu hyperparameter set).

@googlebot

Thanks for your pull request. It looks like this may be your first contribution to a Google open source project (if not, look below for help). Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please visit https://cla.developers.google.com/ to sign.

Once you've signed (or fixed any issues), please reply here (e.g. I signed it!) and we'll verify it.



@googlebot added the cla: no label (PR author has not signed CLA) on Jul 8, 2019
@voluntadpear (Contributor, Author)

Done, CLA signed.

@googlebot

CLAs look good, thanks!


@googlebot added the cla: yes label (PR author has signed CLA) and removed the cla: no label on Jul 8, 2019
@lukaszkaiser (Contributor) left a comment


Great, thanks, looks good!

@lukaszkaiser merged commit 69a81e8 into tensorflow:master on Jul 18, 2019
@lukaszkaiser (Contributor)

Did you manage to get good results from the models? Great, thanks for contributing!

tensorflow-copybara pushed a commit that referenced this pull request Jul 18, 2019
PiperOrigin-RevId: 258838381
@iamyihwa commented on Aug 7, 2019

Hello! Thanks for adding the en-es module and making it available!

I ran into an issue when I tried to download the data using the 't2t-datagen' command:

export PROBLEM=translate_enes_wmt32k
t2t-datagen --data_dir=$DATA_DIR --tmp_dir=$TMP_DIR --problem=$PROBLEM

It works when I use the English-German problem instead, for example:

export PROBLEM=translate_ende_wmt32k

Part of the error message I get clearly shows that the Spanish problem is not among the available ones at the moment: there is no enes entry where it should be, given that the list appears to be sorted alphabetically.


ValueError: You must specify one of the supported problems to generate data for:
  * algorithmic_addition_binary40
...

  * translate_ende_wmt_clean_pc_clean32k
  * translate_ende_wmt_multi64k
  * translate_ende_wmt_multi64k_packed1k
  * translate_ende_wmt_pc32k
  * translate_ende_wmt_pc_clean32k
  * translate_enet_wmt32k
  * translate_enet_wmt_characters
  * translate_enfr_wmt32k

I have tried installing via 'pip install tensor2tensor' as well as 'pip install git+https://github.com/tensorflow/tensor2tensor.git', in case the latest changes had not yet been included in the official release.
