Automated export? #81

Open
sbrl opened this issue Oct 1, 2024 · 2 comments
Labels
enhancement New feature or request

Comments

@sbrl
Member

sbrl commented Oct 1, 2024

As I understand it, this repo is about exporting translation pairs from tldr-pages to OPUS to provide a high-quality translation-pairs dataset. Given that OPUS says the last export was around August 2023, would it be possible to automate the export, e.g. via GitHub Actions?

Then a) we don't have to worry about it, and b) OPUS gets a nice, regularly updated dataset.

sbrl added the enhancement (New feature or request) label on Oct 1, 2024
@kbdharun
Member

kbdharun commented Oct 2, 2024

Hi @sbrl, we already provide exported datasets under the latest release, which is automatically updated every month through GitHub Actions (i.e. https://github.com/tldr-pages/tldr-translation-pairs-gen/releases/latest).

I also publish the CSV dataset officially under our org on Kaggle at https://www.kaggle.com/datasets/tldr-pages/tldr-pages-translation-pairs-dataset.

Regarding the OPUS Corpus, they seem to update the dataset based on releases made to this repo, so I guess I will create quarterly releases to keep the dataset up to date upstream. (I'll do one now.)

@sbrl
Member Author

sbrl commented Oct 2, 2024

Sounds good!

Yeah, I did see the Kaggle dataset there, but hadn't explored it yet.
