Updated OPUS Open Subtitles Dataset with metadata information #1865
Close #1844
Problems:
- I ran `python datasets-cli test datasets/open_subtitles --save_infos --all_configs`, hence the change in `dataset_infos.json`, but it appears that the metadata features have not been added for all pairs. Any idea why that might be?
- I ran `pip uninstall datasets && pip install -e ".[dev]"` after the changes and loaded the dataset via `load_dataset("open_subtitles", lang1='hi', lang2='it')` to check whether the update worked, but the loaded dataset did not contain the metadata fields (neither in the features nor when doing `next(iter(dataset['train']))`). What step(s) did I miss?

Questions:
- Is it OK to have a `classmethod` in there? I have not seen any in the few other datasets I have checked. I could make it a local function inside the `_generate_examples` method, but I'd rather not duplicate the logic...
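
To make the question concrete, here is a minimal sketch of the classmethod option. The class name, the `_extract_meta` helper, the path layout, and the field names are all hypothetical and do not reproduce the actual script in this PR; the sketch only shows one helper being reused for both sides of a pair so the parsing logic is not duplicated.

```python
from typing import Dict, Iterable, Iterator, Tuple


class SubtitlesBuilderSketch:
    """Hypothetical stand-in for a dataset builder, not the real open_subtitles script."""

    @classmethod
    def _extract_meta(cls, filename: str) -> Dict[str, int]:
        # Assumed (illustrative) layout: ".../<year>/<imdb_id>/<subtitle_id>.xml"
        year, imdb_id, subtitle_file = filename.split("/")[-3:]
        return {
            "year": int(year),
            "imdb_id": int(imdb_id),
            "subtitle_id": int(subtitle_file.split(".")[0]),
        }

    def _generate_examples(
        self, pairs: Iterable[Tuple[str, str, str, str]]
    ) -> Iterator[Tuple[int, dict]]:
        # Each pair is (source_file, target_file, source_text, target_text).
        for idx, (src_file, tgt_file, src_text, tgt_text) in enumerate(pairs):
            yield idx, {
                # The same helper handles both languages, so nothing is duplicated.
                "meta": {
                    "lang1": self._extract_meta(src_file),
                    "lang2": self._extract_meta(tgt_file),
                },
                "translation": {"lang1": src_text, "lang2": tgt_text},
            }
```

Because the helper is a classmethod, it can also be called without a builder instance, for example `SubtitlesBuilderSketch._extract_meta("en/1999/123/456.xml")`, which makes it easy to check in isolation.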