[ENH] Lemmagen - Use ISO language codes #1025

Merged
merged 2 commits into from
Dec 12, 2023

Conversation

PrimozGodec
Collaborator

Issue

This PR is part of #963, which I am splitting into smaller pieces for easier review.
The main motivation is to make Preprocess work with the language stored in the Corpus.

Description of changes

This PR prepares the Lemmagen normalizer to accept and return languages as ISO codes, which is necessary for using the language from the Corpus (the Corpus stores languages in ISO format).

After changing Lemmagen to work with ISO language codes, I also had to adapt the Preprocess widget to store its settings as ISO codes and to call the Lemmagen filter with an ISO language code.
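
To illustrate the idea, here is a minimal sketch of a normalizer that takes and reports its language as an ISO code. It is not the code from this PR: the class `IsoLemmatizer`, the mapping `LANGUAGE_TO_ISO`, and the three listed languages are hypothetical and only show the shape of the interface.

```python
# Hypothetical subset of languages; the real Lemmagen list may differ.
LANGUAGE_TO_ISO = {
    "English": "en",
    "German": "de",
    "Slovenian": "sl",
}
# Reverse lookup, used only to show a human-readable name in a UI.
ISO_TO_LANGUAGE = {iso: name for name, iso in LANGUAGE_TO_ISO.items()}


class IsoLemmatizer:
    """Toy normalizer that accepts and exposes its language as an ISO code."""

    supported_languages = set(ISO_TO_LANGUAGE)

    def __init__(self, language: str = "en"):
        # Reject anything that is not a known ISO code, including full names.
        if language not in self.supported_languages:
            raise ValueError(f"Unsupported ISO language code: {language!r}")
        self.language = language

    @property
    def language_name(self) -> str:
        # Full name derived from the stored ISO code.
        return ISO_TO_LANGUAGE[self.language]
```

With this shape, a widget can store `lemmatizer.language` directly in its settings and compare it with the ISO code kept in the Corpus, without translating between names and codes.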

Udpipe and Snowball will be implemented in separate PRs.

Includes
  • Code changes
  • Tests
  • Documentation

@PrimozGodec
Collaborator Author

/rebase

@VesnaT VesnaT self-assigned this Dec 8, 2023
@VesnaT
Contributor

VesnaT commented Dec 11, 2023

I get the following error message when I open a saved workflow.
[image: error message]

I'm attaching the workflow (I reset settings before creating it):
untitled2.ows.zip

@PrimozGodec
Collaborator Author

@VesnaT, the problem here is that I didn't increase the settings version. Since I increased it in #1024, and it has not been released yet, I think it may not be necessary. What do you think? Increasing the settings version would complicate the implementation of the migrations.

If you are okay with not changing the settings version, you can create the workflow on tag 1.15.0 (git checkout 1.15.0) and open it with this change. It should work.
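
For context, a version-gated migration could look roughly like the sketch below. This is an assumption-laden illustration, not the widget's actual migration: the function `migrate_language_setting`, the `"language"` key, the version numbers, and the `NAME_TO_ISO` subset are all hypothetical. It shows why settings saved from an unreleased intermediate state, which already carry the new version number but still hold full language names, are not converted.

```python
# Hypothetical subset of the name-to-ISO mapping.
NAME_TO_ISO = {"English": "en", "Slovenian": "sl"}


def migrate_language_setting(settings, version, new_version=2):
    """Return a copy of `settings` with the language stored as an ISO code.

    Only settings saved with a version older than `new_version` are touched.
    """
    migrated = dict(settings)
    if version is None or version < new_version:
        language = migrated.get("language")
        # Convert a full language name to its ISO code when it is known.
        if language in NAME_TO_ISO:
            migrated["language"] = NAME_TO_ISO[language]
    return migrated
```

Under these assumptions, a workflow saved on a released tag carries the old version and is migrated, while one saved from the development branch after the version bump is left untouched and still contains the full language name.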

@VesnaT VesnaT merged commit 0495fd5 into biolab:master Dec 12, 2023
10 of 12 checks passed
@PrimozGodec PrimozGodec deleted the language-normalizers branch December 12, 2023 14:37