[ENH] Snowball - Use ISO language codes #1029

Merged (2 commits) on Dec 14, 2023

Conversation

PrimozGodec
Collaborator

Issue

This PR is part of #963, which I am splitting into smaller pieces for easier review.

The main motivation is to make Preprocess work with the language stored in the Corpus.

Description of changes

This PR prepares the Snowball normalizer to accept and return languages as ISO codes, which is necessary for using the language from the Corpus (the Corpus stores languages in ISO format).

After changing Snowball to work with ISO language codes, I also had to adapt the Preprocess widget to store its settings as ISO codes and to call the Lemmagen filter with an ISO language code.
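
As a rough illustration of the idea, here is a minimal sketch of translating between ISO codes and the language names a Snowball stemmer expects. The table `ISO_TO_SNOWBALL` and the helpers `snowball_name` and `supported_languages` are hypothetical names used only for this example, and the table is a small subset; the code in this PR may be organized differently.

```python
# Hypothetical subset of an ISO 639-1 code -> Snowball language name table.
ISO_TO_SNOWBALL = {
    "da": "danish",
    "de": "german",
    "en": "english",
    "fi": "finnish",
    "fr": "french",
    "nl": "dutch",
}


def snowball_name(iso_code: str) -> str:
    """Return the Snowball language name for an ISO code (hypothetical helper)."""
    try:
        return ISO_TO_SNOWBALL[iso_code]
    except KeyError:
        raise ValueError(
            f"No Snowball stemmer for language code {iso_code!r}"
        ) from None


def supported_languages() -> list[str]:
    """Return the supported languages as sorted ISO codes."""
    return sorted(ISO_TO_SNOWBALL)
```

With a table like this, callers only ever see ISO codes, and the full language name is looked up at the point where the stemmer is constructed.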

UDPipe will be handled in separate PRs.

Includes
  • Code changes
  • Tests
  • Documentation

@PrimozGodec
Collaborator Author

@VesnaT as with #1025, you can test the migration by creating a workflow at tag 1.15.0 (git checkout 1.15.0) and then opening it with this change applied. It should work.
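
For context, a settings migration of this kind could look roughly like the sketch below. The settings key `snowball_language`, the table `NAME_TO_ISO`, and the function `migrate_snowball_language` are assumptions made for illustration, not the names used in the widget.

```python
# Hypothetical table of language names as stored by older workflows.
NAME_TO_ISO = {"English": "en", "French": "fr", "German": "de"}


def migrate_snowball_language(settings: dict) -> dict:
    """Return a copy of settings with the stored language name as an ISO code.

    Values that are not known language names (for example, values that are
    already ISO codes) are left untouched.
    """
    migrated = dict(settings)
    value = migrated.get("snowball_language")  # hypothetical settings key
    if value in NAME_TO_ISO:
        migrated["snowball_language"] = NAME_TO_ISO[value]
    return migrated
```

A workflow saved with an older version would store a value such as "English", which a migration like this would turn into "en" when the workflow is opened.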

@codecov-commenter

Codecov Report

Merging #1029 (aa3306c) into master (0495fd5) will decrease coverage by 0.02%.
The diff coverage is 100.00%.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1029      +/-   ##
==========================================
- Coverage   82.21%   82.20%   -0.02%     
==========================================
  Files          93       93              
  Lines       12294    12295       +1     
  Branches     1668     1670       +2     
==========================================
- Hits        10108    10107       -1     
- Misses       1877     1880       +3     
+ Partials      309      308       -1     

@VesnaT VesnaT merged commit 02b1892 into biolab:master Dec 14, 2023
8 of 12 checks passed
@PrimozGodec PrimozGodec deleted the lang-iso-snowball branch December 22, 2023 08:55