This repository has been archived by the owner on Feb 1, 2023. It is now read-only.

Improved performance #1

Merged: 4 commits into master on Oct 20, 2021

qeterme (Member) commented Oct 8, 2021

  • Added a simple truecaser based on each token's position in the sentence and its part-of-speech (POS) tag
  • Added number masking
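
For readers unfamiliar with the two techniques, here is a minimal sketch of what position- and POS-based truecasing and number masking can look like. It is not the code from this PR: the function names, the choice of `PROPN` as the only tag that keeps its capital, and the `0` mask character are all assumptions made for illustration.

```python
import re

# Hypothetical helpers for illustration; names and behaviour are assumptions,
# not the actual lemmy implementation.
_DIGIT_RE = re.compile(r"\d")


def mask_numbers(token: str, mask: str = "0") -> str:
    """Replace every digit with one placeholder so all numbers share a shape."""
    return _DIGIT_RE.sub(mask, token)


def truecase(token: str, pos: str, is_sentence_start: bool) -> str:
    """Lowercase the first letter of a sentence-initial token.

    Proper nouns (UD tag PROPN) keep their capital, since for them the
    capital letter is part of the word and not an effect of position.
    """
    if is_sentence_start and pos != "PROPN":
        return token[:1].lower() + token[1:]
    return token


print(truecase("Az", "DET", is_sentence_start=True))
print(truecase("Budapest", "PROPN", is_sentence_start=True))
print(mask_numbers("2021-ben"))
```

Both steps reduce sparsity: a capitalised sentence-initial form and its lowercase variant map to the same lookup key, and all numbers of the same length and suffix collapse into one pattern.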

oroszgy (Member) commented Oct 19, 2021

Could you please report the accuracy before and after the changes here? It would also be nice to have the errors on the UD corpus for both the before and after states.

Resolved review threads:
  • lemmy/cli/__main__.py (2 threads, both outdated)
  • lemmy/lemmatizer.py (3 threads, 2 outdated)
qeterme (Member, Author) commented Oct 19, 2021

Both results were obtained by training and testing on UD_Hungarian-Szeged.

Before

Accuracy: 92.74%

After

Accuracy: 93.47%

I have also attached the debug outputs of both runs.
before.txt
after.txt
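
As a rough illustration of how figures like these can be produced, the sketch below computes token-level lemma accuracy and collects the mismatches. It assumes exact, case-sensitive string matching between gold and predicted lemmas; the function name and the error format are hypothetical and may differ from what lemmy's evaluation actually does.

```python
from typing import List, Sequence, Tuple


def lemma_accuracy(
    gold: Sequence[str], predicted: Sequence[str]
) -> Tuple[float, List[Tuple[int, str, str]]]:
    """Return (accuracy, errors) for two aligned lemma sequences.

    Each error is (token index, gold lemma, predicted lemma).
    Matching is exact and case-sensitive, which is an assumption.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    errors = [
        (i, g, p) for i, (g, p) in enumerate(zip(gold, predicted)) if g != p
    ]
    accuracy = 1.0 - len(errors) / len(gold) if gold else 0.0
    return accuracy, errors


accuracy, errors = lemma_accuracy(["alma", "ház", "fut"], ["alma", "ház", "futni"])
print(f"Accuracy: {accuracy:.2%}")
for index, gold_lemma, predicted_lemma in errors:
    print(index, gold_lemma, predicted_lemma)
```

Running the same function on the predictions from before and after a change gives directly comparable accuracy numbers and error lists.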

oroszgy merged commit d3406d8 into master on Oct 20, 2021