Improved performance #1

qeterme · 2021-10-08T08:34:51Z

Added a simple truecaser based on each token's position in the sentence and its POS tag
Added number masking
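The two changes can be illustrated roughly as follows. This is only a minimal sketch of the general idea, not the code from this PR: the function names `truecase` and `mask_numbers`, the `PROPN` tag check, and the choice of `0` as the mask character are all assumptions made for illustration.

```python
import re

# Assumption: proper nouns carry the UD tag "PROPN" and should keep their capital.
PROPER_NOUN = "PROPN"
# Assumption: every digit is replaced by this placeholder character.
NUMBER_MASK = "0"


def truecase(token: str, position: int, pos: str) -> str:
    """Lowercase a sentence-initial token unless it is tagged as a proper noun."""
    if position == 0 and pos != PROPER_NOUN:
        return token.lower()
    return token


def mask_numbers(token: str) -> str:
    """Replace each digit with the mask so that all numbers share one shape."""
    return re.sub(r"\d", NUMBER_MASK, token)


# Hypothetical tagged sentence: (token, POS) pairs.
tagged = [("Az", "DET"), ("1984-ben", "NUM"), ("Budapest", "PROPN")]
normalized = [mask_numbers(truecase(tok, i, pos)) for i, (tok, pos) in enumerate(tagged)]
print(normalized)
```

The point of both steps is to reduce sparsity: a sentence-initial capital and the exact digits of a number rarely matter for the lemma, so collapsing them lets more tokens share the same learned rule.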

oroszgy · 2021-10-19T11:40:09Z

Could you please report the accuracy before and after the changes here? It would also be nice to have the errors on the UD corpus (for both the before and after states).

lemmy/cli/__main__.py

lemmy/lemmatizer.py

qeterme · 2021-10-19T16:34:35Z

Both results come from models trained and tested on UD_Hungarian-Szeged.

Before

Accuracy: 92.74%

After

Accuracy: 93.47%

I also attached the debug outputs of both runs.
before.txt
after.txt
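To put the two accuracy figures in perspective, the short sketch below derives the absolute gain and the relative error reduction from them. The helper name `error_reduction` is made up for this illustration; only the two percentages come from the report above.

```python
from typing import Tuple


def error_reduction(acc_before: float, acc_after: float) -> Tuple[float, float]:
    """Return (absolute gain in percentage points, relative error reduction in %)."""
    gain = acc_after - acc_before
    error_before = 100.0 - acc_before
    relative = 100.0 * gain / error_before
    return gain, relative


gain, relative = error_reduction(92.74, 93.47)
print(f"{gain:.2f} pp gain, {relative:.1f}% fewer errors")
# prints "0.73 pp gain, 10.1% fewer errors"
```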

qeterme added 3 commits September 27, 2021 11:10

Added simple truecaser

32eb73e

Number masking added

922938d

Added position to tagged words

d876751

oroszgy suggested changes Oct 19, 2021

lemmy/cli/__main__.py (outdated, resolved)

lemmy/cli/__main__.py (outdated, resolved)

lemmy/lemmatizer.py (outdated, resolved)

lemmy/lemmatizer.py (outdated, resolved)

lemmy/lemmatizer.py (resolved)

Fixed requested parts

9c5cca6

oroszgy approved these changes Oct 20, 2021

oroszgy merged commit d3406d8 into master Oct 20, 2021
