I was looking at Sonic and noticed there was a recent merge of Norwegian stopwords (which is great), but looking at #236 I couldn't help noticing that some of the words seem to have wrong or odd encoding, resulting in words that are not Norwegian stopwords. The list seems to originate from the stopwords-iso project, so it may be worth raising this there as well.
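One common form of broken encoding is UTF-8 text that was decoded as Latin-1, which turns "å" into "Ã¥" and "ø" into "Ã¸". I haven't confirmed that this is what happened in #236, but a quick script like the one below can flag entries of that kind. It is only a sketch: `repair_mojibake` and `find_suspicious` are hypothetical helpers, the sample entries are made up for illustration, and it will not catch other kinds of mis-encoding.

```python
def repair_mojibake(word: str) -> str:
    """Return the UTF-8 reading of `word` if it looks like UTF-8 bytes that
    were wrongly decoded as Latin-1; otherwise return `word` unchanged."""
    try:
        # Re-encode as Latin-1 to recover the raw bytes, then decode as UTF-8.
        return word.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not representable in Latin-1, or the bytes are not valid UTF-8:
        # assume the word is already correct.
        return word


def find_suspicious(words):
    """Yield (original, repaired) pairs for entries that change under repair."""
    for w in words:
        fixed = repair_mojibake(w)
        if fixed != w:
            yield w, fixed


# Hypothetical sample entries: two correct, three mis-encoded.
sample = ["og", "på", "pÃ¥", "nÃ¥r", "fÃ¸r"]
for bad, good in find_suspicious(sample):
    print(f"{bad!r} -> {good!r}")
```

Correctly encoded words such as "på" are left alone because their Latin-1 bytes are not valid UTF-8, so only the garbled entries are reported.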
The list also contains some words that I don't think should be stopwords (tilstand is one such word).
I am no licensing expert, but at a quick glance the Norwegian stopword list from nltk (a Python project) seems to be slightly smaller but better. I am not sure whether it's OK to just copy theirs, though.
So, to sum up: would it be better to open a PR fixing the incorrect words in the list here, or to open a PR that uses the stopwords listed in nltk?