Add Dutch language #84

Merged: 2 commits, Oct 20, 2022

TomJansen
Contributor

No description provided.

@sspanak sspanak self-requested a review October 18, 2022 15:46
@sspanak sspanak self-assigned this Oct 18, 2022
@sspanak
Owner

sspanak commented Oct 18, 2022

@TomJansen, there are a couple of things about the word list.

  1. I see there are words such as:
06
06-nummer
06-nummers

However, in Predictive mode it is not possible to type words that contain numbers, because, well... numbers are not words. So please remove them from the file. They would only take up space, but would never appear as suggestions. Also, they will not be accepted in the next version. I am making the validation a bit more strict.

Besides, it is always possible to type any of the above by holding "0", then "6", then proceeding with the actual word. There is no need to have the numbers in the dictionary.

  2. Single letters are added automatically, so there is no need to include them either.

  3. Are these real words?

z'en
z'je
z'n
z's
z'tje
z'tjes
z.g.
z.g.a.n.
z.i.
z.o.z.
z.s.m.
  4. The ones below seem to be French or German words. Are they used in Dutch? If so, is this the correct spelling? I am trying to understand whether the extra characters are actually required. Having fewer of them will make typing easier, both for the phone and for people. If the answers are "no", let's convert the words to Dutch spelling or remove them.
à
à-la-carterestaurant
à-la-carterestaurants
échéance
échéances
élégance
époque
één
öre
öres
über-ich
überhaupt
übermensch
übermenschen
  5. Before merging, I will run a script to check and remove any repeating words. Just so you know.
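
For illustration only, the cleanup described above (dropping entries that contain digits, dropping single letters, and removing repeated words) could be sketched in Python roughly like this. It is not the project's actual validation or deduplication script. The function name `clean_word_list` and the exact rules (exact-match duplicates, first occurrence kept) are assumptions made for the example.

```python
def clean_word_list(words):
    """Return the words with digit-containing entries, single letters,
    and exact repeats removed, keeping first-seen order.

    Hypothetical helper: the rules are assumptions, not the project's validator.
    """
    seen = set()
    cleaned = []
    for raw in words:
        word = raw.strip()
        if not word:
            continue  # skip empty lines
        if any(ch.isdigit() for ch in word):
            continue  # e.g. "06", "06-nummer"
        if len(word) == 1:
            continue  # single letters are assumed to be added elsewhere
        if word in seen:
            continue  # exact repeat of an earlier entry
        seen.add(word)
        cleaned.append(word)
    return cleaned


if __name__ == "__main__":
    sample = ["06", "06-nummer", "06-nummers", "a", "één", "z.s.m.", "één", "überhaupt"]
    print(clean_word_list(sample))
```

Abbreviations such as `z.s.m.` pass through unchanged here, since the sketch only filters on digits, length, and repetition.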

@TomJansen
Contributor Author

TomJansen commented Oct 18, 2022

  1. OK, I removed all words containing numbers
  2. Removed those as well
  3. In Dutch, those are abbreviations, and they are used quite a lot!
  4. These are loan words (however, not "één"), some of them are used more than others. However, extra characters are used in Dutch. Querying the official AOSP Dutch dictionary from https://android.googlesource.com/platform/packages/inputmethods/LatinIME/+/refs/heads/master/dictionaries/ (which has 2x less words btw) reveals this distribution of extra characters used:
    É - 2
    Ö - 4
    á - 13
    â - 4
    ã - 1
    ä - 24
    ç - 21
    è - 172
    é - 329
    ê - 34
    ë - 1243
    í - 16
    î - 3
    ï - 483
    ñ - 4
    ó - 10
    ô - 4
    ö - 158
    ø - 2
    û - 3
    ü - 66
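
A tally like the one above can be produced with a few lines of Python. This is a minimal sketch, not the query that was actually run against the AOSP dictionary. It assumes the word list is already loaded as plain strings, that "extra characters" means non-ASCII letters, and that every occurrence is counted (rather than every word containing the character). The name `extra_char_counts` is made up for the example.

```python
import unicodedata
from collections import Counter


def extra_char_counts(words):
    """Count occurrences of non-ASCII letters across a list of words.

    Hypothetical helper; counts every occurrence, not words per character.
    """
    counts = Counter()
    for word in words:
        # Normalize so that "e" + combining accent is counted as one "é".
        for ch in unicodedata.normalize("NFC", word):
            if ch.isalpha() and not ch.isascii():
                counts[ch] += 1
    return counts


if __name__ == "__main__":
    sample = ["één", "überhaupt", "öre", "öres", "échéance"]
    for ch, n in sorted(extra_char_counts(sample).items()):
        print(f"{ch} - {n}")
```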

@TomJansen
Contributor Author

I also used the wrong font for the taskbar icon; that is fixed now.

@sspanak
Owner

sspanak commented Oct 19, 2022

4. These are loan words (however, not "één"), some of them are used more than others. However, extra characters are used in Dutch. Querying the official AOSP Dutch dictionary from https://android.googlesource.com/platform/packages/inputmethods/LatinIME/+/refs/heads/master/dictionaries/ (which has 2x less words btw) reveals this distribution of extra characters used:

I see. Well, my philosophy is: it is OK to have some loan words, but don't go over the top. Language X dictionary is supposed to contain language X words. If you need to type in another language, just switch to that language.

Anyway, if you think you are all done, I'll double check everything, test on my phone just in case, then merge.

@TomJansen
Contributor Author

Yes, I think this PR is all done. We can always change the dictionary words later if it contains too many loan words, but I think it is OK. It is the same dictionary that is used by Firefox and OpenOffice for Dutch.

@sspanak sspanak assigned TomJansen and unassigned sspanak Oct 19, 2022
@sspanak sspanak merged commit 8534a70 into sspanak:master Oct 20, 2022
@sspanak sspanak added the languages Dictionary or language related issues label Oct 20, 2022
@TomJansen TomJansen deleted the Dutch_dictionary branch October 20, 2022 08:53