Use more words for Hebrew #607

Closed
MayaGeva opened this issue Aug 25, 2024 · 13 comments · Fixed by #645
Labels
help wanted (Extra attention is needed), languages (Dictionary or language related issues)

Comments

@MayaGeva

There are a lot of words missing from the Hebrew dictionary you use. Could you please include words from this repo?

https://github.com/eyaler/hebrew_wordlists

thanks in advance :)

@sspanak
Owner

sspanak commented Aug 25, 2024

The current dictionary is copied from that repo. Another user helped with that. However, I can see there are more than 30 files there, so maybe there is a more complete one? I can't read Hebrew, so I will need a hand picking the right one.

@sspanak added the enhancement (New feature or request) and languages (Dictionary or language related issues) labels Aug 25, 2024
@MayaGeva
Author

The entire purpose of that repo is to have a big collection, because the individual lists you can find are lackluster.
I've searched around for a better one but couldn't find any that has all of the words. The thing about Hebrew is that words like "and" or "in" are attached to the beginning of the next word as prefixes, so it's important to load all of the words in that repo, since it contains all of those combinations. Personally, when using TT9, I've noticed all of the gerunds are missing from the dictionary. Could you make sure all of the files from the repo are loaded? I know it's a lot of files, but all of them are necessary.

@sspanak
Owner

sspanak commented Aug 26, 2024

I've tried merging all files except the "CC" ones, then removed the duplicate words. This resulted in a 95 MB file containing 5,568,356 words. This is way too much. I can't include such a dictionary, sorry.

TT9 can run almost smoothly with up to 1.5 million words. I haven't tested with larger dictionaries, but I suppose it will be fine with up to 2 million. Beyond that, it will simply overwhelm lower-end devices, both in terms of storage space and CPU usage.

If you can come up with some criteria for reducing the word count, or maybe for using only some of the lists from that repo, I will include the new words. From my experience with Slavic languages, where words have a lot of different suffixes, 1 to 1.5 million words result in a very nice typing experience. I believe this is what we should be aiming for. Unfortunately, as I previously said, I can't even read Hebrew, so I can't make the right choice. I can't make any choice.

I am attaching the merged file for reference.

hebrew.merged.zip
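
For anyone who wants to reproduce the merge described above, here is a minimal sketch. It is not the script that was actually used. It assumes the lists are UTF-8 text files with one word per line and a `.txt` extension, and that the "CC" files can be recognised by "cc" in the file name; the function names `merge_wordlists` and `collect_wordlists` are made up for illustration.

```python
from pathlib import Path


def merge_wordlists(paths):
    """Merge word-list files into one de-duplicated list, keeping first-seen order."""
    seen = {}
    for path in paths:
        with open(path, encoding="utf-8") as handle:
            for line in handle:
                word = line.strip()
                if word:
                    # dict keys keep insertion order, so repeats are simply ignored
                    seen.setdefault(word, None)
    return list(seen)


def collect_wordlists(folder, skip_substring="cc"):
    """Assumed selection rule: every .txt file whose name does not contain skip_substring."""
    return sorted(
        p for p in Path(folder).glob("*.txt")
        if skip_substring not in p.name.lower()
    )
```

Calling `merge_wordlists(collect_wordlists("hebrew_wordlists"))` on a local checkout (the folder name is an assumption) would give a single list whose length can be compared with the word count quoted above.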

@sspanak added the help wanted (Extra attention is needed) label Aug 26, 2024
@DarthFlip
Contributor

@AshiVered, can you help?

@AshiVered
Contributor

I also used a file that someone else had already created.
@udif
https://github.com/udif/traditionalt9

@sspanak
Owner

sspanak commented Aug 30, 2024

... and udif took it from the eyaler repo, as per the readme. We've come full circle. 😆

Just to clarify, the problem is with the "all_*_prefixes" files. They are the ones that need filtering. If any of you guys knows of a large Hebrew word frequency list, I can use it to remove the unpopular words and keep the 1 million most popular ones. After that I'll just merge in the personal names, the gerunds, the nouns, the adjectives and whatnot. Most of these smaller files are within the 100k range and it shouldn't be a big deal to keep them all.
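
The filtering plan in the paragraph above could be sketched roughly as follows. This is only an illustration under stated assumptions: the frequency list is already loaded as a sequence of words ordered from most to least frequent, words missing from it are dropped, and the names `keep_most_frequent` and `build_dictionary` are hypothetical.

```python
def keep_most_frequent(candidates, ranked_words, limit):
    """Return at most `limit` candidates, most frequent first.

    `ranked_words` must be ordered from most to least frequent.
    Candidates that never appear in it are dropped.
    """
    candidate_set = set(candidates)
    kept = []
    seen = set()
    for word in ranked_words:
        if len(kept) >= limit:
            break
        if word in candidate_set and word not in seen:
            kept.append(word)
            seen.add(word)
    return kept


def build_dictionary(prefixed_words, ranked_words, small_lists, limit=1_000_000):
    """Filter the large prefixed list, then merge the smaller lists in unfiltered."""
    merged = dict.fromkeys(keep_most_frequent(prefixed_words, ranked_words, limit))
    for words in small_lists:
        # words that are already present keep their original position
        merged.update(dict.fromkeys(words))
    return list(merged)
```

Only the "all_*_prefixes" words go through the frequency filter; the names, gerunds, nouns and adjectives are appended whole, which matches the intent of keeping the smaller files complete.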

Edit: My bad, there is a frequency list there. Could someone please check the cc100.csv file and confirm the words there are indeed ordered by popularity?

@udif

udif commented Aug 30, 2024

Frankly, my latest efforts went more into my qinpad repo (renamed qinhpad). The crucial piece I was still missing was being able to call Google voice input from within my IME, but that proved a bit more complex than I thought, and I haven't revisited this recently.

@sspanak
Owner

sspanak commented Aug 30, 2024

No big deal. I just need confirmation that the "cc100.csv" file is ordered by frequency and I can do everything by myself.
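
If the file carries a numeric count next to each word, the ordering could also be checked mechanically. The sketch below assumes, without having verified it against the real cc100.csv, that each row holds the word in the first column and its count in the second; `is_sorted_by_count` and `check_file` are hypothetical helper names.

```python
import csv


def is_sorted_by_count(rows, count_column=1):
    """True if the count column never increases from one data row to the next."""
    previous = None
    for row in rows:
        try:
            count = int(row[count_column])
        except (IndexError, ValueError):
            continue  # skip a header line, blank rows or malformed rows
        if previous is not None and count > previous:
            return False
        previous = count
    return True


def check_file(path, count_column=1):
    """Run the check on a CSV file read as UTF-8."""
    with open(path, encoding="utf-8", newline="") as handle:
        return is_sorted_by_count(csv.reader(handle), count_column)
```

Note that the check passes trivially when no row has a parseable count, so a file that lists only words would still need a Hebrew reader to confirm the order.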

@udif, out of curiosity, have you tried TT9 on the Qin 1s? These phones are anything but standard, so I am wondering if it would even install and start up properly. If it does, you may finally get the chance of using voice input.

@AshiVered
Contributor

I tried it on the Qin1s+.
It works, but it was too cumbersome for me, with too many functions...
So I use my Qinboard T9.
But to each their own :)

@udif

udif commented Sep 2, 2024

> @udif, out of curiosity, have you tried TT9 on the Qin 1s? These phones are anything but standard, so I am wondering if it would even install and start up properly. If it does, you may finally get the chance of using voice input.

I have an F21 pro, not a Qin 1s.

@sspanak
Owner

sspanak commented Sep 2, 2024

Still waiting for someone to check the cc100.csv file...

> I tried it on the Qin1s+.
> It works, but it was too cumbersome for me, with too many functions...
> So I use my Qinboard T9.
> But to each their own :)

Yeah, I agree there are too many functions now. I even think I should stop adding new ones. For this reason, in the newer versions, I tried to make TT9 configure itself automatically, depending on the Android version and the available hardware. Hopefully, new users can just install and use, without playing with the options, if they don't want to.

> I have an F21 pro, not a Qin 1s.

I see. As far as I know, on that phone, voice input is possible only with Google services.

@sspanak removed the enhancement (New feature or request) label Sep 2, 2024
@udif

udif commented Sep 2, 2024

> I see. As far as I know, on that phone, voice input is possible only with Google services.

That is known 😄

@sspanak sspanak linked a pull request Oct 3, 2024 that will close this issue
@sspanak
Owner

sspanak commented Oct 3, 2024

Good news, folks, I think I've figured it out myself! I have included many new words with prefixes, personal and location names, and bible words. I am not sure what the last one means; it is just how the file is named. The dictionary now consists of 1.5 million words. I believe this is enough to cover most everyday conversations and even some specialized ones.

As for the gerunds, I've double-checked and it looks like all of them were previously available in TT9. Or at least, all the gerunds from eyaler's repo. If it feels like many of them are still missing, I don't think I can do much more, unless someone suggests another good word source.

All in all, I will include the new words in the upcoming v39.0. I hope you will enjoy typing even more.


5 participants