
Question regarding benchmark Lingua comparison #9

Marcono1234 opened this issue Jul 7, 2024 · 4 comments
Marcono1234 commented Jul 7, 2024

Hello,
in your benchmark in the README you got pretty bad performance for Lingua. How exactly are you executing Lingua?
Lingua uses quite large models which have to be loaded once (or lazily during usage), but afterwards detection speed should be quite fast if you keep reusing the same detector, which I think is the intended usage. However, if you create a new detector instance for every detection, performance will be rather bad. Also, Lingua requires a lot of memory at runtime, so if you are running it in a memory-constrained environment, its performance might not be that good either.
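
Roughly what I mean by reusing the detector (a minimal sketch using the lingua Python package; the sentences are just placeholders):

```python
from lingua import LanguageDetectorBuilder

# Build the detector once; loading the models is the expensive part.
detector = (
    LanguageDetectorBuilder.from_all_languages()
    .with_preloaded_language_models()  # load all models up front instead of lazily
    .build()
)

# Reuse the same detector instance for every line that is classified.
for sentence in ["languages are awesome", "les langues sont géniales"]:
    print(sentence, "->", detector.detect_language_of(sentence))
```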

Have you tried Lingua version 2 as well?¹ It is based on the Rust implementation, so its performance will likely be better. For measuring performance it might also be useful to:

Thanks for doing this benchmark in the first place though!

Footnotes

  1. That version might also cover more than the 54 languages you mention in the README.

nitotm commented Jul 8, 2024

I'm going to redo all the benchmarks soon, for ELD v3, so it's a good opportunity to fix anything that might be incorrect.

For Lingua I use the same detector for each line, so that is not the problem. I did the benchmarks on a 16 GB machine (now I have 32 GB), on Windows 10. I don't see any problem with memory; it uses ~400 MB, which is really not too much.
I was surprised at how slow it was. I tried different things, but I also saw that others had the same problem.

Have you tried it yourself? That is, Lingua < 2.0 against any of the other detectors I tested, to see if the performance difference matches?

I have not tried Lingua v2, I guess I will for the new benchmarks.

Marcono1234 (Author) commented:

> I did the benchmarks on a 16 GB machine, now I have 32 GB. I don't see any problem with memory; it uses ~400 MB, which is really not too much.

Yes you are right, that should be more than enough.

> Have you tried it yourself? That is, Lingua < 2.0 against any of the other detectors I tested, to see if the performance difference matches?

Sorry, I hadn't actually tried Lingua < 2.0 yet. But I have compared Lingua 1.3.5 and 2.0.2 now:

| Lingua version | Loading all models¹ | Detection² |
| --- | --- | --- |
| 1.3.5 | 29.02s | 233.68s |
| 2.0.2 | 8.43s | 21.97s |

So it seems you are right: the performance of Lingua < 2.0 is really not that great. It would really be worth giving Lingua 2 a try.

Footnotes

  1. Using `LanguageDetectorBuilder.from_all_languages().with_preloaded_language_models()`

  2. I was testing detection of 16 sentences in different languages, repeated 1000 times; the absolute time values are probably not that interesting here, it is rather the ratio between the Lingua versions that matters (rough sketch below).
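
In case it's useful, a rough sketch of how such a timing comparison could look (illustrative only; the two sentences below stand in for the 16 test sentences, and the lingua Python package is assumed):

```python
import time
from lingua import LanguageDetectorBuilder

# Placeholder for the 16 test sentences in different languages.
sentences = ["languages are awesome", "les langues sont géniales"]

# Time the model loading separately from detection.
start = time.perf_counter()
detector = (
    LanguageDetectorBuilder.from_all_languages()
    .with_preloaded_language_models()
    .build()
)
load_time = time.perf_counter() - start

# Detect each sentence 1000 times, reusing the same detector instance.
start = time.perf_counter()
for _ in range(1000):
    for sentence in sentences:
        detector.detect_language_of(sentence)
detect_time = time.perf_counter() - start

print(f"Loading all models: {load_time:.2f}s, detection: {detect_time:.2f}s")
```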

nitotm commented Aug 16, 2024

I'm redoing the benchmarks for v3 and I'm trying Lingua 2.0.2. What a difference really; compared with my installation of 1.3.2 I'm seeing a huge improvement. I'm also using `with_preloaded_language_models()` and it is reasonably fast now.
I will close the issue when I publish v3.

nitotm commented Sep 5, 2024

I uploaded ELD v3-beta with the new benchmarks; Lingua is reasonably fast now.

I still find discrepancies in their benchmarks. According to them, Lingua-low is 2x slower than fastText, which is fine; I measured 2x-5x depending on the benchmark. But in their test CLD2 is very similar in speed to fastText, and I think CLD2 should be at least 2x faster than fastText.
(Also, their benchmark for CLD2 is unfair, as they are not using bestEffort = True, which would improve its accuracy considerably.)
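
For reference, this is roughly the kind of call I mean (a sketch assuming the pycld2 binding; the example text is arbitrary):

```python
import pycld2 as cld2

text = "Je ne sais pas"  # arbitrary short example

# Default mode: short or ambiguous inputs are often reported as unreliable/unknown.
is_reliable, _, details = cld2.detect(text)
print(is_reliable, details[0])

# bestEffort=True makes CLD2 return its best guess even when it is unsure,
# which improves measured accuracy on short sentences.
is_reliable, _, details = cld2.detect(text, bestEffort=True)
print(is_reliable, details[0])
```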

Discussion for v3-beta at: #10
