Yomichan shouldn't prioritize exact match over frequency. #1669

epistularum · 2021-05-12T15:55:04Z

The frequency of deinflected verbs and adjectives should be properly taken into account. When looking up 歩き, chances are you actually want to see the definition of 歩く first. That is why the frequency of 歩く is far higher than the frequency of 歩き, and that order should be respected when results are displayed within Yomichan.
It gets quite difficult (or nearly impossible) to find the deconjugated match underneath multiple "exact matches" such as names. In my case, I have to go through 29 entries before I finally find 落ちる when looking up おち.

Here are some examples. I've excluded the more extreme ones, which would result in ridiculously long images:

toasted-nutbread · 2021-05-14T01:28:46Z

You seem to be having two different issues here:

1. Exact matches appearing before deinflected matches (you would see the same thing if the score was identical for 歩き and 歩く).
2. Names appearing before "more meaningful" definitions.

I would argue that 1 is the correct behaviour, because how do we know the user doesn't want to see 歩き instead of 歩く? 歩き has the additional noun meaning which could be correct for the context. Compare vs Jisho, which also doesn't list 歩く at the top. And while maybe this is a contrived example, a learner should also be able to intuit that 歩き is a form of 歩く from both the raw text and the definition.

2 is probably the same issue as #105, and you can improve this by decreasing the priority of the names dictionary.

Thermospore · 2021-05-14T04:21:05Z

Yeah I just moved jmnedict to a separate profile so I didn't have to flip through stacks of names when looking for a word

ttu-ttu · 2021-05-14T10:59:39Z

I was thinking we could provide an option in the settings to prioritize the deinflected form over the inflected one. I think it makes sense because J-J dictionaries will, 90% of the time, ask us to refer to the base (deinflected) form.

Another way to deal with this is to place the deinflected form right below the exact match, also controlled by settings of course since I believe it's more of a user preference
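As a rough illustration of the setting suggested above, the sketch below builds a comparator from a boolean option. The option name `preferDeinflected` and the helper `makeInflectionComparer` are hypothetical and do not exist in Yomichan; the entry objects only mimic the `inflections` array used by the real sorting code.

```javascript
// Hypothetical helper: returns a comparator that puts either exact matches
// or deinflected matches first, depending on a user setting.
function makeInflectionComparer(preferDeinflected) {
    return (v1, v2) => {
        // 1 if the entry was reached through deinflection, 0 if it is an exact match
        const d1 = v1.inflections.length > 0 ? 1 : 0;
        const d2 = v2.inflections.length > 0 ? 1 : 0;
        return preferDeinflected ? d2 - d1 : d1 - d2;
    };
}

// Made-up entries for a lookup of 歩き
const lookupEntries = [
    {term: '歩き', inflections: []},
    {term: '歩く', inflections: ['masu stem']}
];

const exactFirst = [...lookupEntries].sort(makeInflectionComparer(false)).map((e) => e.term);
const deinflectedFirst = [...lookupEntries].sort(makeInflectionComparer(true)).map((e) => e.term);

console.log(exactFirst);       // [ '歩き', '歩く' ]
console.log(deinflectedFirst); // [ '歩く', '歩き' ]
```

A boolean like this applies to every lookup equally, so it cannot distinguish cases where the inflected form is the more common word.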

epistularum · 2021-05-14T11:45:30Z

> I would argue that 1 is the correct behaviour, because how do we know the user doesn't want to see 歩き instead of 歩く?

I believe this should be handled by the freq information. For instance, 歩き has a freq of 2 while 歩く has a freq of 601. This freq information is taken from the provided jmdict dictionary. In most instances I believe it makes more sense to show the deinflected form, but it is true that sometimes the conjugated form is far more frequent than the unconjugated one, e.g. 物思い vs 物思う.
That is why I think we should rely on the freq indicator, since it can differentiate between the two.
Having a toggle like ttu-ttu described is another idea worth looking into, but it is not as granular as what I explained above.
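To make the proposal concrete, here is a minimal sketch of a comparator that looks at the score before the number of inflection steps. The function name `compareByScoreFirst` is hypothetical, the entry objects are simplified stand-ins, and only the two freq values quoted above (2 and 601) come from this thread.

```javascript
// Hypothetical comparator: higher score first, and only on a tie
// prefer the entry that needed fewer inflection steps.
function compareByScoreFirst(v1, v2) {
    const i = v2.score - v1.score;
    if (i !== 0) { return i; }
    return v1.inflections.length - v2.inflections.length;
}

// Simplified entries for a lookup of 歩き, using the freq values mentioned above
const walkEntries = [
    {term: '歩き', inflections: [], score: 2},
    {term: '歩く', inflections: ['masu stem'], score: 601}
];

const rankedByScore = [...walkEntries].sort(compareByScoreFirst).map((e) => e.term);
console.log(rankedByScore); // [ '歩く', '歩き' ]
```

This assumes scores are comparable between entries, which may not hold when the entries come from different dictionaries.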

On another note, where does this freq info come from? I can't seem to find it in the jmdict file itself.

> 2 is probably the same issue as #105, and you can improve this by decreasing the priority of the names dictionary.

I already have my names dictionary set to the lowest priority compared to my other dictionaries. That is why I believe Yomichan ranks direct matches above deconjugated matches. In this example, all the names are considered direct matches since the looked-up text is written in kana, while 食べる needs to be deconjugated and is therefore considered an indirect match. At least, that is my understanding of the behaviour.

toasted-nutbread · 2021-05-15T00:03:33Z

> Another way to deal with this is to place the deinflected form right below the exact match

This information isn't stored in the dictionaries that Yomichan imports, and I'm not sure it would be safe in the general case to assume what is and isn't an inflection.

> That is why I think we should rely on the freq indicator since it can differentiate between the two.

To clarify: by "freq" do you mean the score for a definition, the green frequency tags, or something else?

> On another note, where does this freq info come from? I can't seem to find it in the jmdict file itself.

https://github.com/FooSoft/yomichan-import/blob/83e3e44f46e344bfe66d9c7181caa5b113f8fb2a/edict.go#L160
https://github.com/FooSoft/yomichan-import/blob/83e3e44f46e344bfe66d9c7181caa5b113f8fb2a/edict.go#L48-L65

> I already have my name dictionary on the lowest priority compared to my other dicts. That is why I believe yomichan displays direct matches higher than deconjugated matches.

Yeah, I see what you mean now; this issue affects kana-only searches more so than kanji definitions. There is also some discussion in #1539 about updating how dictionary priority is handled internally, and this may fall into that category as well.

For reference, this is the current code for sorting dictionary entries:

yomichan/ext/js/language/translator.js

Lines 1186 to 1228 in e7d349c

    
    _sortTermDictionaryEntries(dictionaryEntries) {
        const stringComparer = this._stringComparer;
        const compareFunction = (v1, v2) => {
            // Sort by length of source term
            let i = v2.maxTransformedTextLength - v1.maxTransformedTextLength;
            if (i !== 0) { return i; }

            // Sort by the number of inflection reasons
            i = v1.inflections.length - v2.inflections.length;
            if (i !== 0) { return i; }

            // Sort by how many terms exactly match the source (e.g. for exact kana prioritization)
            i = v2.sourceTermExactMatchCount - v1.sourceTermExactMatchCount;
            if (i !== 0) { return i; }

            // Sort by dictionary priority
            i = v2.dictionaryPriority - v1.dictionaryPriority;
            if (i !== 0) { return i; }

            // Sort by term score
            i = v2.score - v1.score;
            if (i !== 0) { return i; }

            // Sort by headword term text
            const headwords1 = v1.headwords;
            const headwords2 = v2.headwords;
            for (let j = 0, jj = Math.min(headwords1.length, headwords2.length); j < jj; ++j) {
                const term1 = headwords1[j].term;
                const term2 = headwords2[j].term;

                i = term2.length - term1.length;
                if (i !== 0) { return i; }

                i = stringComparer.compare(term1, term2);
                if (i !== 0) { return i; }
            }

            // Sort by dictionary order
            i = v1.dictionaryIndex - v2.dictionaryIndex;
            return i;
        };
        dictionaryEntries.sort(compareFunction);
    }
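
To see why a names entry can outrank a deinflected verb even when the names dictionary has the lowest priority, the sketch below runs a cut-down copy of the comparator (first five keys only) on three made-up entries for a lookup of おち. The function name `compareEntriesSimplified` and all field values are illustrative assumptions, not real dictionary data.

```javascript
// Cut-down stand-in for the comparator quoted above (first five keys only).
function compareEntriesSimplified(v1, v2) {
    let i = v2.maxTransformedTextLength - v1.maxTransformedTextLength;
    if (i !== 0) { return i; }
    i = v1.inflections.length - v2.inflections.length;
    if (i !== 0) { return i; }
    i = v2.sourceTermExactMatchCount - v1.sourceTermExactMatchCount;
    if (i !== 0) { return i; }
    i = v2.dictionaryPriority - v1.dictionaryPriority;
    if (i !== 0) { return i; }
    return v2.score - v1.score;
}

// Made-up entries for a lookup of おち; every value here is hypothetical.
const ochiEntries = [
    // Verb reached through deinflection, from a high-priority dictionary
    {term: '落ちる', maxTransformedTextLength: 2, inflections: ['masu stem'], sourceTermExactMatchCount: 0, dictionaryPriority: 10, score: 100},
    // Name from the lowest-priority dictionary, matched without deinflection
    {term: '越智', maxTransformedTextLength: 2, inflections: [], sourceTermExactMatchCount: 0, dictionaryPriority: 0, score: 0},
    // Noun matched without deinflection, from the high-priority dictionary
    {term: '落ち', maxTransformedTextLength: 2, inflections: [], sourceTermExactMatchCount: 0, dictionaryPriority: 10, score: 5}
];

const sortedTerms = [...ochiEntries].sort(compareEntriesSimplified).map((e) => e.term);
console.log(sortedTerms); // [ '落ち', '越智', '落ちる' ]
```

Because the inflection count is compared before dictionary priority and score, the name lands above 落ちる regardless of how low its dictionary priority is set.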
