This repository has been archived by the owner on Feb 25, 2023. It is now read-only.

"Parse text using installed dictionaries" and "Parse text using MeCab" feature clarification? #2011

Closed
kuroahna opened this issue Nov 15, 2021 · 3 comments · Fixed by #2020

Comments

@kuroahna

kuroahna commented Nov 15, 2021

Under the Text Parsing section, it says

Yomichan is able to scan the sentence surrounding a term and parse individual words of the query text on the search page. This information can be added to Anki cards to provide additional context.

And

Parse text using installed dictionaries

Words are scanned by automatically advancing in the sentence after a matching word.

Parse text using MeCab

Requires a native component to be installed that Yomichan will connect to.
MeCab is a third-party program which uses its own dictionaries and parsing algorithm to decompose sentences into individual words. In order for Yomichan to use it, both MeCab and a native messaging component must be installed. A setup guide can be found here.

  1. Does enabling "Parse text using installed dictionaries" and/or "Parse text using MeCab" affect the parsing algorithm used when scanning text? Or does it only affect the sentence that is automatically generated for Anki via the {sentence} marker in the Anki card format settings?
  2. Can "Parse text using installed dictionaries" and "Parse text using MeCab" both be enabled at the same time?
  3. What exactly does "Parse text using installed dictionaries" do? The description "words are scanned by automatically advancing in the sentence after a matching word" isn't clear to me. Does this mean it uses entries from your installed dictionaries to find new words?
  4. What are the advantages and disadvantages of using "Parse text using MeCab"? Does it parse more accurately than Yomichan's built-in parsing algorithm when scanning text? Is it faster?

If possible, it would be nice to clarify some of these points on the main settings page itself.

@toasted-nutbread
Collaborator

  1. It affects the parsed text at the top of the standalone search page (see reference images below), but it does not affect the {sentence} marker. The {sentence} marker (currently) only uses the internal non-MeCab parser.
  2. Yes, if MeCab is installed. A dropdown menu on the search/popup page then lets you switch between the parse results.
  3. It's a somewhat naive algorithm for parsing words in a sentence. It finds the longest text that matches an entry in the installed dictionaries and assumes that is a word. It does not do any sophisticated contextual analysis of whether the resulting word boundaries actually form a grammatically correct sentence.
  4. MeCab is probably more sophisticated and accurate, though I cannot speak to its exact implementation details. The internal parser will sometimes produce results that aren't so great (see reference image 2), but in the general case it's acceptable.
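To make the "longest match" idea in point 3 concrete, here is a minimal Python sketch of greedy longest-match segmentation. It is an illustration under stated assumptions, not Yomichan's actual code: the dictionary is assumed to be a plain set of surface forms, the function name `parse_longest_match` and its `max_len` cap are hypothetical, and the deinflection that real dictionary lookups perform is omitted.

```python
def parse_longest_match(text: str, dictionary: set[str], max_len: int = 10) -> list[str]:
    """Hypothetical sketch: split text greedily by longest dictionary match.

    Assumes `dictionary` is a set of exact surface forms (no deinflection).
    """
    tokens: list[str] = []
    i = 0
    while i < len(text):
        match = None
        # Try the longest candidate starting at position i first, then shrink.
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if candidate in dictionary:
                match = candidate
                break
        if match is None:
            # No entry found: emit a single character and move on.
            match = text[i]
        tokens.append(match)
        # Advance past the matched word; no check that the split is grammatical.
        i += len(match)
    return tokens
```

With a toy dictionary `{"日本", "日本語", "を", "勉強", "勉強する", "する"}`, `parse_longest_match("日本語を勉強する", ...)` yields `["日本語", "を", "勉強する"]`, preferring 日本語 over the shorter 日本. In this sketch, characters with no dictionary entry fall back to one-character tokens, and nothing checks whether the split makes grammatical sense, which is the limitation described above.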

See also:
https://en.wikipedia.org/wiki/MeCab

[Reference image 1]
[Reference image 2: the internal parser's not-so-great results for full-width romaji words]

@kuroahna
Author

kuroahna commented Nov 16, 2021

oh wow, thank you!

Yomichan is able to scan the sentence surrounding a term and parse individual words of the query text on the search page. This information can be added to Anki cards to provide additional context.

I honestly didn't know that there was a "search page" available in Yomichan 😅 (click the magnifying glass)


That one's definitely on me.

  1. It affects the parsed text at the top of the standalone search page (see reference images below), but it does not affect the {sentence} marker. The {sentence} marker only uses the internal non-MeCab parser (currently).

So, just to clarify further: enabling the "Parse text using MeCab" feature doesn't affect what happens when you hold SHIFT and hover over a word to look up its definition, for example? It's only for analyzing the words of a sentence on the search page?

A quick suggestion to clear up some of the confusion in the explanation:

Yomichan is able to scan the sentence surrounding a term and parse individual words of the query text on the search page. This information can be added to Anki cards to provide additional context.

The "on the search page" part should be a hyperlink to the search page itself. I honestly didn't know that Yomichan had a useful search page feature 😄

@toasted-nutbread
Collaborator

So, just to clarify further: enabling the "Parse text using MeCab" feature doesn't affect what happens when you hold SHIFT and hover over a word to look up its definition, for example? It's only for analyzing the words of a sentence on the search page?

99% of the time, yes. There are rare occasions where the embedded popup will show the parsed text, but it's for very specific content that cannot be scanned any other way.
