Parsing for 宣さない #1467

Closed
birtles opened this issue Nov 27, 2023 · 16 comments · Fixed by #2038

Comments

@birtles
Member

birtles commented Nov 27, 2023

As reported by email, this is a negative form of 宣する

@birtles birtles mentioned this issue Nov 27, 2023
11 tasks
@birtles birtles mentioned this issue Dec 6, 2023
7 tasks
@birtles birtles mentioned this issue Feb 26, 2024
5 tasks
@enellis
Contributor

enellis commented Jun 10, 2024

I am curious about whether this form is actually in use. When searching for 宣さない on Google, there are only 2 or 3 unique results. If this form is commonly used, I would expect to see a distinct entry, like for 察する→察す or 達する→達す.

@birtles
Member Author

birtles commented Jun 11, 2024

I also could only find a few references using Google, mostly of the form ファウルを宣さない or ヴァイオレイションを宣さない and mostly in the context of basketball.

I looked up the mail where this issue was reported and found the original report:

10ten doesn't correctly match all inflected forms of vs-s verbs. These verbs can either conjugate like v5s verbs (i.e. verbs ending in -す) or like vs-i/vs verbs (i.e. regular する verbs). Currently, only vs-i inflections are matched. For example, there's no match for 宣さない, which is a valid negative form of 宣する.

Does that clarification help?

@enellis
Contributor

enellis commented Jun 11, 2024

It helps a lot, thank you!

Maybe I'm wrong but I don't think this is something we need to implement in the inflection parsing because as the linked source states:

(*) for this type of verb both the さない and しない forms of negative are used, depending on the verb.
[...]
Note: This table has been automatically generated for a large range of conjugations. It should not be assumed that for every verb, any single conjugation is as frequently used or as natural as any other.

I think in cases where the さない form applies, there is always a separate dictionary entry as a す-Godan verb, so in these cases all forms will match correctly. Just as in the examples I mentioned in my previous comment, there are entries for both 察する and 察す, as well as for 達する and 達す, with exactly the same definitions.

Also, when you search for the す-version on that site, you will be forwarded to the する version:
[Screenshot, 2024-06-11]

@birtles
Member Author

birtles commented Jun 11, 2024

I think in cases where the さない form applies, there is always a separate dictionary entry as a す-Godan verb, so in these cases all forms will match correctly. Just as in the examples I mentioned in my previous comment, there are entries for both 察する and 察す, as well as for 達する and 達す, with exactly the same definitions.

I guess for 宣する the problem is there is no 宣す-Godan verb entry so if you try to use 10ten to look up ファウルを宣さない it will fail. Perhaps the author of the email encountered such content.

Perhaps we should submit an entry to JMdict for 宣す since it looks like 日本国語大辞典 (小学館) has it: https://kotobank.jp/word/%E5%AE%A3%E3%81%99-3137997

Weblio also has it but it has a warning that its conjugation data is programmatically generated:

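文語活用形辞書はプログラムで機械的に活用形や説明を生成しているため、不適切な項目が含まれていることもあります。ご了承くださいませ。
(Roughly: "The classical conjugation dictionary generates conjugated forms and explanations mechanically by program, so it may contain inappropriate entries. Please bear this in mind.")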

I couldn't find it in 大辞林 however, only 宣する.

@enellis
Contributor

enellis commented Jun 11, 2024

Which dictionary do we trust more?
Given that nearly all Google results reference the same document (in different versions), there is a chance that it was simply a mistake. Weirdly, the form 宣しない is also used in the same document.

@birtles
Member Author

birtles commented Jun 12, 2024

Which dictionary do we trust more?

I trust 日本国語大辞典 more than JMdict but in any case we can submit it to JMdict and the editors there can decide if it's worthwhile since they have access to more dictionaries than I do.

Given that nearly all Google results reference the same document (in different versions), there is a chance that it was simply a mistake. Weirdly, the form 宣しない is also used in the same document.

Yes, definitely. But "宣す" appears to be quite common, showing up in a lot of official documentation in phrases like "開会・閉会を宣す", "承認する旨を宣す" etc.

@birtles
Member Author

birtles commented Jun 12, 2024

@enellis
Contributor

enellis commented Jun 12, 2024

Great. Thank you for taking care of this!

@enellis
Contributor

enellis commented Jun 15, 2024

Notify the owners of those online dictionaries and tell them to fix their software. [vs-c] is the correct verb tag for -す verbs.

Well, it seems that with new submissions they don't allow the す-Godan v5s tag on this kind of verb anymore. In the end, it is indeed something we need to implement in the inflection logic.

Edit 1: There still seems to be something I'm missing. I don't really know much about classical Japanese literature, but according to the Wikipedia article on サ行変格活用, there are two types of サ行変格活用 verbs ending in す:

  1. Verbs from classical Japanese literature ending in す, the precursor of する.
  2. する-verbs that have shifted to 五段 inflection, with the ending す, in modern Japanese.

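「愛する」・「解する」などの活用(口語)は、五段活用(「愛す」「解す」など)になる傾向にある。
(Roughly: "The colloquial conjugation of verbs such as 愛する and 解する tends to become 五段 conjugation, as in 愛す and 解す.")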

However, 宣す being used in today's writing, and in the form 宣さない (to my understanding, the classical す doesn't conjugate to さない), strongly suggests to me that it (also) belongs in the second category and can therefore also be tagged with v5s. Would you agree?

Edit 2: To catch every case of verbs from the second category, regardless of their dictionary entries, I suggest that we add the 未然形 さ (and the following auxiliary verbs) to する-verbs with a special reason for deinflection. Some dictionaries call the 五段-transformation 五段化, so I think recognizing such forms as for example < 五段化 < negative (What would be a good localization for 五段化?) is a good option and would avoid confusion about why さない deinflects to する.
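
A minimal sketch of what such a rule could look like, assuming a simple suffix-rewrite deinflector. The type names, the `GODANKA_RULES` table and both functions are hypothetical illustrations, not 10ten's actual code, and the "as Godan verb" label is just one of the localizations discussed in this thread.

```typescript
// Hypothetical types for illustration only; not 10ten's real data structures.
type Reason = 'as Godan verb' | 'negative';

interface Candidate {
  word: string; // candidate dictionary form
  reasons: Reason[]; // reason chain shown to the user
}

interface GodankaRule {
  from: string; // inflected ending built on the 未然形 さ
  to: string; // dictionary ending to restore
  reasons: Reason[];
}

// Only the negative is listed here; other auxiliaries would be added similarly.
const GODANKA_RULES: GodankaRule[] = [
  { from: 'さない', to: 'する', reasons: ['as Godan verb', 'negative'] },
];

function deinflectGodanka(input: string): Candidate[] {
  const candidates: Candidate[] = [];
  for (const rule of GODANKA_RULES) {
    // Require at least one character of stem before the ending.
    if (input.length <= rule.from.length) continue;
    if (input.slice(-rule.from.length) !== rule.from) continue;
    const stem = input.slice(0, input.length - rule.from.length);
    candidates.push({ word: stem + rule.to, reasons: rule.reasons });
  }
  return candidates;
}

// Render the chain in the "< 五段化 < negative" style mentioned above.
function formatReasons(reasons: Reason[]): string {
  return reasons.map((reason) => `< ${reason}`).join(' ');
}
```

With this sketch, `deinflectGodanka('宣さない')` yields a single candidate 宣する whose reason chain formats as `< as Godan verb < negative`.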

@birtles
Member Author

birtles commented Jun 17, 2024

Thank you so much for looking into all this. I'm afraid classical Japanese verb inflections are not something I'm familiar with. I looked into them once in order to generate suitable localizations for the part-of-speech tags, but I mostly found that different sources used different terminology.

From what I understand, your proposal makes sense. Would there be a risk of matching multiple entries however? e.g. matching both the entry for 愛する and 愛す when looking up 愛せない for example?

For 五段化 we could just go with "Godanka" or perhaps we could try a little harder and go with "as Godan verb". Better still, do we even need to show the reason to the user?

@enellis
Contributor

enellis commented Jun 17, 2024

I looked into it once in order to generate suitable localizations for the part-of-speech tags but I mostly found different sources used different terminology.

Yeah, it is very confusing. It seems to me that also the categorization in JMdict and its application are somewhat ambiguous.
FWIW, and as far as I can tell, I think you did a great job with the English terms.

Would there be a risk of matching multiple entries however? e.g. matching both the entry for 愛する and 愛す when looking up 愛せない for example?

Yes, as this is already the case with many forms: 愛した, 愛して, 愛します, 愛される, 愛させる.

Better still, do we even need to show the reason to the user?

I can imagine people being very confused, I certainly would be, when seeing 愛さない is matched with 愛する, as さない is not an inflection of する. In my opinion, knowing that this is because the verb can also be inflected as a Godan verb would instantly clear things up. "as Godan verb" would serve this purpose very well, I think.
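
To illustrate the double match discussed above, here is a toy sketch, assuming a suffix-rewrite lookup against a two-entry dictionary. `PAST_RULES`, `TOY_DICT` and `lookup` are made-up names, and the tags in the toy dictionary are assumptions for the example rather than a claim about the actual JMdict entries.

```typescript
// Toy model only: names and data below are assumptions for illustration.
type WordType = 'v5s' | 'vs-s';

interface Rule {
  from: string; // inflected ending
  to: string; // dictionary ending
  toType: WordType; // type the resulting dictionary form must have
}

// The past-tense ending した can come from either する or a Godan す verb.
const PAST_RULES: Rule[] = [
  { from: 'した', to: 'する', toType: 'vs-s' },
  { from: 'した', to: 'す', toType: 'v5s' },
];

// A two-entry stand-in for the dictionary.
const TOY_DICT: Record<string, WordType[]> = {
  '愛する': ['vs-s'],
  '愛す': ['v5s'],
};

function lookup(input: string): string[] {
  const matches: string[] = [];
  for (const rule of PAST_RULES) {
    if (input.length <= rule.from.length) continue;
    if (input.slice(-rule.from.length) !== rule.from) continue;
    const candidate = input.slice(0, input.length - rule.from.length) + rule.to;
    const types = TOY_DICT[candidate];
    // Keep the candidate only if the dictionary entry has the expected type.
    if (types !== undefined && types.indexOf(rule.toType) !== -1) {
      matches.push(candidate);
    }
  }
  return matches;
}
```

In this toy setup, `lookup('愛した')` returns both 愛する and 愛す, which is the same kind of ambiguity a さない rule for する verbs would produce for 愛さない.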

@birtles
Member Author

birtles commented Jun 18, 2024

Would there be a risk of matching multiple entries however? e.g. matching both the entry for 愛する and 愛す when looking up 愛せない for example?

Yes, as this is already the case with many forms: 愛した, 愛して, 愛します, 愛される, 愛させる.

Oh, so it is!

Just a thought, I noticed that the Supplementary comments for single-kanji ~する verbs has:

It has also been observed that when there is a related 五段 verb (愛す in the case of 愛する) the ~さない form tends to be used.

If that's true and all these する verbs that display the 五段化 behavior also have a corresponding す verb in JMdict (whether or not it is marked as v5s or vs-c) can we make the 五段化 rule have a to-type of vs-c?

If not, then I think your proposal makes sense. Although would we make the to-type for said rules be vs-s?

Sorry if my questions don't make sense, I might not be understanding the different cases properly.

@enellis
Contributor

enellis commented Jun 18, 2024

Supplementary comments for single-kanji ~する verbs

Thanks for sharing the link. That was some really useful information.

So, if I'm understanding correctly and summing up what I have read about this topic, there are three categories of suru-verbs in JMdict:

  • vs-i: These show no irregularities and inflect like any other noun that takes する.
  • vs-s: These verbs generally inflect like vs-i verbs but may exhibit the following irregularities:
    • Inheriting the 未然形「せ」of the classical verb す, they can inflect to
      • せさせる (causative)
      • せられる (passive)
    • 五段化: Treated as す-五段 verbs, they can additionally inflect to:
      • さない (negative), さず (-zu)
      • そう (-sou)
      • せる (potential)
      • せ (imperative)
    • Treated as 「⚪︎しる」 一段 verbs, they can inflect to:
      • しず (-zu)
      • しさせる (causative)
      • しられる (passive)
  • vs-c: These are old su-verbs that inflect according to the classical verb す.
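
The vs-s irregularities listed above can be written down as data. The following is only a sketch: the suffixes and their labels are taken from the list, while the `IrregularSuffix` shape, the `origin` values and the `irregularForms` helper are hypothetical names introduced for illustration.

```typescript
// Hypothetical layout; only the suffixes and labels come from the list above.
interface IrregularSuffix {
  suffix: string;
  label: string;
  origin: 'classical-se' | 'godan' | 'ichidan-shiru';
}

const VS_S_IRREGULAR_SUFFIXES: IrregularSuffix[] = [
  // Built on the 未然形 せ of the classical verb す
  { suffix: 'せさせる', label: 'causative', origin: 'classical-se' },
  { suffix: 'せられる', label: 'passive', origin: 'classical-se' },
  // 五段化: treated as a す-五段 verb
  { suffix: 'さない', label: 'negative', origin: 'godan' },
  { suffix: 'さず', label: '-zu', origin: 'godan' },
  { suffix: 'そう', label: '-sou', origin: 'godan' },
  { suffix: 'せる', label: 'potential', origin: 'godan' },
  { suffix: 'せ', label: 'imperative', origin: 'godan' },
  // Treated as a しる 一段 verb
  { suffix: 'しず', label: '-zu', origin: 'ichidan-shiru' },
  { suffix: 'しさせる', label: 'causative', origin: 'ichidan-shiru' },
  { suffix: 'しられる', label: 'passive', origin: 'ichidan-shiru' },
];

// Expand a する verb into the irregular surface forms from the table.
// Returns an empty list for anything that does not end in する.
function irregularForms(dictionaryForm: string): string[] {
  if (dictionaryForm.length <= 2 || dictionaryForm.slice(-2) !== 'する') {
    return [];
  }
  const stem = dictionaryForm.slice(0, dictionaryForm.length - 2);
  return VS_S_IRREGULAR_SUFFIXES.map((entry) => stem + entry.suffix);
}
```

For example, `irregularForms('発する')` produces ten forms, including 発さない and 発せられる. As the note quoted earlier in this thread says about auto-generated tables, not every generated form is necessarily equally natural for every verb.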

If that's true and all these する verbs that display the 五段化 behavior also have a corresponding す verb in JMdict (whether or not it is marked as v5s or vs-c) can we make the 五段化 rule have a to-type of vs-c?

Initially, I thought this was the case, but it is not. For example, 発する, the example given in the Wikipedia article, has no corresponding entry for 発す but shows all of the above-mentioned irregularities.

What I'd propose is

  • for verbs tagged vs-i: Nothing to do here.
  • for verbs tagged vs-s: Integrating a new vs-s to-type is a good idea. We should implement the irregular inflection rule set targeting the new type and match it only against verbs with this tag. Maybe we should generalize the 五段化 reason to something like irregular, so that it can be applied to all of the above-mentioned irregularities.
  • for verbs (solely) tagged vs-c: In the current state, avoid matching these with any deinflected forms, as we simply don't support classical inflections.
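
The three cases above amount to gating rule sets by part-of-speech tag. A rough sketch of that gating follows, assuming each dictionary entry exposes its JMdict tags as a list; the rule-set labels and both function names are hypothetical, and the handling of v5s is an added assumption to cover entries tagged with both vs-c and v5s.

```typescript
// The tags are JMdict's; the rule-set labels are made up for this sketch.
type PosTag = 'vs-i' | 'vs-s' | 'vs-c' | 'v5s';
type RuleSet = 'suru-regular' | 'suru-irregular' | 'godan-su';

// Decide which deinflection rule sets may produce a match for an entry.
function ruleSetsFor(tags: PosTag[]): RuleSet[] {
  const has = (tag: PosTag): boolean => tags.indexOf(tag) !== -1;
  const sets: RuleSet[] = [];
  // vs-s verbs generally inflect like vs-i verbs, so both get the regular set.
  if (has('vs-i') || has('vs-s')) sets.push('suru-regular');
  // The irregular set is matched against vs-s entries only.
  if (has('vs-s')) sets.push('suru-irregular');
  // Assumption: an entry that is also tagged v5s still matches Godan rules.
  if (has('v5s')) sets.push('godan-su');
  // vs-c contributes nothing: classical inflections are not supported.
  return sets;
}

function entryAccepts(tags: PosTag[], ruleSet: RuleSet): boolean {
  return ruleSetsFor(tags).indexOf(ruleSet) !== -1;
}
```

Under these assumptions, an entry tagged solely vs-c never matches a deinflected form, while a vs-s entry accepts both the regular and the irregular する rules.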

@birtles
Member Author

birtles commented Jun 19, 2024

Thank you so much for the thorough investigation here! That's really helpful.

What I'd propose is

  • for verbs tagged vs-i: Nothing to do here.
  • for verbs tagged vs-s: Integrating a new vs-s to-type is a good idea. We should implement the irregular inflection rule set targeting the new type and match it only against verbs with this tag. Maybe we should generalize the 五段化 reason to something like irregular, so that it can be applied to all of the above-mentioned irregularities.
  • for verbs (solely) tagged vs-c: In the current state, avoid matching these with any deinflected forms, as we simply don't support classical inflections.

That sounds perfect. Thank you again!

@enellis
Contributor

enellis commented Jun 19, 2024

Great, I will start working on it once the 敬語-PR is merged.

2 participants