Parse non-tabular grammar information #21

Closed
radomirbosak opened this issue Nov 20, 2016 · 7 comments

Comments

@radomirbosak
Owner

$ python3 -c "import duden; print(duden.get('Kragen').grammar_raw)"
[]

duden.de, however, supplies this text in the grammar section:

der Kragen; Genitiv: des Kragens, Plural: die Kragen, süddeutsch, österreichisch, schweizerisch: Krägen

The script shouldn't omit this. However, it is not clear in which form it should present the information in this rare case of a non-tabular format.
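One possible shape for the result, purely as a sketch: split the text at the semicolon into a head and a list of (label, value) pairs. The function name `parse_grammar_text` and the returned structure are assumptions made for illustration, not part of the duden package, and the splitting rules are inferred only from the example text quoted in this issue.

```python
def parse_grammar_text(text):
    """Split a one-line grammar note into a head and labelled entries.

    Hypothetical helper: the format is guessed from a single example.
    """
    head, _, rest = text.partition(";")
    entries = []
    pending = []  # comma-separated pieces that belong to the next label
    for piece in (p.strip() for p in rest.split(",")):
        if not piece:
            continue
        label, sep, value = piece.partition(":")
        if sep:
            # A label may itself span several comma-separated regions.
            full_label = ", ".join(pending + [label.strip()])
            entries.append((full_label, value.strip()))
            pending = []
        else:
            pending.append(piece)
    if pending:
        # Trailing text that carries no label at all.
        entries.append((None, ", ".join(pending)))
    return head.strip(), entries


head, entries = parse_grammar_text(
    "der Kragen; Genitiv: des Kragens, Plural: die Kragen, "
    "süddeutsch, österreichisch, schweizerisch: Krägen"
)
print(head)     # der Kragen
print(entries)
```

For the Kragen text this yields the head `der Kragen` and three entries, the last one labelled `süddeutsch, österreichisch, schweizerisch`. Other words may well break these rules, so the raw text would still need to be kept alongside any parsed form.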

@radomirbosak
Owner Author

The word Mönch uses a similar format for grammar.

der Mönch; Genitiv: des Mönch[e]s, Plural: die Mönche

@radomirbosak
Owner Author

Verbs like schwindeln also use a non-tabular grammar format.

schwaches Verb; Perfektbildung mit »hat«

We should first decide what the function should return for words like these.
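As one option for verbs, a regular expression could pull out the conjugation class and the perfect auxiliary. This is a rough sketch: `parse_verb_note` and the dictionary keys are hypothetical names, and the pattern is derived from the single schwindeln example above, so it will not match notes phrased differently.

```python
import re

# Pattern assumed from the one example "schwaches Verb; Perfektbildung mit »hat«".
VERB_NOTE = re.compile(r"^(?P<kind>\w+) Verb; Perfektbildung mit »(?P<aux>\w+)«$")


def parse_verb_note(text):
    """Return conjugation class and auxiliary, or None if the text differs."""
    match = VERB_NOTE.match(text.strip())
    if match is None:
        return None
    return {
        "conjugation": match.group("kind"),
        "perfect_auxiliary": match.group("aux"),
    }


print(parse_verb_note("schwaches Verb; Perfektbildung mit »hat«"))
# {'conjugation': 'schwaches', 'perfect_auxiliary': 'hat'}
```

Returning `None` for anything that does not match keeps the caller free to fall back to the raw text.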

@radomirbosak
Owner Author

This bug is still valid in 0.10.0.

@radomirbosak
Owner Author

radomirbosak commented Aug 3, 2018

The word Bereich has the same problem: the section doesn't contain any tables, just the text "der, selten: das Bereich; Genitiv: des Bereich[e]s, Plural: die Bereiche".

@radomirbosak radomirbosak changed the title from "Word 'Kragen' doesn't display any grammar information in v0.7.0" to "Parse non-tabular grammar information" Jun 14, 2020
@radomirbosak radomirbosak removed the bug label Jun 14, 2020
pajowu added a commit to pajowu/duden that referenced this issue Jan 9, 2021
Before this commit, grammar_raw returned None if no '.grammatik' element was found. This leads to a TypeError in 'Word.grammar' and seems unintuitive, since in other cases an empty list is returned (see radomirbosakGH-21).
Now grammar_raw returns an empty list if no grammar section is found.
radomirbosak pushed a commit that referenced this issue Jan 17, 2021
Before this commit, grammar_raw returned None if no '.grammatik' element was found. This leads to a TypeError in 'Word.grammar' and seems unintuitive, since in other cases an empty list is returned (see GH-21).
Now grammar_raw returns an empty list if no grammar section is found.
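The difference between the two return values can be illustrated with a small stand-in. The functions below are hypothetical and do not reproduce the duden package's code; they only show why a consumer that iterates over the result fails on `None` but works on an empty list.

```python
def grammar_raw(section_rows):
    """Hypothetical stand-in: section_rows is None when no grammar section exists."""
    if section_rows is None:
        return []  # empty list instead of None keeps callers simple
    return list(section_rows)


def grammar_lines(raw):
    """Hypothetical consumer that iterates over the raw rows."""
    return [" ".join(row) for row in raw]


print(grammar_lines(grammar_raw(None)))  # []

try:
    grammar_lines(None)  # what a consumer sees if None is passed through
except TypeError as exc:
    print("TypeError:", exc)
```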
@radomirbosak
Owner Author

The word Meme is also a bit special since it has multiple spellings. This will complicate grammar parsing.

@radomirbosak
Owner Author

radomirbosak commented Sep 21, 2022

A related attribute, grammar_overview, was added in #168. However, it does not parse the text it contains.

@radomirbosak
Owner Author

The above new attribute should be enough and I won't be implementing more detailed parsing of a highly variable text attribute. I will close this as out of scope.
