Crashes on certain words #146

Closed
VIEWVIEWVIEW opened this issue Oct 6, 2021 · 6 comments
Comments

@VIEWVIEWVIEW
Contributor

Example export:

$ python run_duden.py --export Meme > tests/test_data/Meme.yaml

Results in the following crash:

~/duden$ python run_duden.py --export Meme > tests/test_data/Meme.yaml
Traceback (most recent call last):
  File "run_duden.py", line 4, in <module>
    main()
  File "/home/w/duden/duden/cli.py", line 173, in main
    display_word(word, args)
  File "/home/w/duden/duden/cli.py", line 61, in display_word
    yaml_string = yaml.dump(word.export(),
  File "/home/w/duden/duden/word.py", line 298, in export
    worddict[attribute] = getattr(self, attribute, None)
  File "/home/w/duden/duden/word.py", line 61, in name
    name, _ = self.title.split(', ')
ValueError: too many values to unpack (expected 2)

The crash occurs on the word "Meme" because its title contains more than one ", " separator, so the split used by the word.name and word.article properties returns more than two parts. This can be fixed by setting maxsplit in the split call:

name, _ = self.title.split(', ', 1)

https://docs.python.org/3/library/stdtypes.html#str.split
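
The failure can be reproduced without the library. This is a minimal sketch; the title string "Meme, auch Mem, das" is an assumption inferred from the rest of this thread and is not printed in the traceback.

```python
# Assumed title for the word "Meme" (inferred, not taken from the traceback).
title = "Meme, auch Mem, das"

# An unbounded split yields three parts, so two-target unpacking fails.
parts = title.split(", ")
print(parts)  # ['Meme', 'auch Mem', 'das']

try:
    name, _ = title.split(", ")
except ValueError as exc:
    print(f"ValueError: {exc}")

# With maxsplit=1 the string is cut only at the first separator.
name, rest = title.split(", ", 1)
print(name)  # Meme
print(rest)  # auch Mem, das
```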

@VIEWVIEWVIEW
Contributor Author

There are two lines where this crash can occur:

name, _ = self.title.split(', ')

_, article = self.title.split(', ')

@radomirbosak
Owner

radomirbosak commented Oct 6, 2021

Thank you for filing this issue. The word Meme is interesting since it splits into three parts, in contrast to other words.

@VIEWVIEWVIEW
Contributor Author

VIEWVIEWVIEW commented Oct 6, 2021

I guess the simple fix

_, article = self.title.split(', ', 1)

is not appropriate, since it would cut information off. What do you think?

@radomirbosak
Owner

radomirbosak commented Oct 6, 2021

Right, setting maxsplit to 1 would solve the issue when getting the name, but it wouldn't work for the article, which would be incorrectly determined as "auch Mem, das".
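
To illustrate the point, and one possible string-based alternative that is not the fix proposed in this thread: taking the first and last comma-separated parts handles both two-part and three-part titles. The helper name `split_title` and the sample titles are hypothetical.

```python
def split_title(title):
    """Return (name, article) from a comma-separated title (hypothetical helper)."""
    parts = title.split(", ")
    if len(parts) < 2:
        # No separator at all: treat the whole title as the name.
        return title, None
    # First part is the name, last part is the article; anything between is dropped.
    return parts[0], parts[-1]


# maxsplit=1 keeps the alternative spelling glued to the article.
_, article = "Meme, auch Mem, das".split(", ", 1)
print(article)  # auch Mem, das

print(split_title("Meme, auch Mem, das"))  # ('Meme', 'das')
print(split_title("Barmherzigkeit, die"))  # ('Barmherzigkeit', 'die')
```

This silently discards the middle part ("auch Mem") and still depends on the exact layout of the title string.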

@radomirbosak
Owner

Maybe a more robust way to get the name and article would be to locate the lemma__* spans.

@radomirbosak
Owner

This could also be an opportunity to introduce a property like alternative_spellings or similar, which would return the list of all lemma__alt-spelling contents. (Although I'm not sure if there are words with 2 or more alternative spellings.)
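
The span-based approach could be sketched roughly as follows. This is a minimal, self-contained sketch, not the project's implementation: it uses the standard-library `html.parser` only to stay dependency-free, the sample markup is hypothetical, and the class names `lemma__main` and `lemma__determiner` are assumptions (only `lemma__alt-spelling` is named in this thread).

```python
from html.parser import HTMLParser


class LemmaSpanParser(HTMLParser):
    """Collect the text of <span class="lemma__..."> elements, keyed by class name."""

    def __init__(self):
        super().__init__()
        self.found = {}    # class name -> list of text contents
        self._stack = []   # one entry per open span: (class name, index) or None

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        classes = (dict(attrs).get("class") or "").split()
        lemma = next((c for c in classes if c.startswith("lemma__")), None)
        if lemma is None:
            self._stack.append(None)
        else:
            texts = self.found.setdefault(lemma, [])
            texts.append("")
            self._stack.append((lemma, len(texts) - 1))

    def handle_endtag(self, tag):
        if tag == "span" and self._stack:
            self._stack.pop()

    def handle_data(self, data):
        # Attribute text to the innermost open lemma__ span, if any.
        for entry in reversed(self._stack):
            if entry is not None:
                lemma, index = entry
                self.found[lemma][index] += data
                return


# Hypothetical markup; the real page structure may differ.
SAMPLE_HTML = (
    '<h1><span class="lemma__main">Meme</span>, auch '
    '<span class="lemma__alt-spelling">Mem</span>, '
    '<span class="lemma__determiner">das</span></h1>'
)

parser = LemmaSpanParser()
parser.feed(SAMPLE_HTML)
parser.close()


def first_text(class_name):
    """Return the stripped text of the first span with this class, or None."""
    texts = parser.found.get(class_name, [])
    return texts[0].strip() if texts else None


name = first_text("lemma__main")            # assumed class name
article = first_text("lemma__determiner")   # assumed class name
alternative_spellings = [
    text.strip() for text in parser.found.get("lemma__alt-spelling", [])
]
print(name, article, alternative_spellings)  # Meme das ['Mem']
```

Returning a list for `alternative_spellings` keeps the property usable whether a word has zero, one, or several alternative spellings.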
