This repository has been archived by the owner on Mar 9, 2023. It is now read-only.

AttributeError: 'sudachipy.latticenode.LatticeNode' object has no attribute 'begin' #133

Closed
hiroshi-matsuda-rit opened this issue Jun 18, 2020 · 7 comments · Fixed by #134

@hiroshi-matsuda-rit
Contributor

This error might be related to the Cythonization. @polm @sorami
Do you have test cases for this API?

  File "/mnt/c/git/spaCy/venv.wsl/lib/python3.8/site-packages/sudachipy/morpheme.py", line 56, in split
    return self.list.split(mode, self.index, wi)
  File "/mnt/c/git/spaCy/venv.wsl/lib/python3.8/site-packages/sudachipy/morphemelist.py", line 75, in split
    n.begin = offset
AttributeError: 'sudachipy.latticenode.LatticeNode' object has no attribute 'begin'
@sorami
Collaborator

sorami commented Jun 18, 2020

I had similar cases while investigating #128.

Sorry, no, there were no test cases for this method.

I think the error occurs because, after Cythonization, these attributes are no longer directly accessible from Python, i.e., the code should call n.set_begin() instead of assigning n.begin (this method already exists).

There may be more such cases, which the current test cases didn't catch.
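
To make the failure mode concrete, here is a minimal pure-Python sketch. It is not SudachiPy code and does not use Cython: the Node class and all of its methods are hypothetical stand-ins, and __slots__ is used only as an analogy for a compiled class whose fields are not exposed as Python attributes. Only the name set_begin is taken from this thread.

```python
class Node:
    """Hypothetical stand-in for a compiled class with hidden fields.

    __slots__ blocks undeclared attributes, which roughly mimics a
    Cython extension type whose C-level fields are not exposed to Python.
    """

    __slots__ = ("_begin", "_end")

    def __init__(self):
        self._begin = 0
        self._end = 0

    def set_begin(self, begin):
        self._begin = begin

    def set_end(self, end):
        self._end = end

    def get_begin(self):
        return self._begin

    def get_end(self):
        return self._end


def try_direct_assignment(node, offset):
    """Return the message raised by `node.begin = offset`, or None."""
    try:
        node.begin = offset
    except AttributeError as exc:
        return str(exc)
    return None


node = Node()
error = try_direct_assignment(node, 3)  # fails, like `n.begin = offset`
node.set_begin(3)                       # works, like calling a setter
node.set_end(5)
print(error)
print(node.get_begin(), node.get_end())
```

The direct assignment raises AttributeError while the setter calls succeed, which is the same pattern as in the traceback above. The exact wording of the error message varies between Python versions.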

@sorami
Collaborator

sorami commented Jun 19, 2020

from sudachipy import tokenizer
from sudachipy import dictionary

tokenizer_obj = dictionary.Dictionary().create()

mode = tokenizer.Tokenizer.SplitMode.C
morpheme = tokenizer_obj.tokenize("国家公務員", mode)[0]
morpheme.surface() # '国家公務員'

morpheme.split(tokenizer.Tokenizer.SplitMode.A)

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-25-af36be3916ed> in <module>
----> 1 morpheme.split(tokenizer.Tokenizer.SplitMode.A)

SudachiPy/sudachipy/morpheme.py in split(self, mode)
     54     def split(self, mode):
     55         wi = self.get_word_info()
---> 56         return self.list.split(mode, self.index, wi)
     57
     58     def is_oov(self):

SudachiPy/sudachipy/morphemelist.py in split(self, mode, index, wi)
     73         for wid in word_ids:
     74             n = latticenode.LatticeNode(self.lexicon, 0, 0, 0, wid)
---> 75             n.begin = offset
     76             offset += n.get_word_info().head_word_length
     77             n.end = offset

AttributeError: 'sudachipy.latticenode.LatticeNode' object has no attribute 'begin'

@sorami
Collaborator

sorami commented Jun 19, 2020

I have fixed the case, and added a test for this method in #134.

I am now looking at other parts of the code that the Cythonization may affect (i.e., those related to Lattice and LatticeNode), which we missed due to a lack of tests.

@sorami
Collaborator

sorami commented Jun 19, 2020

Memo about splitting in A or B mode:

When using Tokenizer to split text, the splitting from C mode to A/B mode is done by the method Tokenizer._split_path().

However, there are separate methods, Morpheme.split() and MorphemeList.split(), which are independent of the above Tokenizer method.

There were no test cases for the latter, so this issue was not discovered until now.

@polm
Contributor

polm commented Jun 19, 2020

Sorry, I missed this issue too... I thought I had checked the Cythonized attributes during development, but obviously I missed some. I'll take a look and see what else I missed.

sorami pushed a commit that referenced this issue Jun 19, 2020
* Fix latttice node related access due to Cythonization #133

* Add a case for morpheme split
@sorami
Collaborator

sorami commented Jun 19, 2020

I have released yet another version v0.4.9 to fix this issue.

@hiroshi-matsuda-rit
Contributor Author

Thank you so much! @sorami and @polm
