This is quite similar to issue #589, but I had to open a new one because the old one was closed. Steps to reproduce below:
~/my_dir $ pip show spacy
Name: spacy
Version: 1.8.2
Summary: Industrial-strength Natural Language Processing (NLP) with Python and Cython
Home-page: https://spacy.io
Author: Matthew Honnibal
Author-email: matt@explosion.ai
License: MIT
Location: /usr/lib/python2.7/site-packages
Requires: numpy, murmurhash, cymem, preshed, thinc, plac, six, pathlib, ujson, dill, requests, regex, ftfy
~/my_dir $ python
Python 2.7.13 (default, Dec 22 2016, 09:22:15)
[GCC 6.2.1 20160822] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import spacy
>>> nlp = spacy.en.English()
>>> nlp.vocab.strings.set_frozen(True)
>>> nlp(u'Whataasdfsdaf')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python2.7/site-packages/spacy/language.py", line 320, in __call__
doc = self.make_doc(text)
File "/usr/lib/python2.7/site-packages/spacy/language.py", line 293, in <lambda>
self.make_doc = lambda text: self.tokenizer(text)
File "spacy/tokenizer.pyx", line 165, in spacy.tokenizer.Tokenizer.__call__ (spacy/tokenizer.cpp:5486)
File "spacy/tokenizer.pyx", line 205, in spacy.tokenizer.Tokenizer._tokenize (spacy/tokenizer.cpp:6060)
File "spacy/tokenizer.pyx", line 279, in spacy.tokenizer.Tokenizer._attach_tokens (spacy/tokenizer.cpp:7129)
File "spacy/vocab.pyx", line 246, in spacy.vocab.Vocab.get (spacy/vocab.cpp:6986)
File "spacy/vocab.pyx", line 269, in spacy.vocab.Vocab._new_lexeme (spacy/vocab.cpp:7249)
OverflowError: value too large to convert to int32_t
Thanks for the report! The set_frozen mechanism has been a stop-gap, and I'm not immediately sure what's changed here that's broken it. I'll likely fix the underlying problem for spaCy 2, rather than repairing this. The situation around the streaming data memory growth is much better in spaCy 2, because the integer IDs are now hash values, rather than strings.
In short: the streaming-data memory growth is finally fixed properly in spaCy v2 🎉. This means the flaky set_frozen functionality could be deleted from the StringStore, resolving this issue.
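To illustrate the idea behind the v2 fix: if a string's ID is a hash of its content rather than the next slot in a growing table, looking up an unseen string never has to allocate a new integer, so there is no counter to overflow and no table that must grow with every novel token. The sketch below is a toy model of that design, not spaCy's actual implementation (spaCy v2 uses MurmurHash internally; here `hashlib` stands in, and the `HashStringStore` class name is made up for illustration):

```python
import hashlib


def string_to_id(s):
    # Derive a stable 64-bit ID from the string's content (illustrative;
    # spaCy v2 uses a MurmurHash-based hash, not SHA-1).
    return int.from_bytes(hashlib.sha1(s.encode("utf8")).digest()[:8], "little")


class HashStringStore:
    """Toy model of a hash-based string store: IDs are content hashes,
    so mapping string -> ID needs no mutable state at all. Only the
    reverse mapping (ID -> string) stores anything."""

    def __init__(self):
        self._by_hash = {}  # hash -> string, for strings we choose to keep

    def __getitem__(self, s):
        # No insertion happens here: streaming arbitrary text through
        # this lookup cannot grow the store or overflow a counter.
        return string_to_id(s)

    def add(self, s):
        h = string_to_id(s)
        self._by_hash[h] = s
        return h

    def decode(self, h):
        return self._by_hash[h]
```

Under this scheme, the frozen/unfrozen distinction disappears: lookups on unseen strings (like the `u'Whataasdfsdaf'` in the traceback above) are pure functions of the input, and only strings you explicitly keep occupy memory.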