Training from .key files with only subject labels fails #309

Closed
osma opened this issue Aug 8, 2019 · 1 comment · Fixed by #313

osma commented Aug 8, 2019

As pointed out in #303, training fails on a directory of .txt and .key files when the .key files contain only subject labels and no URIs. Example traceback with the tfidf backend:

Traceback (most recent call last):
  File "/home/oisuomin/.local/share/virtualenvs/Annif-G8ShVyyO/bin/annif", line 11, in <module>
    load_entry_point('annif', 'console_scripts', 'annif')()
  File "/home/oisuomin/.local/share/virtualenvs/Annif-G8ShVyyO/lib/python3.5/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/home/oisuomin/.local/share/virtualenvs/Annif-G8ShVyyO/lib/python3.5/site-packages/flask/cli.py", line 569, in main
    return super(FlaskGroup, self).main(*args, **kwargs)
  File "/home/oisuomin/.local/share/virtualenvs/Annif-G8ShVyyO/lib/python3.5/site-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/home/oisuomin/.local/share/virtualenvs/Annif-G8ShVyyO/lib/python3.5/site-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/oisuomin/.local/share/virtualenvs/Annif-G8ShVyyO/lib/python3.5/site-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/oisuomin/.local/share/virtualenvs/Annif-G8ShVyyO/lib/python3.5/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/home/oisuomin/.local/share/virtualenvs/Annif-G8ShVyyO/lib/python3.5/site-packages/click/decorators.py", line 17, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/home/oisuomin/.local/share/virtualenvs/Annif-G8ShVyyO/lib/python3.5/site-packages/flask/cli.py", line 419, in decorator
    return __ctx.invoke(f, *args, **kwargs)
  File "/home/oisuomin/.local/share/virtualenvs/Annif-G8ShVyyO/lib/python3.5/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/home/local/oisuomin/git/Annif/annif/cli.py", line 154, in run_train
    proj.train(documents)
  File "/home/local/oisuomin/git/Annif/annif/project.py", line 197, in train
    self._create_vectorizer(corpus)
  File "/home/local/oisuomin/git/Annif/annif/project.py", line 186, in _create_vectorizer
    self._vectorizer.fit((subj.text for subj in subjectcorpus.subjects))
  File "/home/oisuomin/.local/share/virtualenvs/Annif-G8ShVyyO/lib/python3.5/site-packages/sklearn/feature_extraction/text.py", line 1631, in fit
    X = super().fit_transform(raw_documents)
  File "/home/oisuomin/.local/share/virtualenvs/Annif-G8ShVyyO/lib/python3.5/site-packages/sklearn/feature_extraction/text.py", line 1058, in fit_transform
    self.fixed_vocabulary_)
  File "/home/oisuomin/.local/share/virtualenvs/Annif-G8ShVyyO/lib/python3.5/site-packages/sklearn/feature_extraction/text.py", line 989, in _count_vocab
    raise ValueError("empty vocabulary; perhaps the documents only"
ValueError: empty vocabulary; perhaps the documents only contain stop words

With the fasttext backend:

Traceback (most recent call last):
  File "/home/oisuomin/.local/share/virtualenvs/Annif-G8ShVyyO/bin/annif", line 11, in <module>
    load_entry_point('annif', 'console_scripts', 'annif')()
  File "/home/oisuomin/.local/share/virtualenvs/Annif-G8ShVyyO/lib/python3.5/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/home/oisuomin/.local/share/virtualenvs/Annif-G8ShVyyO/lib/python3.5/site-packages/flask/cli.py", line 569, in main
    return super(FlaskGroup, self).main(*args, **kwargs)
  File "/home/oisuomin/.local/share/virtualenvs/Annif-G8ShVyyO/lib/python3.5/site-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/home/oisuomin/.local/share/virtualenvs/Annif-G8ShVyyO/lib/python3.5/site-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/oisuomin/.local/share/virtualenvs/Annif-G8ShVyyO/lib/python3.5/site-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/oisuomin/.local/share/virtualenvs/Annif-G8ShVyyO/lib/python3.5/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/home/oisuomin/.local/share/virtualenvs/Annif-G8ShVyyO/lib/python3.5/site-packages/click/decorators.py", line 17, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/home/oisuomin/.local/share/virtualenvs/Annif-G8ShVyyO/lib/python3.5/site-packages/flask/cli.py", line 419, in decorator
    return __ctx.invoke(f, *args, **kwargs)
  File "/home/oisuomin/.local/share/virtualenvs/Annif-G8ShVyyO/lib/python3.5/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/home/local/oisuomin/git/Annif/annif/cli.py", line 154, in run_train
    proj.train(documents)
  File "/home/local/oisuomin/git/Annif/annif/project.py", line 198, in train
    self.backend.train(corpus, project=self)
  File "/home/local/oisuomin/git/Annif/annif/backend/fasttext.py", line 108, in train
    self._create_model()
  File "/home/local/oisuomin/git/Annif/annif/backend/fasttext.py", line 103, in _create_model
    self._model = fastText.train_supervised(trainpath, **params)
  File "/home/oisuomin/.local/share/virtualenvs/Annif-G8ShVyyO/lib/python3.5/site-packages/fastText/FastText.py", line 343, in train_supervised
    fasttext.train(ft.f, a)
ValueError: Empty vocabulary. Try a smaller -minCount value.

The problem seems to be that subject labels are not converted to URIs internally, although they should be.
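
To illustrate the kind of conversion meant here, the following is a minimal sketch and not Annif's actual code. It assumes a .key line is either `<uri>`, optionally followed by a tab and a label, or a bare label. The names `VOCAB`, `LABEL_TO_URI` and `parse_key_lines` and the example.org URIs are hypothetical.

```python
import re

# Hypothetical vocabulary: URI -> preferred label (illustrative values only).
VOCAB = {
    "http://example.org/subj/1": "cats",
    "http://example.org/subj/2": "dogs",
}

# Reverse index used to resolve lines that hold only a label.
LABEL_TO_URI = {label: uri for uri, label in VOCAB.items()}

# Matches "<uri>" optionally followed by a tab and a label.
URI_LINE = re.compile(r"^<([^>]+)>(?:\t(.*))?$")


def parse_key_lines(lines):
    """Return the set of subject URIs for the lines of one .key file.

    Lines that start with a URI in angle brackets are taken as-is.
    Lines that hold only a label are looked up in LABEL_TO_URI.
    Blank lines and unknown labels are skipped.
    """
    uris = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        match = URI_LINE.match(line)
        if match:
            uris.add(match.group(1))
        elif line in LABEL_TO_URI:
            uris.add(LABEL_TO_URI[line])
    return uris


if __name__ == "__main__":
    # A label-only .key file still resolves to URIs.
    print(sorted(parse_key_lines(["cats", "dogs"])))
```

If the label lookup is skipped, a label-only file yields no subject URIs at all. That would leave the subject corpus empty, which is consistent with the "empty vocabulary" errors in both tracebacks.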

@osma osma added the bug label Aug 8, 2019
@osma osma added this to the 0.42 milestone Aug 8, 2019
@osma osma self-assigned this Aug 8, 2019
osma added a commit that referenced this issue Aug 8, 2019
@osma osma closed this as completed in #313 Aug 9, 2019

osma commented Aug 9, 2019

This has now been fixed, thanks to @mo-fu for pointing it out!
