Issues in Documentation #303

Closed
mo-fu opened this issue Jul 23, 2019 · 2 comments

@mo-fu
Contributor

mo-fu commented Jul 23, 2019

While testing out Annif, I found some issues in the documentation. Since there is no way to fork the wiki repository, I will file them as an issue here:

Simple Subject Format not Specified Correctly

https://github.com/NatLibFi/Annif/wiki/Document-corpus-formats#simple-subject-file-format
Having only subject labels in the *.key files did not work for the tfidf or fasttext backends; both produced errors that did not point to the actual cause. What worked instead was using the subject IDs from the thesaurus, but that in turn did not work for maui. A possible workaround I have yet to explore is having *.key files for maui and *.tsv label files for Annif.
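A minimal sketch of that workaround, converting a labels-only *.key file into a *.tsv subject file next to it. It assumes the vocabulary and the *.tsv subject files both use one subject per line as `<URI>` TAB `label`, as described on the Document corpus formats wiki page; the helper names are hypothetical, and the sketch does not check which file Annif picks up when both a .key and a .tsv exist.

```python
from pathlib import Path
import tempfile


def load_label_index(vocab_tsv: Path) -> dict:
    """Map each subject label to its URI.

    Assumes one subject per line: '<URI>' TAB 'label'.
    """
    index = {}
    for line in vocab_tsv.read_text(encoding="utf-8").splitlines():
        parts = line.split("\t")
        if len(parts) >= 2:
            index[parts[1].strip()] = parts[0].strip()
    return index


def key_to_tsv(key_path: Path, label_index: dict) -> list:
    """Write a .tsv next to a labels-only .key file; return unresolved labels."""
    rows, unmatched = [], []
    for label in key_path.read_text(encoding="utf-8").splitlines():
        label = label.strip()
        if not label:
            continue  # skip blank lines
        uri = label_index.get(label)
        if uri is None:
            unmatched.append(label)  # not in the vocabulary: report it, don't guess
        else:
            rows.append(f"{uri}\t{label}\n")
    key_path.with_suffix(".tsv").write_text("".join(rows), encoding="utf-8")
    return unmatched


if __name__ == "__main__":
    # Self-contained demo with made-up URIs and labels.
    with tempfile.TemporaryDirectory() as d:
        vocab = Path(d) / "vocab.tsv"
        vocab.write_text("<http://example.org/s1>\tcats\n", encoding="utf-8")
        key = Path(d) / "doc1.key"
        key.write_text("cats\ndogs\n", encoding="utf-8")
        print(key_to_tsv(key, load_label_index(vocab)))  # ['dogs']
        print((Path(d) / "doc1.tsv").read_text(encoding="utf-8"), end="")
```

Returning the unresolved labels makes it visible when a label in a .key file does not match any vocabulary label exactly, which is easy to miss otherwise.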

tfidf Output

creating vectorizer
Traceback (most recent call last):
  File "/home/a1-admin/.local/share/virtualenvs/Annif-za4j8g77/bin/annif", line 11, in <module>
    load_entry_point('annif', 'console_scripts', 'annif')()
  File "/home/a1-admin/.local/share/virtualenvs/Annif-za4j8g77/lib/python3.6/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/home/a1-admin/.local/share/virtualenvs/Annif-za4j8g77/lib/python3.6/site-packages/flask/cli.py", line 569, in main
    return super(FlaskGroup, self).main(*args, **kwargs)
  File "/home/a1-admin/.local/share/virtualenvs/Annif-za4j8g77/lib/python3.6/site-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/home/a1-admin/.local/share/virtualenvs/Annif-za4j8g77/lib/python3.6/site-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/a1-admin/.local/share/virtualenvs/Annif-za4j8g77/lib/python3.6/site-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/a1-admin/.local/share/virtualenvs/Annif-za4j8g77/lib/python3.6/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/home/a1-admin/.local/share/virtualenvs/Annif-za4j8g77/lib/python3.6/site-packages/click/decorators.py", line 17, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/home/a1-admin/.local/share/virtualenvs/Annif-za4j8g77/lib/python3.6/site-packages/flask/cli.py", line 419, in decorator
    return __ctx.invoke(f, *args, **kwargs)
  File "/home/a1-admin/.local/share/virtualenvs/Annif-za4j8g77/lib/python3.6/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/home/a1-admin/Annif/annif/cli.py", line 154, in run_train
    proj.train(documents)
  File "/home/a1-admin/Annif/annif/project.py", line 197, in train
    self._create_vectorizer(corpus)
  File "/home/a1-admin/Annif/annif/project.py", line 186, in _create_vectorizer
    self._vectorizer.fit((subj.text for subj in subjectcorpus.subjects))
  File "/home/a1-admin/.local/share/virtualenvs/Annif-za4j8g77/lib/python3.6/site-packages/sklearn/feature_extraction/text.py", line 1631, in fit
    X = super().fit_transform(raw_documents)
  File "/home/a1-admin/.local/share/virtualenvs/Annif-za4j8g77/lib/python3.6/site-packages/sklearn/feature_extraction/text.py", line 1058, in fit_transform
    self.fixed_vocabulary_)
  File "/home/a1-admin/.local/share/virtualenvs/Annif-za4j8g77/lib/python3.6/site-packages/sklearn/feature_extraction/text.py", line 989, in _count_vocab
    raise ValueError("empty vocabulary; perhaps the documents only"
ValueError: empty vocabulary; perhaps the documents only contain stop words
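One plausible reading of this traceback: the vectorizer is fitted on per-subject texts, and if none of the subject labels in the *.key files resolve to vocabulary subjects, every subject ends up with no text, so there are no tokens to build a vocabulary from. The hypothetical helpers below (not Annif code) illustrate that aggregation under this assumption and add a pre-flight check that fails with a more direct message.

```python
from collections import defaultdict


def subject_texts(documents, label_to_uri):
    """Concatenate document texts per subject URI.

    documents: iterable of (text, labels) pairs.
    Labels that do not resolve to a URI are ignored.
    """
    parts = defaultdict(list)
    for text, labels in documents:
        for label in labels:
            uri = label_to_uri.get(label)
            if uri is not None:
                parts[uri].append(text)
    return {uri: " ".join(texts) for uri, texts in parts.items()}


def check_subject_texts(texts):
    """Fail early with a clearer message than 'empty vocabulary'."""
    if not any(t.strip() for t in texts.values()):
        raise ValueError("no subject received any document text; "
                         "check that the corpus subject labels resolve to vocabulary URIs")
    return texts
```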

fasttext Output (Truncated)

There were a lot of warnings like "warning: Backend fasttext: no labels for document [...]", which at least gave me a hint, but I still needed to figure out what was actually going on.

Backend fasttext: creating fastText model
Read 0M words
Number of words:  0
Number of labels: 0
Traceback (most recent call last):
  File "/home/a1-admin/.local/share/virtualenvs/Annif-za4j8g77/bin/annif", line 11, in <module>
    load_entry_point('annif', 'console_scripts', 'annif')()
  File "/home/a1-admin/.local/share/virtualenvs/Annif-za4j8g77/lib/python3.6/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/home/a1-admin/.local/share/virtualenvs/Annif-za4j8g77/lib/python3.6/site-packages/flask/cli.py", line 569, in main
    return super(FlaskGroup, self).main(*args, **kwargs)
  File "/home/a1-admin/.local/share/virtualenvs/Annif-za4j8g77/lib/python3.6/site-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/home/a1-admin/.local/share/virtualenvs/Annif-za4j8g77/lib/python3.6/site-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/a1-admin/.local/share/virtualenvs/Annif-za4j8g77/lib/python3.6/site-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/a1-admin/.local/share/virtualenvs/Annif-za4j8g77/lib/python3.6/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/home/a1-admin/.local/share/virtualenvs/Annif-za4j8g77/lib/python3.6/site-packages/click/decorators.py", line 17, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/home/a1-admin/.local/share/virtualenvs/Annif-za4j8g77/lib/python3.6/site-packages/flask/cli.py", line 419, in decorator
    return __ctx.invoke(f, *args, **kwargs)
  File "/home/a1-admin/.local/share/virtualenvs/Annif-za4j8g77/lib/python3.6/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/home/a1-admin/Annif/annif/cli.py", line 154, in run_train
    proj.train(documents)
  File "/home/a1-admin/Annif/annif/project.py", line 198, in train
    self.backend.train(corpus, project=self)
  File "/home/a1-admin/Annif/annif/backend/fasttext.py", line 108, in train
    self._create_model()
  File "/home/a1-admin/Annif/annif/backend/fasttext.py", line 103, in _create_model
    self._model = fastText.train_supervised(trainpath, **params)
  File "/home/a1-admin/.local/share/virtualenvs/Annif-za4j8g77/lib/python3.6/site-packages/fastText/FastText.py", line 343, in train_supervised
    fasttext.train(ft.f, a)
ValueError: Empty vocabulary. Try a smaller -minCount value.
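For context: fastText's supervised mode reads one training example per line, with each label written as `__label__` followed by the label value and then the text. If every document is dropped for having no resolvable labels, the training file is empty, which would fit the "Number of words: 0" and "Number of labels: 0" lines above. The helper below is a hypothetical sketch of writing such lines, not Annif's actual backend code; it assumes label IDs contain no whitespace.

```python
def fasttext_lines(documents, label_to_id):
    """Build fastText supervised-format lines: '__label__<id> ... text'.

    documents: iterable of (text, labels) pairs.
    Returns (lines, skipped) where skipped counts documents with no usable label.
    """
    lines, skipped = [], 0
    for text, labels in documents:
        ids = [label_to_id[label] for label in labels if label in label_to_id]
        if not ids:
            skipped += 1  # comparable to the "no labels for document" warning
            continue
        label_part = " ".join(f"__label__{i}" for i in ids)
        # Collapse all whitespace so each example stays on a single line.
        lines.append(f"{label_part} {' '.join(text.split())}")
    return lines, skipped
```

A skipped count equal to the number of documents is a strong sign that the subject labels are not being matched at all.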

Name of Docker Container for Maui Backend

https://github.com/NatLibFi/Annif/wiki/Backend%3A-Maui#usage-with-docker
I always had to use the fully qualified name quay.io/natlibfi/mauiservice in the docker commands. Maybe this can be fixed by giving the container a tag?

Incorrect Configuration Example for pav Backend

https://github.com/NatLibFi/Annif/wiki/Backend%3A-PAV#example-configuration
It should be backend=pav instead of backends=ensemble. Note the missing s.
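Why this typo is easy to miss: an INI-style project configuration is just key/value pairs, so a misspelled key such as backends= is not a parse error by itself. The following hypothetical check (not part of Annif) uses Python's standard configparser to flag project sections that have no backend key; the section name in the demo is made up.

```python
import configparser


def check_backend_keys(cfg_text):
    """Return one warning per project section without a 'backend' key."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    warnings = []
    for name in parser.sections():
        section = parser[name]
        if "backend" in section:
            continue
        hint = " (found 'backends'; did you mean 'backend'?)" if "backends" in section else ""
        warnings.append(f"[{name}] has no 'backend' key{hint}")
    return warnings


if __name__ == "__main__":
    print(check_backend_keys("[my-pav-project]\nbackends=ensemble\n"))
```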

@juhoinkinen juhoinkinen self-assigned this Aug 2, 2019
@juhoinkinen
Member

Thanks for reporting! For the future, please note that the Wiki can be edited in place; there is no need to fork.

Name of Docker Container for Maui Backend

https://github.com/NatLibFi/Annif/wiki/Backend%3A-Maui#usage-with-docker
I always had to use the fully qualified name quay.io/natlibfi/mauiservice in the docker commands. Maybe this can be fixed by giving the container a tag?

Corrected the Wiki page to use quay.io/natlibfi/mauiservice.

Incorrect Configuration Example for pav Backend

https://github.com/NatLibFi/Annif/wiki/Backend%3A-PAV#example-configuration
It should be backend=pav instead of backends=ensemble. Note the missing s.

Corrected.

For the simple subject format part, I think @osma can clarify the intended behaviour.

@osma
Member

osma commented Aug 8, 2019

@mo-fu The problem with Simple Subject Format is not that it's specified incorrectly, but there is an implementation bug that prevents it from working. I've opened a separate issue #309 to track that bug.

Closing this issue as the other points appear to be resolved now.

@osma osma closed this as completed Aug 8, 2019