While testing out Annif, I found some issues in the documentation. Since there is no way to fork the wiki repository, I will file them as an issue here:
Simple Subject Format not Specified Correctly
https://github.com/NatLibFi/Annif/wiki/Document-corpus-formats#simple-subject-file-format
Having only subject labels in the `*.key` files did not work for the tfidf or fasttext backends. Both failed with errors that did not point to the actual cause. What worked instead was using the subject IDs from the thesaurus. This, in turn, did not work for maui. A possible workaround I have yet to explore is keeping `*.key` files for maui and `*.tsv` label files for Annif.
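The workaround mentioned above could be scripted roughly as follows. This is a minimal sketch, assuming a vocabulary file with one `<URI>`, tab, label entry per line, and assuming Annif's `*.tsv` subject files use that same layout. The function names and example URIs are hypothetical, and labels are matched exactly (no case folding, no alternative labels).

```python
from pathlib import Path
from typing import Dict, List


def load_label_index(vocab_tsv: Path) -> Dict[str, str]:
    """Map label -> URI from lines of the (assumed) form '<URI>\tlabel'."""
    index = {}
    for line in vocab_tsv.read_text(encoding="utf-8").splitlines():
        if "\t" not in line:
            continue  # skip blank or malformed lines
        uri, label = line.split("\t", 1)
        index[label.strip()] = uri.strip().strip("<>")
    return index


def key_to_tsv(key_file: Path, index: Dict[str, str]) -> List[str]:
    """Write a .tsv next to a label-only .key file; return unmatched labels."""
    lines, missing = [], []
    for label in key_file.read_text(encoding="utf-8").splitlines():
        label = label.strip()
        if not label:
            continue
        uri = index.get(label)
        if uri is None:
            missing.append(label)  # label not found in the vocabulary
        else:
            lines.append(f"<{uri}>\t{label}")
    tsv_text = "".join(line + "\n" for line in lines)
    key_file.with_suffix(".tsv").write_text(tsv_text, encoding="utf-8")
    return missing


def convert_corpus(corpus_dir: Path, index: Dict[str, str]) -> Dict[str, List[str]]:
    """Convert every *.key file in a directory; report unmatched labels per file."""
    report = {}
    for key_file in sorted(corpus_dir.glob("*.key")):
        missing = key_to_tsv(key_file, index)
        if missing:
            report[key_file.name] = missing
    return report
```

The original `*.key` files are left untouched, so maui can keep reading them while Annif is pointed at the generated `*.tsv` files.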
tfidf Output
```
creating vectorizer
Traceback (most recent call last):
  File "/home/a1-admin/.local/share/virtualenvs/Annif-za4j8g77/bin/annif", line 11, in <module>
    load_entry_point('annif', 'console_scripts', 'annif')()
  File "/home/a1-admin/.local/share/virtualenvs/Annif-za4j8g77/lib/python3.6/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/home/a1-admin/.local/share/virtualenvs/Annif-za4j8g77/lib/python3.6/site-packages/flask/cli.py", line 569, in main
    return super(FlaskGroup, self).main(*args, **kwargs)
  File "/home/a1-admin/.local/share/virtualenvs/Annif-za4j8g77/lib/python3.6/site-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/home/a1-admin/.local/share/virtualenvs/Annif-za4j8g77/lib/python3.6/site-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/a1-admin/.local/share/virtualenvs/Annif-za4j8g77/lib/python3.6/site-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/a1-admin/.local/share/virtualenvs/Annif-za4j8g77/lib/python3.6/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/home/a1-admin/.local/share/virtualenvs/Annif-za4j8g77/lib/python3.6/site-packages/click/decorators.py", line 17, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/home/a1-admin/.local/share/virtualenvs/Annif-za4j8g77/lib/python3.6/site-packages/flask/cli.py", line 419, in decorator
    return __ctx.invoke(f, *args, **kwargs)
  File "/home/a1-admin/.local/share/virtualenvs/Annif-za4j8g77/lib/python3.6/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/home/a1-admin/Annif/annif/cli.py", line 154, in run_train
    proj.train(documents)
  File "/home/a1-admin/Annif/annif/project.py", line 197, in train
    self._create_vectorizer(corpus)
  File "/home/a1-admin/Annif/annif/project.py", line 186, in _create_vectorizer
    self._vectorizer.fit((subj.text for subj in subjectcorpus.subjects))
  File "/home/a1-admin/.local/share/virtualenvs/Annif-za4j8g77/lib/python3.6/site-packages/sklearn/feature_extraction/text.py", line 1631, in fit
    X = super().fit_transform(raw_documents)
  File "/home/a1-admin/.local/share/virtualenvs/Annif-za4j8g77/lib/python3.6/site-packages/sklearn/feature_extraction/text.py", line 1058, in fit_transform
    self.fixed_vocabulary_)
  File "/home/a1-admin/.local/share/virtualenvs/Annif-za4j8g77/lib/python3.6/site-packages/sklearn/feature_extraction/text.py", line 989, in _count_vocab
    raise ValueError("empty vocabulary; perhaps the documents only"
ValueError: empty vocabulary; perhaps the documents only contain stop words
```
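The traceback shows the vectorizer being fitted on one text per subject (`subj.text for subj in subjectcorpus.subjects`). The simplified model below, which is not Annif's actual code, illustrates one plausible way label-only key files end in an empty vocabulary: if document subjects are looked up by URI, labels match nothing, every per-subject text stays empty, and there are no words left to learn. The function names and example URIs are hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple


def build_subject_texts(docs: Iterable[Tuple[str, Set[str]]],
                        subject_uris: Iterable[str]) -> Dict[str, str]:
    """Concatenate the text of every document assigned to each subject.

    docs yields (text, subject identifiers) pairs; identifiers that are
    not known subject URIs are silently dropped in this model.
    """
    texts = {uri: [] for uri in subject_uris}
    for text, subjects in docs:
        for subj in subjects:
            if subj in texts:
                texts[subj].append(text)
    return {uri: " ".join(parts) for uri, parts in texts.items()}


def vocabulary(subject_texts: Dict[str, str]) -> Set[str]:
    """Collect lowercased whitespace-separated words from all subject texts."""
    words = set()
    for text in subject_texts.values():
        words.update(text.lower().split())
    return words
```

With labels as identifiers the vocabulary comes out empty; with URIs it does not, which mirrors what worked and what failed in practice.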
fasttext Output (Truncated)
There were many warnings such as `warning: Backend fasttext: no labels for document [...]`, which at least gave me a hint, although I still needed to figure out what was actually going on.
```
Backend fasttext: creating fastText model
Read 0M words
Number of words: 0
Number of labels: 0
Traceback (most recent call last):
  File "/home/a1-admin/.local/share/virtualenvs/Annif-za4j8g77/bin/annif", line 11, in <module>
    load_entry_point('annif', 'console_scripts', 'annif')()
  File "/home/a1-admin/.local/share/virtualenvs/Annif-za4j8g77/lib/python3.6/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/home/a1-admin/.local/share/virtualenvs/Annif-za4j8g77/lib/python3.6/site-packages/flask/cli.py", line 569, in main
    return super(FlaskGroup, self).main(*args, **kwargs)
  File "/home/a1-admin/.local/share/virtualenvs/Annif-za4j8g77/lib/python3.6/site-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/home/a1-admin/.local/share/virtualenvs/Annif-za4j8g77/lib/python3.6/site-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/a1-admin/.local/share/virtualenvs/Annif-za4j8g77/lib/python3.6/site-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/a1-admin/.local/share/virtualenvs/Annif-za4j8g77/lib/python3.6/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/home/a1-admin/.local/share/virtualenvs/Annif-za4j8g77/lib/python3.6/site-packages/click/decorators.py", line 17, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/home/a1-admin/.local/share/virtualenvs/Annif-za4j8g77/lib/python3.6/site-packages/flask/cli.py", line 419, in decorator
    return __ctx.invoke(f, *args, **kwargs)
  File "/home/a1-admin/.local/share/virtualenvs/Annif-za4j8g77/lib/python3.6/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/home/a1-admin/Annif/annif/cli.py", line 154, in run_train
    proj.train(documents)
  File "/home/a1-admin/Annif/annif/project.py", line 198, in train
    self.backend.train(corpus, project=self)
  File "/home/a1-admin/Annif/annif/backend/fasttext.py", line 108, in train
    self._create_model()
  File "/home/a1-admin/Annif/annif/backend/fasttext.py", line 103, in _create_model
    self._model = fastText.train_supervised(trainpath, **params)
  File "/home/a1-admin/.local/share/virtualenvs/Annif-za4j8g77/lib/python3.6/site-packages/fastText/FastText.py", line 343, in train_supervised
    fasttext.train(ft.f, a)
ValueError: Empty vocabulary. Try a smaller -minCount value.
```
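The log above reports `Number of labels: 0`. A rough sketch of how that can happen: fastText's supervised mode expects each training line to start with one or more `__label__` prefixes, and a document whose labels cannot be mapped to subject IDs contributes no line at all. The mapping, the function name, and the use of integer IDs are assumptions for illustration, not Annif's fasttext backend code.

```python
from typing import Dict, Iterable, List, Set, Tuple


def fasttext_lines(docs: Iterable[Tuple[str, Set[str]]],
                   label_to_id: Dict[str, int]) -> Tuple[List[str], List[int]]:
    """Build supervised training lines; also return indices of unlabeled docs.

    label_to_id is a hypothetical mapping from subject identifier to an
    integer ID. Documents with no mappable subject are skipped, which is
    the situation the "no labels for document" warning describes.
    """
    lines, unlabeled = [], []
    for i, (text, labels) in enumerate(docs):
        ids = sorted(label_to_id[lbl] for lbl in labels if lbl in label_to_id)
        if not ids:
            unlabeled.append(i)
            continue
        prefix = " ".join(f"__label__{n}" for n in ids)
        lines.append(f"{prefix} {' '.join(text.split())}")  # normalize whitespace
    return lines, unlabeled
```

If every document ends up in `unlabeled`, the training file is empty and fastText has neither words nor labels to learn from.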
@mo-fu The problem with the Simple Subject Format is not that it is specified incorrectly, but that an implementation bug prevents it from working. I've opened a separate issue #309 to track that bug.
Closing this issue as the other points appear to be resolved now.
Name of Docker Container for Maui Backend
https://github.com/NatLibFi/Annif/wiki/Backend%3A-Maui#usage-with-docker
I always had to use the fully qualified name `quay.io/natlibfi/mauiservice` in the docker commands. Maybe this can be fixed by giving the container a tag?
Incorrect Configuration Example for pav Backend
https://github.com/NatLibFi/Annif/wiki/Backend%3A-PAV#example-configuration
It should be `backend=pav` instead of `backends=ensemble`. Note that the key is `backend`, without the trailing s.