When evaluating a TFIDF project trained on multiple files (`CombinedCorpus`), `eval` crashes:
(Annif) jmminkin@lx8-9811-008:/home/local/jmminkin/git/Annif$ annif train tfidf-fi yso-cicero-finna-fi-head-500-lines.tsv yso-cicero-finna-fi-tail-500-lines.tsv
creating vectorizer
warning: Unknown subject URI <http://www.yso.fi/onto/yso/p14645>
...
Backend tfidf: creating similarity index
(Annif) jmminkin@lx8-9811-008:/home/local/jmminkin/git/Annif$ annif eval tfidf-fi ~/annif-projects/Annif-corpora/fulltext/kirjastonhoitaja/test/
warning: Unknown subject URI <http://www.yso.fi/onto/yso/p1997>
...
Traceback (most recent call last):
File "/home/jmminkin/.local/share/virtualenvs/Annif-b5vsMxU8/bin/annif", line 11, in <module>
load_entry_point('annif', 'console_scripts', 'annif')()
File "/home/jmminkin/.local/share/virtualenvs/Annif-b5vsMxU8/lib/python3.6/site-packages/click/core.py", line 764, in __call__
return self.main(*args, **kwargs)
File "/home/jmminkin/.local/share/virtualenvs/Annif-b5vsMxU8/lib/python3.6/site-packages/flask/cli.py", line 586, in main
return super(FlaskGroup, self).main(*args, **kwargs)
File "/home/jmminkin/.local/share/virtualenvs/Annif-b5vsMxU8/lib/python3.6/site-packages/click/core.py", line 717, in main
rv = self.invoke(ctx)
File "/home/jmminkin/.local/share/virtualenvs/Annif-b5vsMxU8/lib/python3.6/site-packages/click/core.py", line 1137, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/home/jmminkin/.local/share/virtualenvs/Annif-b5vsMxU8/lib/python3.6/site-packages/click/core.py", line 956, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/home/jmminkin/.local/share/virtualenvs/Annif-b5vsMxU8/lib/python3.6/site-packages/click/core.py", line 555, in invoke
return callback(*args, **kwargs)
File "/home/jmminkin/.local/share/virtualenvs/Annif-b5vsMxU8/lib/python3.6/site-packages/click/decorators.py", line 17, in new_func
return f(get_current_context(), *args, **kwargs)
File "/home/jmminkin/.local/share/virtualenvs/Annif-b5vsMxU8/lib/python3.6/site-packages/flask/cli.py", line 426, in decorator
return __ctx.invoke(f, *args, **kwargs)
File "/home/jmminkin/.local/share/virtualenvs/Annif-b5vsMxU8/lib/python3.6/site-packages/click/core.py", line 555, in invoke
return callback(*args, **kwargs)
File "/home/local/jmminkin/git/Annif/annif/cli.py", line 276, in run_eval
for metric, score in eval_batch.results().items():
File "/home/local/jmminkin/git/Annif/annif/eval.py", line 143, in results
y_true, y_pred, metrics)
File "/home/local/jmminkin/git/Annif/annif/eval.py", line 93, in _evaluate_samples
y_true, y_pred_binary, average='samples')
File "/home/jmminkin/.local/share/virtualenvs/Annif-b5vsMxU8/lib/python3.6/site-packages/sklearn/metrics/classification.py", line 1569, in precision_score
sample_weight=sample_weight)
File "/home/jmminkin/.local/share/virtualenvs/Annif-b5vsMxU8/lib/python3.6/site-packages/sklearn/metrics/classification.py", line 1415, in precision_recall_fscore_support
pos_label)
File "/home/jmminkin/.local/share/virtualenvs/Annif-b5vsMxU8/lib/python3.6/site-packages/sklearn/metrics/classification.py", line 1240, in _check_set_wise_labels
present_labels = unique_labels(y_true, y_pred)
File "/home/jmminkin/.local/share/virtualenvs/Annif-b5vsMxU8/lib/python3.6/site-packages/sklearn/utils/multiclass.py", line 88, in unique_labels
raise ValueError("Multi-label binary indicator input with "
ValueError: Multi-label binary indicator input with different numbers of labels
Also suggest does not seem to work with such a project (although this could be unrelated):
Confirmed. As it happens, I just trained a tfidf project using the yso-cicero-finna-fi-* training data (all four of them - it took a while!) and I get the same error running eval and no results when using suggest.
I'd be a bit surprised if `CombinedCorpus` turns out to be the problem here, because the backend should not be able to tell that the corpus is a combination of several files.
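To illustrate why the backend should not be able to tell the difference, here is a minimal sketch assuming a corpus is consumed only through a `documents` iterator. `Document`, `ListCorpus`, `CombinedCorpusSketch` and `count_subject_occurrences` are hypothetical names for illustration, not Annif's actual classes: the combined corpus simply chains the document streams of its parts, so a consumer gets the same results as from one merged file.

```python
from dataclasses import dataclass
from itertools import chain
from typing import Iterable, Iterator, List


@dataclass(frozen=True)
class Document:
    text: str
    uris: frozenset  # subject URIs attached to the document


class ListCorpus:
    """Hypothetical single-file corpus: a list of documents."""

    def __init__(self, docs: List[Document]):
        self._docs = docs

    @property
    def documents(self) -> Iterator[Document]:
        return iter(self._docs)


class CombinedCorpusSketch:
    """Hypothetical combination of several corpora (NOT Annif's CombinedCorpus).
    Consumers only ever see one flat stream of documents."""

    def __init__(self, corpora: Iterable[ListCorpus]):
        self._corpora = list(corpora)

    @property
    def documents(self) -> Iterator[Document]:
        return chain.from_iterable(c.documents for c in self._corpora)


def count_subject_occurrences(corpus) -> dict:
    """A backend-like consumer that only iterates over corpus.documents."""
    counts = {}
    for doc in corpus.documents:
        for uri in doc.uris:
            counts[uri] = counts.get(uri, 0) + 1
    return counts


head = ListCorpus([Document("a", frozenset({"p1"})), Document("b", frozenset({"p2"}))])
tail = ListCorpus([Document("c", frozenset({"p1", "p3"}))])
merged = ListCorpus(list(head.documents) + list(tail.documents))
combined = CombinedCorpusSketch([head, tail])

print(count_subject_occurrences(combined) == count_subject_occurrences(merged))  # True
```

Under this assumption, any size multiplication would have to come from something other than the document stream itself.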
If a fasttext project is trained on multiple files in the same way, `eval` works and `suggest` produces results.
The `eval` crash might be caused by the TFIDF backend saving the `_index` created during training (whose size is multiplied by the number of input files) and then loading and using that same `_index` for predictions, which the fasttext backend does not do. The `eval` command, however, simply uses the [project's vocabulary](https://github.com/NatLibFi/Annif/blob/master/annif/cli.py#L266), which does not know about the multiplied size of the `_index`, leading to the size mismatch reported in the traceback.
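A minimal stdlib-only sketch of the suspected mechanism, assuming (as the hypothesis above suggests) that training on two files yields an index with two rows per subject. Gold subjects are encoded against the vocabulary and predictions against the index, so the two indicator vectors end up with different widths. `build_index_rows`, `to_indicator` and `check_same_width` are hypothetical helpers; the last one only mimics the column-count check behind the `ValueError` in the traceback and is not sklearn's code.

```python
# Hypothetical helpers modelling the suspected mismatch; not Annif or sklearn APIs.


def build_index_rows(vocab_uris, n_files):
    """Assumption: every input file adds its own copy of each subject row to the index."""
    return [uri for _ in range(n_files) for uri in vocab_uris]


def to_indicator(positions, width):
    """Binary indicator vector of length `width` with 1 at each given position."""
    return [1 if i in positions else 0 for i in range(width)]


def check_same_width(y_true, y_pred):
    """Simplified stand-in for the column-count check that raises in the traceback.
    Assumes both matrices are non-empty and rectangular."""
    n_true, n_pred = len(y_true[0]), len(y_pred[0])
    if n_true != n_pred:
        raise ValueError("Multi-label binary indicator input with different numbers of labels")
    return n_true


vocab = ["p1", "p2", "p3"]                       # subject positions known to eval
index_rows = build_index_rows(vocab, n_files=2)  # rows in the trained index: 6

# Gold subjects p1 and p3, encoded against the vocabulary (width 3).
y_true = [to_indicator({0, 2}, len(vocab))]
# Predictions encoded against the index (width 6); rows 0 and 5 hold p1 and p3.
y_pred = [to_indicator({0, 5}, len(index_rows))]

try:
    check_same_width(y_true, y_pred)
except ValueError as err:
    print(err)  # prints the same message as the traceback
```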