Refactor: use sparse matrices instead of ListSuggestionResult and VectorSuggestionResult #678

osma · 2023-03-03T14:42:55Z

Currently the annif.suggestion module defines two main classes for representing suggestion results, ListSuggestionResult and VectorSuggestionResult. Backends generally return one or the other (rarely both).

Now that we have started implementing batched suggestions in backends, these representations have become awkward: each one is specific to a single document, and they are perhaps also needlessly complicated (suggestions often have to be converted back and forth between the two).

I think it would make sense to try to replace both of these classes with a single class, maybe called SuggestionBatch, that can represent the suggestion results for a whole batch of documents (up to 32). It could be implemented using a SciPy sparse matrix, or perhaps a sparse array, since SciPy now recommends the array interface for new code.
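To make the idea concrete, here is a minimal sketch of what such a class could look like, with one row per document and one column per subject. Everything in it is an assumption for illustration: the method names from_sequence and top_suggestions, the (subject_id, score) input format and the float32 dtype are hypothetical and not taken from the Annif code base.

```python
import numpy as np
from scipy.sparse import csr_array


class SuggestionBatch:
    """Hypothetical container: one row per document, one column per subject."""

    def __init__(self, array):
        # Store the scores in CSR format, which gives cheap access to rows
        self.array = csr_array(array)

    @classmethod
    def from_sequence(cls, suggestions, n_subjects):
        """Build a batch from per-document lists of (subject_id, score) pairs."""
        rows, cols, data = [], [], []
        for doc_idx, doc_suggestions in enumerate(suggestions):
            for subject_id, score in doc_suggestions:
                rows.append(doc_idx)
                cols.append(subject_id)
                data.append(score)
        shape = (len(suggestions), n_subjects)
        return cls(csr_array((data, (rows, cols)), shape=shape, dtype=np.float32))

    def __len__(self):
        # Number of documents in the batch
        return self.array.shape[0]

    def top_suggestions(self, doc_idx, limit):
        """Return up to `limit` (subject_id, score) pairs for one document,
        sorted by descending score."""
        # Read the stored entries of one row directly from the CSR buffers
        start = self.array.indptr[doc_idx]
        end = self.array.indptr[doc_idx + 1]
        subject_ids = self.array.indices[start:end]
        scores = self.array.data[start:end]
        order = np.argsort(-scores, kind="stable")[:limit]
        return [(int(subject_ids[i]), float(scores[i])) for i in order]


# Two documents, a vocabulary of five subjects
batch = SuggestionBatch.from_sequence(
    [[(0, 0.5), (3, 0.75)], [(2, 0.25)]], n_subjects=5
)
print(len(batch), batch.top_suggestions(0, limit=1))
```

Only the nonzero scores are stored, so the memory cost depends on the number of suggestions rather than on the vocabulary size.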

This is a rather intrusive refactoring:

- the new SuggestionBatch class must be implemented (replacing the old classes)
- all backends have to be modified to return suggestions in the new format
- annif.eval (in particular EvaluationBatch) must be modified to process suggestions in the new format
- AnnifProject.suggest (and related methods) are affected
- annif.cli.run_suggest and run_index functions are affected
- annif.rest.suggest is affected too (and the batched variant if implemented - see Add REST API method batch-suggest #664)
- many unit tests must be modified to match the new data structures

As for how to tackle this: I think it makes sense to start from the "outside", that is, to first change the return type of AnnifProject.suggest to the new representation, since it already returns a batch (currently a list of {List,Vector}SuggestionResult objects). The eval, CLI and REST code that relies on AnnifProject.suggest must of course be changed at the same time. Once that works, the last step is to change the backends to return the new representation.
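During that first step the backends would still return per-document results, so something has to pack them into one structure. The following is a rough sketch of such an adapter, under stated assumptions: plain lists of (subject_id, score) pairs stand in for ListSuggestionResult, NumPy score vectors stand in for VectorSuggestionResult, and the function name stack_results is hypothetical.

```python
import numpy as np
from scipy.sparse import csr_array


def stack_results(results, n_subjects):
    """Pack mixed per-document results into one sparse array (rows = documents).

    Each result is either a dense score vector (np.ndarray of length
    n_subjects) or a list of (subject_id, score) pairs. Both input types are
    stand-ins chosen for this sketch, not the real Annif classes.
    """
    rows = []
    for result in results:
        if isinstance(result, np.ndarray):
            # Vector-style result: already one score per subject
            vector = result.astype(np.float32)
        else:
            # List-style result: scatter the scores into a zero vector
            vector = np.zeros(n_subjects, dtype=np.float32)
            for subject_id, score in result:
                vector[subject_id] = score
        rows.append(vector)
    # Goes through a dense intermediate, which is fine for a small batch
    return csr_array(np.vstack(rows))


results = [np.array([0.0, 0.5, 0.0, 0.25]), [(3, 0.75)]]
batch = stack_results(results, n_subjects=4)
print(batch.shape, batch.nnz)
```

An adapter like this would only be scaffolding: once the backends return the new representation directly, it could be deleted.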

osma added the enhancement label Mar 3, 2023

osma mentioned this issue Mar 3, 2023

Support batch suggest in STWFSA backend #666

Open

osma mentioned this issue Mar 14, 2023

Refactor: Represent suggestion results as sparse arrays #681

Merged

osma closed this as completed in #681 Apr 14, 2023

juhoinkinen added this to the 0.61 milestone Apr 18, 2023
