Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor: use sparse matrices instead of ListSuggestionResult and VectorSuggestionResult #678

Closed
osma opened this issue Mar 3, 2023 · 0 comments · Fixed by #681
Closed
Milestone

Comments

@osma
Copy link
Member

osma commented Mar 3, 2023

Currently the annif.suggestion module defines two main classes for representing suggestion results, ListSuggestionResult and VectorSuggestionResult. Backends generally use either of these (usually not both) when returning results.

But now that we have started implementing batched suggestions in backends, these representations have become a bit awkward, as they are specific to one document and perhaps also needlessly complicated (often suggestions have to be converted back and forth between the two representations).

I think it would make sense to try to replace both of these classes with a single class, maybe called SuggestionBatch, that can represent the suggestion results for a whole batch of documents (up to 32). It could be implemented using a SciPy sparse matrix or perhaps a sparse array, as arrays are nowadays recommended by SciPy.

This is a rather intrusive refactoring:

  • the new SuggestionBatch class must be implemented (replacing the old classes)
  • all backends have to be modified to return suggestions in the new format
  • annif.eval (in particular EvaluationBatch) must be modified to process suggestions in the new format
  • AnnifProject.suggest (and related methods) are affected
  • annif.cli.run_suggest and run_index functions are affected
  • annif.rest.suggest is affected too (and the batched variant if implemented - see Add REST API method batch-suggest #664)
  • many unit tests must be modified to match the new data structures

Regarding how to tackle this: I think it would make sense to implement this first on the "outside", that is, changing the return type of AnnifProject.suggest to the new representation, since it already returns a batch, though currently it's a list of {List,Vector}SuggestionResult objects. The eval, CLI and REST code that relies on AnnifProject.suggest must of course be changed too. After this is working, the last step is to change the backends to return the new representation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants