
Broad api calls #20

Merged · merged 8 commits into RECETOX:main · Jul 16, 2021
Conversation

@xtrojak (Collaborator) commented on Jul 13, 2021

Improved API call efficiency: where possible, broader calls are made (instead of obtaining a single result, e.g. InChIKey, we obtain multiple attributes at once). Results are stored in a cache for a single spectrum's data, as an obtained result can be reused by another job. This was implemented in Annotator.execute_job_with_cache.
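
For illustration, a minimal stub of the broad-call shape (get_compound and its payload are invented here, not the project's real API):

```python
import asyncio

# Hypothetical service stub: one broad call returns several attributes at
# once, instead of one value per request.
async def get_compound(name: str) -> dict:
    await asyncio.sleep(0)       # stands in for a single HTTP round trip
    return {"inchikey": "...", "formula": "...", "smiles": "..."}

async def demo():
    attributes = await get_compound("water")
    # both attributes are now available without any further API call
    return attributes["inchikey"], attributes["formula"]

asyncio.run(demo())
```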

Another cache, for async requests, was implemented using an asynchronous version of lru_cache. Closes #8.
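
A minimal sketch of such an async cache, assuming asyncstdlib's lru_cache (the package named in the review below); query_service is a stand-in:

```python
import asyncio
import asyncstdlib

# asyncstdlib.lru_cache memoizes coroutine results, so repeated identical
# requests are answered without re-awaiting the underlying call.
@asyncstdlib.lru_cache(maxsize=128)
async def query_service(url: str) -> str:
    await asyncio.sleep(0)                       # pretend network latency
    return f"response for {url}"

async def main():
    first = await query_service("https://example.org/api?q=water")
    again = await query_service("https://example.org/api?q=water")  # cache hit
    assert first == again

asyncio.run(main())
```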

Additionally, the implementation of individual services was simplified where possible (mostly by separating calls from parsing), though this introduced issue #19.
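
A sketch of the calls/parsing split under an assumed aiohttp-style session; both function names are illustrative:

```python
# Network side and parsing side become separately testable units.
async def query(session, url: str) -> str:
    # the call knows nothing about the response layout
    async with session.get(url) as response:
        return await response.text()

def parse_inchikey(payload: str) -> str:
    # parsing is a pure function, unit-testable without a server
    return payload.strip().splitlines()[0]
```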

@xtrojak requested review from martenson and hechth on Jul 13, 2021, 06:59
libs/utils/HashableDict.py — review thread (outdated, resolved)
@hechth (Member) left a comment

Code looks good, only problem is that asyncstdlib is not on conda!

@hechth self-requested a review on Jul 14, 2021, 06:23
@martenson (Member) left a comment

Seems fine. I've added simple test data input and output files.

The efficiency of the double caching with both execute_job_with_cache and the @lru_cache method decorator is unclear to me. Testing with a significant dataset could clarify this, but is not required at the moment in my opinion.

Beware that the default lru_cache maxsize is unlimited.

@martenson (Member) commented

Oh, and we have a 500MB .msp file in the UMSA library for some serious testing: https://umsa.cerit-sc.cz/library/list#folders/F1c84aa7fc4490e6d/datasets/9a34fd777b6c8572

```
@@ -6,6 +9,7 @@ class Converter:
    def __init__(self, session):
        self.session = session

    @lru_cache
```
Member commented on this diff:

Are you sure self is hashable here? It contains the session.

@xtrojak (Collaborator, Author) replied:

You are right; we have to make sure the default hash, based on id, is sufficient here. We will look into that in #1 (cache testing).
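
For context, a toy example of why the question matters: functools.lru_cache on a method hashes self as part of the cache key, and the default identity-based object.__hash__ applies only while the class defines no __eq__ (the names below are illustrative, not the project's code):

```python
from functools import lru_cache

class Converter:
    def __init__(self, session):
        self.session = session          # never hashed by lru_cache itself

    @lru_cache                           # cache key is (self, value)
    def convert(self, value: str) -> str:
        return value.upper()

c = Converter(session={})                # even an unhashable session is fine
assert c.convert("abc") == c.convert("abc")   # second call is a cache hit
```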

@xtrojak (Collaborator, Author) commented on Jul 16, 2021

> Seems fine. I've added simple test data input and output files.
>
> The efficiency of the double caching with both execute_job_with_cache and the @lru_cache method decorator is unclear to me. Testing with a significant dataset could clarify this, but is not required at the moment in my opinion.

The current design requires the specification of a triple (source, target, service), which defines that the user wants to obtain the target attribute based on the source attribute using a particular service. In this PR, I have changed the queries to obtain as much data as possible from a single API call. In execute_job_with_cache we cache these data for a single spectrum; then, for another triple requesting an already cached attribute, we can simply return it. Note that the request itself would actually differ from the previous one, so lru_cache would not help here.

lru_cache, on the other hand, is used to avoid executing exactly the same request multiple times. This mostly applies when spectra in a single .msp file share an attribute (which seems to be the case).
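
A toy walk-through of how the two layers interact (all names, signatures, and data are made up, not the project's real code):

```python
from functools import lru_cache

@lru_cache                       # layer 2: dedupes identical broad requests
def broad_call(source_value, service):
    # stands in for one API request returning several attributes at once
    return {"inchikey": "XLYOFNOQVPJJNP-UHFFFAOYSA-N", "formula": "H2O"}

def execute_job_with_cache(job, spectrum, cache):
    source, target, service = job
    if target not in cache:              # layer 1: per-spectrum attributes
        cache.update(broad_call(spectrum[source], service))
    return cache[target]

cache = {}
spectrum = {"name": "water"}
execute_job_with_cache(("name", "inchikey", "pubchem"), spectrum, cache)
# a different triple for the same spectrum is served by layer 1;
# broad_call (and hence lru_cache) is not consulted a second time:
execute_job_with_cache(("name", "formula", "pubchem"), spectrum, cache)
assert broad_call.cache_info().misses == 1
```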

> Beware that the default lru_cache maxsize is unlimited.

It seems the default is actually 128 (the default value of the maxsize argument).
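
A quick check against functools (a bare @lru_cache defaults to maxsize=128; maxsize=None is the unbounded variant):

```python
from functools import lru_cache

@lru_cache                      # bare decorator: equivalent to maxsize=128
def square(x):
    return x * x

square(2)
assert square.cache_info().maxsize == 128    # bounded, not unlimited

@lru_cache(maxsize=None)        # the genuinely unbounded variant
def cube(x):
    return x * x * x
```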

@martenson merged commit 59dbb1e into RECETOX:main on Jul 16, 2021
Development

Successfully merging this pull request may close these issues.

A more mature caching approach (#8)