
Don't use group -1 for luts_png #603

Open: wants to merge 20 commits into base: master


@will-moore (Member) commented Jan 7, 2025

Issues that need addressing:

  • We can't rely on cache being enabled for omero-web
  • Dynamic generation of all LUTs takes a while and loads a lot of files from the server

Proposed solution:

  • We use a new file webgateway/static/webgateway/json/luts.json as the source of LUTs for building the dynamic /webgateway/luts_png/. The /webgateway/luts_png/ still corresponds to the dynamic /luts/ JSON that comes from the server: any LUTs from the server that are not in the static luts.json will be shown as white in /webgateway/luts_png/.
  • If you use /webgateway/luts/?rgb=true then the JSON produced will include the RGB values for each LUT. This uses the static luts.json, but if there are new LUTs coming from the server (NOT in the luts.json) then we load those LUT files from the server.
  • The workflow for updating the LUTs in the static luts.json is simply to take the JSON output from /webgateway/luts/?rgb=true and save it into webgateway/static/webgateway/json/luts.json.

NB: The static luts.json in this PR doesn't yet have the "new" LUTs recently added. This enables us to test a few things. Then, we can update the luts.json with recent LUTs before merging this PR...

To test:

  • Go to /webgateway/luts/ - this will be unchanged from before. Compare to /webgateway/luts/?rgb=true which has rgb for every LUT - each is an array of shape (256, 3). The recently added LUTs that are not yet in the static luts.json (e.g. cividis.lut) are being generated on the fly from the server.
  • Go to /webgateway/luts_png/. This is now being generated from the RGB values in the static luts.json, instead of loading LUT files from the server. This is very fast, so caching functionality is no longer needed (it has been removed). You will see white gaps for LUTs that are not yet cached in the static luts.json.
  • Check README for instructions on updating the static luts.json
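A minimal sketch of how a preview strip like /webgateway/luts_png/ can be assembled from cached RGB values, with a white band for any server LUT missing from the static luts.json. It assumes each luts.json entry carries `path`, `name` and a 256×3 `rgb` list, and that each LUT occupies a 10-pixel-high, 256-pixel-wide band (matching the `(i * 10) : ((i + 1) * 10)` slicing in the diff excerpt quoted in the review). `build_rows` and `encode_png` are hypothetical helpers, not omero-web functions; the real view uses array slicing, whereas this sketch writes the PNG with the standard library only.

```python
import struct
import zlib

BAND_HEIGHT = 10  # rows per LUT (assumed, matching the 10-row slicing in the view)
WIDTH = 256       # one column per LUT entry
WHITE = [255, 255, 255]


def build_rows(server_luts, cached_luts):
    """Return pixel rows (lists of [r, g, b]) for every LUT on the server."""
    by_pathname = {lut["path"] + lut["name"]: lut for lut in cached_luts}
    rows = []
    for lut in server_luts:
        cached = by_pathname.get(lut["path"] + lut["name"])
        # LUTs missing from the static JSON become a white placeholder band
        rgb = cached["rgb"] if cached else [WHITE] * WIDTH
        rows.extend(list(rgb) for _ in range(BAND_HEIGHT))
    return rows


def encode_png(rows):
    """Encode 8-bit RGB rows as a PNG with the standard library only."""
    height, width = len(rows), len(rows[0])
    # Each scanline starts with filter byte 0 ("None")
    raw = b"".join(
        b"\x00" + bytes(channel for pixel in row for channel in pixel)
        for row in rows
    )

    def chunk(tag, data):
        crc = zlib.crc32(tag + data) & 0xFFFFFFFF
        return struct.pack(">I", len(data)) + tag + data + struct.pack(">I", crc)

    # width, height, bit depth 8, colour type 2 (RGB), default compression/filter, no interlace
    header = struct.pack(">IIBBBBB", width, height, 8, 2, 0, 0, 0)
    return (b"\x89PNG\r\n\x1a\n" + chunk(b"IHDR", header)
            + chunk(b"IDAT", zlib.compress(raw)) + chunk(b"IEND", b""))
```

Writing `encode_png(build_rows(server_luts, cached_luts))` to a `.png` file gives one band per server LUT, in server order, so the band index still lines up with the LUT list from /webgateway/luts/.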

@will-moore (Member Author)

cc @Tom-TBT

@Tom-TBT (Contributor) commented Jan 7, 2025

Thank you for the fix Will, sorry for that.

@will-moore (Member Author)

@Tom-TBT This is some strange behaviour we are seeing ONLY on our production server (which has a very long history), but not on any other server we've tested on. So, nothing to apologise for!

@jburel (Member) commented Jan 7, 2025

After investigation, it seems that the wrong permissions are set for the user group on the production server. This PR is not needed.

@jburel closed this Jan 7, 2025
@will-moore (Member Author)

We are seeing significant performance improvements with this change if the user's default group has a large number of users, so it seems to be worth including.

@will-moore reopened this Jan 15, 2025
@Tom-TBT (Contributor) commented Jan 15, 2025

Interesting. I'm curious, can you explain why the performance is affected here?
Without setting group to -1, does getObject search first for all users in the group before searching outside of the group?

@will-moore (Member Author)

@Tom-TBT The slow cross-group queries when a user is in a big group come from very low-level OMERO internals. I'm not sure if it's documented anywhere...

Another issue we're finding is that we can't assume that omero-web installations will have caching enabled.

The default behaviour is for caching to be disabled. Default is a dummy cache:

('{"default": {"BACKEND":' ' "django.core.cache.backends.dummy.DummyCache"}}'),

and configuring e.g. redis is described as optional:

https://omero.readthedocs.io/en/stable/sysadmins/unix/install-web/walkthrough/omeroweb-install-rockylinux9-ice3.6.html#running-omero-web

So, what is best to do? Options:

  • Update omeroweb/settings.py to use some valid cache config by default. E.g. https://docs.djangoproject.com/en/5.1/topics/cache/#filesystem-caching, where we'd need to figure out a suitable LOCATION; there are warnings about locations within MEDIA_ROOT, STATIC_ROOT, or STATICFILES_FINDERS.
    Or django.core.cache.backends.locmem.LocMemCache: not ideal for production, but could be fine as a default for occasional use such as LUTs?
  • Update our docs (and code) to try and get all users to enable caching? Seems like a big task just for this usage of caching, but maybe we want to take more advantage of caching if we know it's available?
  • As an alternative to caching luts_png in Django, we could resort to saving it within OMERO, e.g. a shared OriginalFile with cache_key as the name? Not as nice as caching in Django, but certainly better than no caching at all.
  • Any other options? We could ask admins to run some luts_png generation on the server after updating LUTs, but that's not very nice!
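For the first option, the value that goes into omero.web.caches is Django's CACHES mapping serialised as JSON, in the same shape as the DummyCache default quoted above. A sketch with a hypothetical `caches_setting` helper (not part of omero-web); the backend class paths are Django's built-in ones, and the directory used below is only an example location:

```python
import json

# Django's built-in cache backends (class paths under django.core.cache.backends)
BACKENDS = {
    "dummy": "django.core.cache.backends.dummy.DummyCache",
    "locmem": "django.core.cache.backends.locmem.LocMemCache",
    "filebased": "django.core.cache.backends.filebased.FileBasedCache",
}


def caches_setting(backend="locmem", location=None):
    """Return a JSON string in the shape of the omero.web.caches default."""
    if backend == "filebased" and location is None:
        # FileBasedCache needs a directory; keep it outside MEDIA_ROOT/STATIC_ROOT
        raise ValueError("filebased cache needs a LOCATION directory")
    config = {"BACKEND": BACKENDS[backend]}
    if location is not None:
        config["LOCATION"] = location
    return json.dumps({"default": config})


print(caches_setting("filebased", "/var/tmp/omeroweb-cache"))
```

The printed value could be applied with `omero config set omero.web.caches '<value>'` and an OMERO.web restart. Note that LocMemCache is per process, so with several Gunicorn workers each keeps its own copy, whereas FileBasedCache is shared by all workers on one host.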

@will-moore (Member Author)

Tested setting the group to the user group with this addition to the script above:

user_group_id = conn.getAdminService().getSecurityRoles().userGroupId
conn.SERVICE_OPTS.setOmeroGroup(user_group_id)

And logged in to nightshade as test-user, whose default group has many users.

However, this made no difference in performance compared with leaving the group as the user's default group. Only setting group: -1 causes a slow-down.

@knabar added this to the 5.29.0 milestone Jan 17, 2025
@will-moore force-pushed the luts_getOriginalFile_group-1 branch from 576d62d to eb71bb1 January 20, 2025 15:46
@will-moore (Member Author)

cc @jburel @Tom-TBT I've updated the description to correspond to the proposed solution now in this PR, with options TBD...

@will-moore requested review from pwalczysko and jburel and removed request for pwalczysko January 22, 2025 13:34
@pwalczysko (Member) commented Jan 24, 2025

> NB: for testing this PR, use /webgateway/luts_png/?cached=false to disable the cache.

Could you please expand on how to test?

I went on merge-ci as user-3 to url

https://merge-ci.openmicroscopy.org/web/webgateway/luts_png/

and it loaded.

When I went to

https://merge-ci.openmicroscopy.org/web/webgateway/luts_png/?cached=false

then it loaded too.

There was no perceptible speed difference between the two cases ^^^

@will-moore (Member Author)

Discussed at web meeting today...
It would be nice to replace usage of the LUTS_IN_PNG list and the static luts_10.png with a JSON object that combines the LUT names with the 256 RGB values of each LUT.

README.rst Outdated
README.rst Outdated
cached in https://github.com/ome/omero-web/blob/master/omeroweb/webgateway/static/webgateway/json/luts.json.
The LUTs in the `/luts_png/` will always correspond to the LUTs on the server as available in JSON
from `/webgateway/luts/`.
If new LUTs are added to the server and are not found in the `luts.json` then the `/luts_png/` will
Member

Again confusing.
luts.json is a visual representation, i.e. name and associated PNG.

Member Author

No, luts.json is a json file (see this PR).

Member

I know, I should have said "holds a visual representation". But "If new LUTs are added to the server and are not found in the `luts.json`" is not clear.

Member Author

I rewrote the README - hopefully more clear now?

@sbesson (Member) left a comment

My immediate thought while reading this proposal is that it brings us back to the coupling issues originally raised in #568. When new LUTs get added to the server, the following must happen:
1. a release of OMERO.web with an updated luts.json
2. possibly a release of all web apps of the ecosystem to depend on the newest version of OMERO.web

Has there been any consideration of a hybrid solution where the cached JSON would be retrieved and only the missing LUTs would be dynamically fetched from the server when retrieving the PNG representation?
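A sketch of the hybrid idea, under the assumption that server LUTs and cached entries can be matched on path + name (IDs differ between servers): cached RGB values are reused, and only the missing LUTs are read from the server. `merge_luts` and the `load_lut_rgb` callback are hypothetical stand-ins for the omero-web code that reads a LUT file via the OMERO API.

```python
from typing import Callable, List


def merge_luts(server_luts: List[dict], cached_luts: List[dict],
               load_lut_rgb: Callable[[dict], list]) -> List[dict]:
    """Attach RGB values to every server LUT, loading only uncached ones."""
    cached = {lut["path"] + lut["name"]: lut for lut in cached_luts}
    merged = []
    for lut in server_luts:
        entry = dict(lut)  # keep server-side fields (id, path, name, ...)
        hit = cached.get(lut["path"] + lut["name"])
        if hit is not None and "rgb" in hit:
            entry["rgb"] = hit["rgb"]  # cheap: no server round-trip
        else:
            entry["rgb"] = load_lut_rgb(lut)  # only for LUTs not yet cached
        merged.append(entry)
    return merged
```

The cost then scales with the number of uncached LUTs rather than with all LUTs on the server, which is the distinction drawn later in this thread between fetching all 47 LUTs and fetching one or two.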

README.rst Outdated

The OMERO server ships with a set of look-up tables (LUTs) for rendering images. Users can also
add their own LUTs to the server. The LUTs available on the server can be retrieved from the
`/webgateway/luts/` endpoint as JSON data.
Member

Although it certainly does not hurt to have this documentation here, it feels a bit at odds with the rest of the README.

Should https://omero.readthedocs.io/en/stable/developers/Web/WebGateway.html be extended to cover the LUT endpoints of the OMERO.web gateway?

@jburel (Member) commented Feb 5, 2025

Upgrades of LUTs do not happen very often (this is the first time).
The dynamic approach was explored by @will-moore, but given the rarity of upgrades I do not see the synchronized release as a major burden.
The situation has improved since there is now no hard-coded list of items in web apps.
The JSON file also ensures that the visual representation matches the correct LUT.

@sbesson (Member) left a comment

I appreciate the LUT upgrade is not frequent and there is only so much time we want to invest in the dynamic aspect, especially if the next OMERO.web release will also include the upgraded JSON with the LUTs.

I was primarily asking as the dynamic loading is implemented in /webgateway/luts/?rgb=true, and this endpoint could possibly be used rather than loading the cache in /webgateway/luts_png.

Performance-wise, comparing the timings of https://merge-ci.openmicroscopy.org/web/webgateway/luts/?rgb=true to https://merge-ci.openmicroscopy.org/web/static/webgateway/json/luts.json, the former call takes a few hundred additional milliseconds (~574 ms vs. 257 ms). So there is an open question of which metric is acceptable.

At the API level, the new rgb=true parameter in the /webgateway/luts/ endpoint is backwards compatible. However, as described in the PR, the behavior of /webgateway/luts_png/ is completely modified. At minimum, the docstring should be updated to reflect the new expectation. This also raises the question of whether these changes are significant enough to be considered backwards incompatible.
Testing-wise, the previous implementation suggests updates to the LUT name/path would be reflected in these endpoints. Is this also true in the new implementation, and should this be tested?

)
# rgb = load_lut_to_rgb(conn, lut.id.val)
lut_data = {
    "id": lut.id.val,
Member

The LUT ID might vary from server to server. This is not a problem per se especially as you are using path+name for normalizing but this means there is a mismatch between the cached JSON file and this endpoint - which can be seen for instance by comparing https://merge-ci.openmicroscopy.org/web/webgateway/luts/ with https://merge-ci.openmicroscopy.org/web/static/webgateway/json/luts.json

Should id not be omitted from the cache?

Member Author

Yes, IDs will change, but they are harmless. If we want to allow easy copying of /webgateway/luts/?rgb=true into the static luts.json but ALSO exclude IDs, then maybe we need another parameter, ?ids=false. Easy to do if it's worth it?
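Without an `?ids=false` parameter, the same result could be obtained client-side before saving the file. A sketch with a hypothetical `strip_ids` helper, assuming the endpoint returns an object with a top-level `luts` list (an assumption; adjust to the actual response shape):

```python
import json


def strip_ids(luts_json: str) -> str:
    """Drop server-specific "id" fields, keeping path/name/rgb and the rest."""
    data = json.loads(luts_json)
    for lut in data.get("luts", []):
        lut.pop("id", None)  # IDs differ between servers; path + name do not
    # Sorted keys and indentation keep diffs of the static file readable
    return json.dumps(data, indent=2, sort_keys=True)
```

Sorting keys is a small design choice: regenerating luts.json after a LUT upgrade then produces a diff that only shows the genuinely new entries.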

README.rst Outdated
LUTs caching
------------

The OMERO server ships with a set of look-up tables (LUTs) for rendering images. Users can also
Member

Admins can add LUTs, not users, since LUTs are treated as scripts.

Member Author

Fixed in 961d307

@will-moore (Member Author)

Thanks for the in-depth reviews...

> Has there been any consideration of a hybrid solution where the cached JSON would be retrieved and only the missing LUTs would be dynamically fetched from the server when retrieving the PNG representation?

I actually think this would make a lot of sense. The performance issues we saw previously were because we were fetching all 47 LUTs from the server, whereas if you're only fetching 1 or 2 then this won't be a problem. This would mean the behaviour of /webgateway/luts_png/ would be unchanged (always renders all LUTs that are on the server).

The 574ms timing for https://merge-ci.openmicroscopy.org/web/webgateway/luts/?rgb=true above includes loading ~8 LUTs from the server (which are not yet in the static luts.json).

We could consider re-enabling the caching of the /luts_png/, for those rare times when LUTs are added (but it wouldn't be essential for users to enable caching).

Updates to LUT name/path will be reflected in the /webgateway/luts JSON. Testing would be nice but is not so easy, and if there is an issue then it's likely due to the OMERO API, since all we're doing here is converting that output to JSON.

@jburel (Member) commented Feb 5, 2025

> Updates to LUT name/path will be reflected in the /webgateway/luts JSON. Testing would be nice but is not so easy, and if there is an issue then it's likely due to the OMERO API, since all we're doing here is converting that output to JSON.

A test focusing on the OMERO API call will help in that case.

@will-moore (Member Author) commented Feb 6, 2025

I think adding tests for the OMERO API (if they don't already exist) is probably outside the scope of this PR.
I'm not exactly sure how a test would check that the LUTs listed by the script service match what's on the server?

To focus on what needs to be done on this PR:

  • should I update the webgateway/luts_png/ to load the non-cached LUTs from the server (as we do for /luts/?rgb=true JSON)?
  • should I add back Django caching for webgateway/luts_png/ as before this PR (for the times when it could be useful)?

EDIT: discussed 7th Feb web meeting - "Yes" to both those questions...

@will-moore (Member Author)

To test the performance of loading 8 LUTs from the server, compare response times for /webgateway/luts_png/?cached=false (no LUTs loaded from server) with /webgateway/luts_png/?cached=false&new=true.

NB: we always use ?cached=false so we don't get a cached response. Without that, it should be even faster.

@will-moore (Member Author) commented Feb 12, 2025

Tested on merge-ci by alternately running these 2 commands in the webclient DevTools Console, then checking the response times in the Network tab:

fetch("https://merge-ci.openmicroscopy.org/web/webgateway/luts_png/?cached=false")
fetch("https://merge-ci.openmicroscopy.org/web/webgateway/luts_png/?cached=false&new=true")

The &new=true call, when we are loading 8 LUTs from the server takes about 400-500ms server response time (not including the SSL time, connection time etc) compared to about 200ms for the other call, when we're not loading LUTs from the server.
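The same comparison can be scripted instead of alternating in DevTools. A sketch with a hypothetical `median_ms` helper that times any callable several times and reports the median; if the callable is an HTTP request (for example through an already logged-in session, which is assumed here and not shown), it measures the full round trip, including TLS and connection setup, so its numbers will be higher than the server response times quoted above.

```python
import statistics
import time
from typing import Callable


def median_ms(call: Callable[[], object], repeats: int = 5) -> float:
    """Run `call` `repeats` times and return the median wall time in ms."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        call()
        timings.append((time.perf_counter() - start) * 1000.0)
    # The median is less sensitive than the mean to one slow outlier request
    return statistics.median(timings)
```

For example, `median_ms(lambda: session.get(url_without_new))` versus `median_ms(lambda: session.get(url_with_new))`, where `session` and both URLs are placeholders for the two luts_png calls above.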

To err on the side of safety, I suggest:

  • By default we DON'T load LUTs from the server.
  • If a LUT is added to the server, it won't initially show up in the webgateway/luts_png/ (white placeholder instead).
  • If the server has cache enabled, then a single visit to webgateway/luts_png/?cached=false&new=true (e.g. by an admin) will be sufficient to load the LUTs once and populate the Django cache with the complete PNG, so that all other users will benefit.

This is actually the current behaviour. So if this sounds good, I can update the README to explain this workflow then we should be good to go?

@will-moore (Member Author)

I'll add the "new" LUTs into the static file now...

@sbesson (Member) left a comment

This PR tries to fix the LUT retrieval in two separate deployment configurations:

  1. if no OMERO.web cache backend is configured (via omero.web.caches), the implementation will always load the static JSON file and return a preview which might include white gaps if additional LUTs are present server-side unless new=true is passed. This is consistent with my expectation i.e. the implementation should avoid reloading additional LUTs for every single request to the endpoint
  2. if an OMERO.web cache backend is configured e.g. Redis
    • if a cache entry with a key hashed from the list of LUTs exists and cached=false is not set, it is returned as the response
    • otherwise, as above the static JSON file is read and used to generate a preview image. White gaps might be present if there are more LUTs server-side than in the static JSON and new=true is not passed
    • the preview image is then stored in the cache
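The key scheme is not spelled out here. One plausible version, shown as a sketch with a hypothetical `luts_cache_key` helper, hashes the sorted path + name list so the key changes whenever a LUT is added or renamed. That is also why the scenario that follows is a problem: the new key is populated on the first request, white gaps included, and stays that way until it is invalidated.

```python
import hashlib
import json


def luts_cache_key(server_luts, prefix="luts_png"):
    """Hash the sorted path+name list so the key changes when LUTs change."""
    pathnames = sorted(lut["path"] + lut["name"] for lut in server_luts)
    # Sorting makes the key independent of the order the server lists LUTs in
    digest = hashlib.sha256(json.dumps(pathnames).encode("utf-8")).hexdigest()
    return f"{prefix}_{digest}"
```

Note that a key like this depends only on the server's LUT list, not on the contents of the static luts.json, so upgrading OMERO.web with a newer JSON does not change it.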

My primary issue is with the second scenario: the impact of the double cache, and the difficulty of troubleshooting and invalidating such a cache. One example of a complex scenario:

  • LUTs are added server-side e.g. via a new OMERO.server release
  • OMERO.web is either not yet released or not deployed with a matching LUTs JSON file
  • the first call to the LUT preview endpoint finds no cache entry associated with the new hash. The implementation regenerates a preview image, including white gaps, which is stored in the cache
  • every subsequent call to the endpoint will read the cached preview image with white gaps even if OMERO.web is upgraded with a new JSON including the new LUTs
  • the only way to fix this situation would be to invalidate the cache

Is there a way to detect such a scenario and force the LUT loading as a one-off?

In general, the docstrings should be updated to describe how the logic has changed. The README addition is useful, but as it documents an occasional workflow that may include server-side changes, I think it would be more relevant under https://omero.readthedocs.io/en/stable/.

if pathname in luts_by_pathname:
    lut_rgb = luts_by_pathname[pathname].get("rgb")
    new_img[(i * 10) : ((i + 1) * 10), :, :3] = lut_rgb
elif request.GET.get("new") == "true":
Member

I am unclear on the distinction between the cached and new query parameters. In particular, I would expect that calling cached=false would also imply new=true. Is there a scenario where we would want different combinations?

@will-moore (Member Author)

@sbesson If the luts_png wasn't saved to Django cache unless ?new=true then I think that would address the issues with your 2nd scenario? There's really no point in caching the png that has whitespaces (and is only generated from the static luts.json).

I also think you're right about cached=false and new=true always going together. So I'll drop the new=true and just use cached=false.

So, the luts_png call has 2 behaviours:

  • cached=false - Ignore Django cache, load LUTs from the server if not in static luts.json and save the png in the Django cache
  • otherwise: Use the Django cache if it exists, and if not then generate the png from the static luts.json.
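A sketch of these two behaviours, with a plain dict standing in for the Django cache and hypothetical `render_static` / `render_complete` callables for the two rendering paths (in a Django view, `cached_param` would come from something like `request.GET.get("cached")`). The point it illustrates is that only the complete preview built with `cached=false` is ever written to the cache, so a preview with white gaps cannot get stuck there.

```python
from typing import Callable, MutableMapping, Optional


def luts_png(cache: MutableMapping[str, bytes], key: str,
             cached_param: Optional[str],
             render_static: Callable[[], bytes],
             render_complete: Callable[[], bytes]) -> bytes:
    if cached_param == "false":
        # Ignore the cache, load LUTs missing from luts.json, then store
        png = render_complete()
        cache[key] = png
        return png
    if key in cache:
        return cache[key]
    # No cache entry: static luts.json only (may have white gaps), not stored
    return render_static()
```

This also makes the "double cache" failure mode from the review recoverable: a single `cached=false` request overwrites the entry with the complete image.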

@will-moore (Member Author)

@sbesson That commit 3bfb3fc should address your points.

6 participants