
[Feature Request] Position-aware caching #3578

Closed
fedorov opened this issue Aug 2, 2023 · 13 comments · Fixed by #3663
Labels
IDC:candidate Possible feature requests for discussion, and if we agree, they can be relabeled IDC:priority

Comments

@fedorov
Member

fedorov commented Aug 2, 2023

What feature or change would you like to see made?

Implement caching strategy that is aware of the position of the currently visualized image series.

Why should we prioritize this feature?

This is follow up on the discussion in #3082 (comment).

The current caching approach is inadequate when the entire series does not fit in the cache. In that case, OHIF keeps thrashing the cache without providing any value to the user. This can be seen with this large XC series: https://ohif-v3.web.app/viewer?StudyInstanceUIDs=1.3.6.1.4.1.5962.1.2.0.1677425356.1733 (restricted access).

Cache size is capped at 1 GB, and the screenshot below shows the network behavior when opening a series that exceeds 1 GB: the current implementation keeps replacing the contents of the cache seemingly indefinitely, oblivious to the fact that the entire series can never fit in 1 GB.

[Screenshot: network activity when opening a series that exceeds the 1 GB cache]

Position-aware caching makes sense in the general case, beyond this admittedly unusual study. Caching slices starting around the current position of the slider is probably a good idea in general to optimize the user experience. I am pretty sure the Nucleus viewer (the one that shows the contents of the cache as highlighted sections of the scroll bar) prioritizes caching around the active slice.

Also, even though the cache is 1 GB, we should not forget about multi-modal studies where one can have multiple series in the layout, longitudinal studies, and the memory needs of segmentations and overlays. It is very likely that a situation where it is impossible to cache every visualized series in full will arise much sooner than expected, and position-aware caching should help with that.
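
To make the request concrete, here is a minimal sketch (a hypothetical function, not an OHIF or Cornerstone API) of what "position-aware" ordering could mean: sort the slice indices by distance from the currently displayed slice, so fetching and caching proceed outward from the user's position.

```ts
// Hypothetical sketch: order slice indices by distance from the current position,
// so fetching (and caching) proceeds outward from what the user is looking at.
function prioritizeByPosition(numSlices: number, currentIndex: number): number[] {
  const indices = Array.from({ length: numSlices }, (_, i) => i);
  return indices.sort(
    (a, b) => Math.abs(a - currentIndex) - Math.abs(b - currentIndex)
  );
}

// Example: 10 slices, viewing slice 4 -> [4, 3, 5, 2, 6, 1, 7, 0, 8, 9]
console.log(prioritizeByPosition(10, 4));
```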

@fedorov fedorov added the IDC:candidate Possible feature requests for discussion, and if we agree, they can be relabeled IDC:priority label Aug 2, 2023
@sedghi
Member

sedghi commented Aug 2, 2023

Copying my comment from the other thread here too.

So I tried the link, and this is what is happening. It is quite an interesting (and somewhat funny) problem.

Re: first render, a.k.a. the metadata call

  • the server is not responding quickly at all: there is a 12-second delay, and another metadata call takes 20 seconds

[Screenshot: metadata requests taking 12 and 20 seconds]

Re: image loading slowness

The current Cornerstone cache is capped at 1 GB, which is around 200 images at 4.5 MB each. What happens is that we send out up to 25 simultaneous requests and the cache fills up in ~8 seconds; from that point on we overwrite older cached images with the new ones as they arrive. On top of that, when you move to, say, slice 20, our "smart (meh)" prefetching grabs the slices before and after, and guess what: those images (18, 19, 21, 22, etc.) are not in the cache either, so it becomes a kind of recursive cycle of new images overriding previously cached ones.
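
For reference, a back-of-the-envelope calculation behind those numbers (the values are the ones quoted above, not measurements):

```ts
// Rough arithmetic for the figures quoted above (assumed, not measured values).
const cacheBytes = 1 * 1024 ** 3;        // 1 GB cache cap
const bytesPerImage = 4.5 * 1024 ** 2;   // ~4.5 MB per image
const capacity = Math.floor(cacheBytes / bytesPerImage); // ~227 images ("around 200")
const secondsToFill = 8;                 // observed time for the cache to fill
const imagesPerSecond = capacity / secondsToFill; // ~28 images/s across 25 connections
console.log({ capacity, imagesPerSecond });
```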

Each image also takes about 4 seconds of server response time on its own

[Screenshot: a single image request with ~4 seconds of server response time]

So add up our caching behavior and your server issues, and you get that bad viewer experience. You can certainly make your server faster, but on the caching side I'm not sure what else we can do. It seems the cache needs to know the current state of the viewer (viewing slice 20, so don't let slices 10-30 be overwritten).
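
To illustrate that last point, a minimal sketch (hypothetical types and names, not the Cornerstone cache API) of an eviction policy that refuses to drop images within a protected window around the currently viewed slice:

```ts
// Hypothetical sketch of a position-aware eviction guard: before evicting a cached
// image to make room, skip anything within a protected window around the slice the
// viewer is currently displaying (e.g. viewing slice 20 protects slices 10-30).
interface CachedImage {
  sliceIndex: number;
  sizeInBytes: number;
}

function pickEvictionCandidate(
  cached: CachedImage[],
  currentSliceIndex: number,
  protectedRadius = 10
): CachedImage | undefined {
  const evictable = cached.filter(
    (img) => Math.abs(img.sliceIndex - currentSliceIndex) > protectedRadius
  );
  // Among evictable images, drop the one farthest from the current position.
  return evictable.sort(
    (a, b) =>
      Math.abs(b.sliceIndex - currentSliceIndex) -
      Math.abs(a.sliceIndex - currentSliceIndex)
  )[0];
}
```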

@wayfarer3130
Contributor

Consider running a Static DICOMweb instance that stores the metadata and/or images as gzipped data. That would allow you to respond to both metadata and image data requests in under a second.
Also, is the 4.5 MB uncompressed or compressed? It makes a huge difference in how many images are cacheable. Something like HTJ2K would allow a small cache of reduced-quality images first, with the remaining data fetched on view.

@fedorov
Member Author

fedorov commented Aug 3, 2023

Any solution to this issue that requires deviating from the DICOM standard in how the data is represented is unacceptable for us in IDC.

@pieper
Member

pieper commented Aug 3, 2023

Any solution to this issue that requires deviating from the DICOM standard in how the data is represented is unacceptable for us in IDC.

From what I understand, everything @wayfarer3130 suggested would be completely standard. Is that true @wayfarer3130?

@wayfarer3130
Contributor

The Static DICOMweb suggestion is DICOM standards based; it is just a much more performant solution for serving the data than generating it on demand.

Also, I have a fix for the position-aware caching - could you test it in your context?
cornerstonejs/cornerstone3D#726

@fedorov
Member Author

fedorov commented Aug 4, 2023

Also, I have a fix for the position-aware caching - could you test it in your context?

Sounds great! I am not an OHIF developer and do not have a dev setup to test this. @igoroctaviano or @rodrigobasilio2022, could you take a look and test with the VHD sample?

@sedghi
Member

sedghi commented Aug 11, 2023

I was chatting with Bill about this feature, and we are thinking:

  • It should be able to deal with very large image stacks
  • It should cache nearby images for the viewports in the grid; for instance, if there are 4 viewports (2x2),
    there are 4 streams of nearby caching (-1 to +6) around the current position of the image in each viewport

From here there are two scenarios:

  1. no further requests until the user interacts with the images via scrolling (navigating)

  2. for the active viewport, we run the remaining requests until the full stack is cached (or at least we try to cache it)

The scenarios above should avoid overwriting the images around each viewport's current position (± a few slices) in the cache. That means if caching is happening for the rest of the stack (either while navigating, option 1 above, or automatically, option 2), it SHOULD NOT overwrite the other viewports' current (±) images when the cache is full.
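
A rough sketch of what that protection could look like, using hypothetical names rather than actual OHIF/Cornerstone types: collect a protected set of (series, slice) keys for every viewport in the grid, and have any cache-fill pass skip eviction of anything in that set.

```ts
// Hypothetical sketch: build the set of slices protected by every viewport in the
// grid, so a cache-fill pass (option 1 or 2) never evicts images another viewport
// is currently sitting on. The -1/+6 window matches the example above.
interface ViewportState {
  seriesInstanceUID: string;
  currentSliceIndex: number;
}

function protectedKeys(
  viewports: ViewportState[],
  before = 1,
  after = 6
): Set<string> {
  const keys = new Set<string>();
  for (const vp of viewports) {
    for (let i = vp.currentSliceIndex - before; i <= vp.currentSliceIndex + after; i++) {
      if (i >= 0) keys.add(`${vp.seriesInstanceUID}:${i}`);
    }
  }
  return keys;
}

// Eviction would then skip any cached image whose (series, slice) key is in this set.
```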

@pieper @fedorov Tell us which option above makes more sense, or share any other ideas; we are here to listen.

@fedorov
Member Author

fedorov commented Aug 11, 2023

We discussed this today with @dclunie, @pieper, and @igoroctaviano. Our overall response is that we don't particularly care about the implementation details; the implementation should decide which heuristic to apply based on the situation (frame size, metadata, number of frames) in order to provide a satisfactory experience.

We also discussed that you might want to consider caching compressed frames (when the specific situation warrants such a heuristic).

@sedghi
Member

sedghi commented Aug 11, 2023

It is not an implementation detail, but rather a matter of what the user expects.

@igoroctaviano
Contributor

It is not an implementation detail, but rather a matter of what the user expects.

We meant that for scenarios like the Visible Human dataset (or other large-frame datasets), prefetching should be disabled (option 1: no further requests until the user interacts with the images via scrolling). That way the characteristics of the data (determined through metadata analysis) would decide which heuristic is used to load the images. The same is true for storing compressed versus uncompressed images in the cache; for example, large images could be stored compressed and others uncompressed, leaving more room in the cache for large-frame datasets. A fetching heuristic would be picked as soon as the user selects a display set. A global config wouldn't help in these scenarios; for some lightweight datasets it makes sense to just fetch the whole stack.
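
As an illustration only (hypothetical names and thresholds, not an existing OHIF config), a per-display-set heuristic selection could look roughly like this:

```ts
// Hypothetical sketch: pick a prefetch heuristic per display set based on its
// metadata (frame size and frame count), rather than using a single global config.
type PrefetchHeuristic = 'fullStack' | 'nearbyOnly' | 'onDemand';

interface DisplaySetInfo {
  numberOfFrames: number;
  bytesPerFrame: number; // Rows * Columns * (BitsAllocated / 8) * SamplesPerPixel
}

function pickHeuristic(
  ds: DisplaySetInfo,
  cacheBytes = 1 * 1024 ** 3 // assumed 1 GB cache cap
): PrefetchHeuristic {
  const totalBytes = ds.numberOfFrames * ds.bytesPerFrame;
  if (totalBytes <= cacheBytes / 4) return 'fullStack';      // lightweight series: fetch it all
  if (ds.bytesPerFrame < 1 * 1024 ** 2) return 'nearbyOnly'; // many small frames: prefetch around position
  return 'onDemand'; // very large frames (e.g. Visible Human): fetch only on interaction
}
```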

@wayfarer3130
Contributor

It is not an implementation detail, but rather a matter of what the user expects.

We meant that for scenarios like the Visible Human dataset (or other large-frame datasets), prefetching should be disabled (option 1: no further requests until the user interacts with the images via scrolling). That way the characteristics of the data (determined through metadata analysis) would decide which heuristic is used to load the images. The same is true for storing compressed versus uncompressed images in the cache; for example, large images could be stored compressed and others uncompressed, leaving more room in the cache for large-frame datasets. A fetching heuristic would be picked as soon as the user selects a display set. A global config wouldn't help in these scenarios; for some lightweight datasets it makes sense to just fetch the whole stack.

The implementation added here tries to react to user interaction in a way that should generally improve responsiveness (see the rough sketch after the list):

  • If the user clicks around / jumps between distant images, only a few nearby images get cached, so that it does not issue too many requests
  • If the user scrolls through the series, more of the cache is used to fetch ahead of the user, so that hopefully images are ready by the time the user reaches them.
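
A rough sketch of that idea, purely illustrative and not the actual code in cornerstonejs/cornerstone3D#726: widen the prefetch window in the scroll direction when the user is scrolling sequentially, and keep it small when they jump around.

```ts
// Hypothetical sketch: size the prefetch window around the current slice based on
// how the user is moving (sequential scroll vs. jumping between distant slices).
function prefetchWindow(
  currentIndex: number,
  previousIndex: number,
  numSlices: number
): { start: number; end: number } {
  const delta = currentIndex - previousIndex;
  const isSequentialScroll = Math.abs(delta) === 1;
  // Small window on jumps, larger window ahead of the scroll direction when scrolling.
  const ahead = isSequentialScroll ? 30 : 3;
  const behind = 3;
  const start = delta >= 0 ? currentIndex - behind : currentIndex - ahead;
  const end = delta >= 0 ? currentIndex + ahead : currentIndex + behind;
  return { start: Math.max(0, start), end: Math.min(numSlices - 1, end) };
}
```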

For the Visible Human dataset, on a fast back end, this allows retrieval speeds of about 12-15 fps. I'm guessing it will only allow 1-2 fps against the IDC back end, as that one is far slower. That is the speed I get when using JPEG 2000 compression (because the decompression is VERY slow, using 100% CPU even on a fast machine). I suggest using HTJ2K if compression is desired.

There aren't any changes to the type of image cached in terms of compressed versus uncompressed. If compressed image storage is desired, I would recommend setting the HTTP caching headers and letting the browser handle compressed image caching, leaving the internal uncompressed image cache alone (which you still need to populate ahead of time, because decompressing large images takes a while).
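
For example (a plain Node sketch, not part of OHIF or Static DICOMweb), a server could mark frame responses as immutable so the browser's HTTP cache keeps the compressed bytes while the viewer's in-memory cache holds only decompressed images:

```ts
// Hypothetical sketch: serve frame responses with long-lived caching headers so the
// browser's HTTP cache retains the compressed bytes.
import http from 'node:http';

http.createServer((_req, res) => {
  // Frame content for a given SOP instance never changes, so it can be cached "forever".
  res.setHeader('Cache-Control', 'public, max-age=31536000, immutable');
  res.setHeader('Content-Type', 'application/octet-stream');
  // ... stream the (already compressed) frame bytes here ...
  res.end();
}).listen(8080);
```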

OHIF overall still has a stability issue causing the browser to crash if you CINE loop the entire dataset.

@fedorov
Member Author

fedorov commented Aug 22, 2023

I'm guessing it will only allow 1-2 fps against the IDC back end, as that one is far slower. That is the speed I get when using JPEG 2000 compression (because the decompression is VERY slow, using 100% CPU even on a fast machine). I suggest using HTJ2K if compression is desired.

"IDC back end" is Google Healthcare DICOM store + IDC proxy.

Google Healthcare DICOM stores do not support HTJ2K - see https://cloud.google.com/healthcare-api/docs/dicom#supported_transfer_syntaxes_for_transcoding - and I have no insight into whether or when it might be supported, so this is not an option for IDC, at least not in the near future.

@sedghi
Member

sedghi commented Oct 3, 2023

We merged this into master, and now caching is smarter. Please re-open if the issue persists

@sedghi sedghi closed this as completed Oct 3, 2023
@sedghi sedghi linked a pull request Oct 3, 2023 that will close this issue