
Simply download the OriginalFiles #264

Merged
merged 6 commits into ome:master on Feb 18, 2022


dominikl
Member

@dominikl dominikl commented Dec 10, 2021

Fixes #188, successor of closed PR #217.

I'm very tempted to just keep it simple like this, @jburel, i.e. with this PR the download option will simply download the OriginalFiles and their directory structure. It won't create a separate directory or rename any files. If a file already exists, it will be skipped. If we make it more complex again, we'll run into all sorts of issues with special cases again.

/cc @ehrenfeu @CellKai

(Note: I hadn't tested it on Windows at first; it is now tested on Windows too, including the latest commit.)
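A minimal Java sketch of the behaviour described above: each original file is written at its original relative path under the chosen directory, parent folders are created as needed, and a file that already exists is skipped rather than renamed or overwritten. The `Map` of relative path to bytes is an assumed stand-in for the files fetched from the server, and `SimpleOriginalFileWriter` is a hypothetical name; this is not the Insight implementation.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the "keep it simple" download rule, not Insight code.
// The Map (relative path -> content) stands in for the fetched OriginalFiles.
final class SimpleOriginalFileWriter {

    // Writes every file at its original relative path and returns the
    // relative paths that were skipped because they already existed.
    static List<String> write(Path targetDir, Map<String, byte[]> files) throws IOException {
        List<String> skipped = new ArrayList<>();
        for (Map.Entry<String, byte[]> entry : files.entrySet()) {
            Path target = targetDir.resolve(entry.getKey());
            if (Files.exists(target)) {
                // Never overwrite or rename: just skip the file.
                skipped.add(entry.getKey());
                continue;
            }
            // Recreate the original directory structure before writing.
            Files.createDirectories(target.getParent());
            Files.write(target, entry.getValue());
        }
        return skipped;
    }
}
```

Calling `write` twice with the same files into the same folder writes everything the first time and reports every path as skipped the second time.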

@pwalczysko
Member

Deleted the comment, as I was able to download and unzip on subsequent attempts.

@pwalczysko
Member

Test 1:

  1. Download one Dicom image from https://outreach.openmicroscopy.org/webclient/?show=image-13402
  2. Reupload the downloaded image back to OMERO.

Result:
The image downloads as expected (it is a 3-image MIF, and all three images are downloaded locally).
Nevertheless, when the whole downloaded folder is reimported to OMERO, the result is again 3 images, but they no longer behave as a multi-image file: for example, it is possible to move only one of those images into another group.
IMHO, this is a test failure.

@dominikl
Member Author

This also happens with the current Insight without this PR. Tbh, if downloading the original files and reimporting them splits the multi-image Dicom image, then the problem is probably Dicom-specific and lies deeper. Download and reimport of a vsi image, for example, works as expected with this PR.

@pwalczysko
Member

Test 2:

  1. Download a vsi image using the Insight build from this PR, namely https://outreach.openmicroscopy.org/webclient/?show=image-94204
  2. Reimport it into OMERO, resulting in https://outreach.openmicroscopy.org/webclient/?show=image-94208

Result: The downloaded folder structure seems reasonable, see screenshot below.
When the vsi file is picked and reimported to OMERO, it displays as expected in OMERO and is a MIF, as expected.
Screenshot 2021-12-13 at 16 45 40

This test passes imho.

@pwalczysko
Member

> This also happens with the current Insight without this PR. Tbh, if downloading the original files and reimporting them splits the multi-image Dicom image, then the problem is probably Dicom-specific and lies deeper. Download and reimport of a vsi image, for example, works as expected with this PR.

Thank you @dominikl, I can confirm that with my release build of Insight. The Dicom behaviour is not a regression and has nothing to do with this PR.

@joshmoore
Member

> If a file already exists, it will be skipped.

Would it be easy enough to expand this to "file or directory", i.e. if anything already exists, you skip? I don't have a working example of how to trigger this, but I'm worried about downloading:

foo/my.ini

into a directory which already contains:

foo/my.fake

and thereby change the interpretation of something.
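One possible reading of the "file or directory" suggestion, as a Java sketch: before writing anything, check whether any top-level entry the fileset would create (`foo` for `foo/my.ini`) already exists in the target directory, and skip the fileset if so. `ExistingEntryCheck` is a hypothetical helper, and this is not necessarily how the follow-up commit in this PR handled it.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.Optional;
import java.util.Set;

// Hypothetical sketch: detect clashes with existing files OR directories.
final class ExistingEntryCheck {

    // Returns the first top-level file or directory the download would
    // write into that already exists under targetDir, if any.
    static Optional<Path> firstConflict(Path targetDir, Collection<String> relativePaths) {
        Set<Path> topLevel = new LinkedHashSet<>();
        for (String rel : relativePaths) {
            // "foo/my.ini" -> "foo", "my.ini" -> "my.ini"
            topLevel.add(Paths.get(rel).getName(0));
        }
        for (Path entry : topLevel) {
            Path candidate = targetDir.resolve(entry);
            if (Files.exists(candidate)) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }
}
```

With an existing `foo/my.fake`, downloading `foo/my.ini` reports `foo` as a conflict, whereas a per-file check would only look at `foo/my.ini` and let the two files end up side by side.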

@ehrenfeu

Hi there,

quick and hopefully not so stupid question... Should the changes from above also be included in omero-py available from Jenkins? I'd be curious to test them.

Thanks,
Niko

@pwalczysko
Member

> Hi there,
>
> quick and hopefully not so stupid question... Should the changes from above also be included in omero-py available from Jenkins? I'd be curious to test them.
>
> Thanks, Niko

Hi Niko,

The changes are certainly available on https://merge-ci.openmicroscopy.org/jenkins/job/OMERO-insight-build/. Just download the latest Insight artifact; that is what I do when testing. Not sure about omero-py; I'll leave that for @dominikl to answer.

@dominikl
Member Author

As Petr said, the changes are included in the current merge build of Insight. But they won't go into omero-py, that's totally unrelated to Insight.

@dominikl
Member Author

The last commit should address @joshmoore's issue. It would be good to have some complex filesets (many files, many directories) to properly test this PR. What's a good image format for that?

@pwalczysko
Member

pwalczysko commented Dec 15, 2021

> Last commit should address @joshmoore's issue. Would be good to have some complex filesets (many files, many directories) to properly test this PR. What's a good image format for that?

Tested so far

  • vsi
  • ndpi
  • svs
  • Dicom (does not work, but not a regression)

The behaviour was as expected. @dominikl, finding a single complex file format that tests everything may not be realistic, right? Could we specify (with examples) what "if the file already exists, it will be skipped" means or, if that changed with your last commit, what exactly changed and what the expected behaviour is? Only then can we set up test cases.

@ehrenfeu

> As Petr said, the changes are included in the current merge build of Insight. But they won't go into omero-py, that's totally unrelated to Insight.

Thanks Dominik. I somehow thought omero-py is calling the Java code when downloading files, but I should actually know better...

@will-moore
Member

@ehrenfeu See https://forum.image.sc/t/download-full-projects-with-their-datasets/56434/22 and discussion there for downloading via python. Improvements were made in omero-py 5.10.0.

@ehrenfeu

Ah, very nice - thanks a lot @will-moore !

I was asking for a friend working on the HRM-OMERO connector 😁 but he already found out that he should just properly deal with the paths when downloading filesets from OMERO.

Nevertheless I will look into your code. My parts in the connector are a bit aged by now, they could certainly use some polishing.

@sbesson
Member

sbesson commented Dec 16, 2021

Cross-linking to ome/omero-py#298, which added the Python improvements that @will-moore was referring to. This thread in particular includes many considerations around data integrity and data corruption that are directly relevant to this use case.

Adding to the similarities, the former CLI download allowed the user to specify the name of the output file, breaking the assumptions of the file readers when doing a round-trip. This is effectively another variant of #188.

The agreement made in OMERO.py was to:

  1. keep the structure of the original files (names & folder layout) unchanged
  2. enforce the containerisation of a fileset under a top-level folder.

Talking with @pwalczysko @dominikl @will-moore and @khaledk2, I would strongly vote for using this opportunity to be consistent in terms of strategy across our download utilities.

Probably the biggest differences between omero download and OMERO.insight download are that OMERO.insight supports the export of multiple filesets and that it does not let the user choose the folder name. In general, I think an export structure of type:

fileset1_folder/
   fileset1_originalfile_1
   fileset1_originalfile_2
   ...
   fileset1_originalfolder_1/
   ...
fileset2_folder/
   fileset2_originalfile_1
   fileset2_originalfile_2
   ...
   fileset2_originalfolder_1/
   ...

is the one that is the most amenable to ensure the integrity of the original data and reduce the risks of cross-talk between filesets. In a nutshell, this is a small version of the ManagedRepository layout and matches the layout used in omero-downloader.

Probably the most important design decision for the layout above is the naming of these top-level fileset folders. There are a range of available options which offer trade-offs in terms of user-friendliness vs potential for interference with existing/other fileset folders.
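A Java sketch of the per-fileset layout, assuming each original file's path relative to its fileset is already known: the file lands under a `Fileset_<id>/` folder (the prefix used in this PR's diff) with its relative path unchanged. Normalizing the result and checking that it still lies inside the fileset folder is one way to keep a path such as `../other.tif` from escaping and cross-talking with another fileset. `FilesetLayout` is a hypothetical name, not part of Insight or omero-downloader.

```java
import java.nio.file.Path;

// Hypothetical sketch of the "one top-level folder per fileset" layout.
final class FilesetLayout {

    // Maps an original file's path (relative to its fileset) to
    // targetRoot/Fileset_<id>/<relative path>, rejecting paths that
    // would escape the fileset folder.
    static Path targetFor(Path targetRoot, long filesetId, String relativePath) {
        Path filesetDir = targetRoot.resolve("Fileset_" + filesetId).normalize();
        Path target = filesetDir.resolve(relativePath).normalize();
        if (!target.startsWith(filesetDir) || target.equals(filesetDir)) {
            throw new IllegalArgumentException("Path escapes fileset folder: " + relativePath);
        }
        return target;
    }
}
```

Using the fileset ID in the folder name trades some user-friendliness for uniqueness: two filesets from the same server can never share a folder, whereas a name derived from the image name could collide.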

@joshmoore
Member

cc: @sukunis in case there are any cross-interactions with openlink.

Inline review comment on the diff:

    values.put(fs.getOriginalFile(), set);
    for (Object tmp : files) {
        Fileset fs = (Fileset) tmp;
        File filesetDir = new File(dir.getAbsolutePath()+File.separator+"Fileset_"+fs.getId().getValue());
Member

A method like Files.createDirectories could be used here instead.

Member Author

Doesn't really make it simpler; for Files.createDirectories I'd need to create a Path object instead of a File.
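For comparison, a Java sketch of both ways to create the fileset folder. `File.mkdirs()` reports failure only through its boolean return value (and also returns `false` when the folder already exists), while `Files.createDirectories` throws on failure and does nothing for an existing directory; `File.toPath()` is the one-line bridge between the two APIs. The class and method names are hypothetical.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper contrasting java.io.File and java.nio.file for the
// "Fileset_<id>" folder; not the code merged in this PR.
final class FilesetDirs {

    // java.io style: mkdirs() signals failure only via its boolean result,
    // and also returns false if the folder already exists.
    static File createWithFile(File dir, long filesetId) throws IOException {
        File filesetDir = new File(dir, "Fileset_" + filesetId);
        if (!filesetDir.isDirectory() && !filesetDir.mkdirs()) {
            throw new IOException("Could not create " + filesetDir);
        }
        return filesetDir;
    }

    // java.nio style: File.toPath() bridges the APIs; createDirectories
    // throws on failure and does nothing if the directory already exists.
    static File createWithPath(File dir, long filesetId) throws IOException {
        Path filesetDir = Files.createDirectories(dir.toPath().resolve("Fileset_" + filesetId));
        return filesetDir.toFile();
    }
}
```

Both return the same `File`, so either can sit behind the existing code; the main practical difference is how failures surface.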

@pwalczysko
Member

Testing the behaviour of repeatedly downloading the same file into the same folder.
In Activities I get the message:
No image downloaded. Check logs for details.
and in the log I have:
2022-01-10 13:47:47,939 DEBUG [ o.o.shoola.env.data.OMEROGateway] ( Thread-46) /Users/pwalczysko/10Jan-test-download/Fileset_97307/Brain_DAPI_GFAP-488_IBA3-Cy3_MBP-647_coronal10x_coronal-slideA_03.vsi already exists.
This makes sense and is as expected, but could the main message in Activities say "... already exists"? That would spare me from having to go into the log (and knowing where the log is, etc.).

@pwalczysko
Member

pwalczysko commented Jan 10, 2022

Tested so far

  • vsi (also reimport)
  • ndpi
  • svs
  • Dicom (does not work in the sense that the MIF is decoupled after reimport, but not a regression)

The above went fine.

I think the most logical thing to try now would be a plate download, but this is not enabled on merge-ci. What is the current workflow for changing the config on that server, @sbesson?

@pwalczysko
Member

pwalczysko commented Jan 11, 2022

Downloading https://merge-ci.openmicroscopy.org/web/webclient/?show=plate-4256 and https://merge-ci.openmicroscopy.org/web/webclient/?show=plate-2955 now.

https://merge-ci.openmicroscopy.org/web/webclient/?show=plate-3855 ends up as many single Filesets, possibly because it was created by the Dataset to Plate script?
Screenshot 2022-01-11 at 12 48 50

@ehrenfeu

In case you're interested, I have created a very small (~200 kB) multifile VSI dataset using the original Olympus cellSens software here at our facility. My purpose is to use it for automated testing of the HRM-OMERO connector, hence I tried to have it as small as possible while keeping the multifile structure (once it gets too small, cellSens puts everything into a single .vsi file without any subfolders).

Let me know if I should upload it somewhere!

@dominikl
Member Author

@pwalczysko Unfortunately I can only log the error if the file already exists. At the moment it's not possible to pass an error message up to the UI level. That would need a lot of changes to the Insight code.
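A hypothetical Java sketch of what passing the outcome upwards could look like: the download code records written and skipped paths in a small summary object, and the caller turns it into a short message like the one requested for Activities. `DownloadSummary` and its message texts are illustrative assumptions; as noted above, Insight currently has no way to pass such a message to the UI.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical summary a download task could return instead of only logging.
final class DownloadSummary {
    private final List<String> written = new ArrayList<>();
    private final List<String> skipped = new ArrayList<>();

    void recordWritten(String path) { written.add(path); }

    void recordSkipped(String path) { skipped.add(path); }

    List<String> skipped() { return Collections.unmodifiableList(skipped); }

    // Short, user-facing text of the kind an Activities entry could show.
    String message() {
        if (written.isEmpty() && !skipped.isEmpty()) {
            return "No files downloaded: " + skipped.size()
                    + " file(s) already exist, e.g. " + skipped.get(0);
        }
        if (skipped.isEmpty()) {
            return written.size() + " file(s) downloaded.";
        }
        return written.size() + " file(s) downloaded, " + skipped.size()
                + " skipped because they already exist.";
    }
}
```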

@pwalczysko
Member

Any ideas for further formats to test, @sbesson?

Tested so far

  • vsi (also reimport)
  • ndpi
  • svs
  • Dicom (does not work in the sense that the MIF is decoupled after reimport, but not a regression)
  • xdce extension (Plate)
  • Mias Frans (Plate, I think this is Flex?)

@joshmoore
Member

Regarding #264 (comment): "possibly because it is a result of Dataset to Plate script?"

That would make sense, @pwalczysko.

@sbesson
Member

sbesson commented Jan 13, 2022

The list of formats tested in #264 (comment) should cover a wide range of scenarios in different imaging domains (HCS, pathology). I assume NDPI included the NDPIS case (i.e. a pathology fileset composed of multiple NDPI files)?
https://docs.openmicroscopy.org/bio-formats/6.8.0/formats/dataset-table.html gives the full list of supported formats including the expectations in terms of dataset structure. The only other thing I can think of would be a multi-file format in the fluorescence microscopy domain. Deltavision, Zeiss LSM and/or Olympus FV1000 might be good examples.

@pwalczysko
Member

pwalczysko commented Jan 13, 2022

@pwalczysko
Member

> In case you're interested, I have created a very small (~200 kB) multifile VSI dataset using the original Olympus cellSens software here at our facility. My purpose is to use it for automated testing of the HRM-OMERO connector, hence I tried to have it as small as possible while keeping the multifile structure (once it gets too small, cellSens puts everything into a single .vsi file without any subfolders).
>
> Let me know if I should upload it somewhere!

Hi Niko,
Thank you very much for this. Would you be able to upload this file and make it publicly visible? If you do not have a location, could you possibly use Zenodo?
Thanks a lot
Petr

@pwalczysko
Member

pwalczysko commented Jan 13, 2022

zeiss-lsm see https://merge-ci.openmicroscopy.org/web/webclient/?show=image-210326
deltavision with a log file https://merge-ci.openmicroscopy.org/web/webclient/?show=image-210329
fv1000 (oif with multiple tiffs) https://merge-ci.openmicroscopy.org/web/webclient/?show=image-210331

The three formats above were successfully downloaded and reimported. All are MIFs and looked good after download and also after reimport.

This covers the formats suggested in #264 (comment).

We now have:

  • vsi (also reimport)
  • ndpi and ndpis, reimported
  • svs
  • Dicom (does not work in the sense that the MIF is decoupled after reimport, but not a regression)
  • xdce extension (Plate), with reimport
  • Mias Frans (Plate, I think this is Flex?), with reimport
  • fv1000, multifile oif variant, reimported
  • Deltavision with log, reimported
  • zeiss-lsm (multifile version with mdb), also reimported

As all the file tests and functional tests passed and the coverage seems rich enough, this is ready to merge from my point of view.
Happy to add more tests if someone comes with more format ideas, otherwise all good.

@sbesson
Member

sbesson commented Jan 14, 2022

Nothing else from my side. We might want to wait on the data from #264 (comment) to conduct a final test. Otherwise, I assume we are in a state where we should release OMERO.insight.

@ehrenfeu

Hi @pwalczysko

here you go: https://doi.org/10.5281/zenodo.5848987

Thanks a lot!
~Niko

@pwalczysko
Member

@ehrenfeu Thanks a lot Niko !

Downloaded, imported to OMERO and downloaded again from OMERO using Insight from this PR.
All went fine. In OMERO the set consists of 2 images: a 2-channel image with 11 z-sections and other parameters matching the README of your Zenodo record, and a macro image. The screenshots below show the folder hierarchy on my local Mac after download and how the images look in OMERO.iviewer after reimport. I cannot detect any problem there; everything works and matches the images before download.

Screenshot 2022-01-14 at 12 01 27

Screenshot 2022-01-14 at 12 04 11

@joshmoore
Member

cc: @erickmartins since he may have something similar in his upcoming transfer tools

https://twitter.com/erickratamero/status/1482060699539058688

@erickmartins

We live in a python/ezomero world, so our strategy is to use our get_original_filepaths ezomero function (which runs an HQL query for the usedFiles of a Fileset) and rsync via subprocess, maintaining the internal ManagedRepository directory structure. In our experience it has dealt with all cases pretty cleanly. If fileset downloading is included in omero-py, we could potentially drop the rsync dependency and do it that way.

(we're probably making our transfer thing public this week. the repo is still private because we didn't want a half-half-baked solution in the hands of people, and because we still need a catchier name than jax-omero-transfer :) )

@jburel jburel merged commit 3631862 into ome:master Feb 18, 2022
@ehrenfeu

@pwalczysko @jburel sorry if I'm repeating myself, I feel like I've asked this question before... Is there an automatic build of Insight available somewhere that includes this merge?

@jburel
Member

jburel commented Feb 18, 2022

I am preparing a release. It should be out next week. But you can try the daily build if you want. It is at https://merge-ci.openmicroscopy.org/jenkins/job/OMERO-insight-build/. That build does not include the dmg or exe installers, so it is very much for testing purposes.

@ehrenfeu

Thanks J-M, highly appreciated!

@jburel
Member

jburel commented Feb 21, 2022

@ehrenfeu https://github.com/ome/omero-insight/releases/tag/v5.7.0 containing this PR is now available

@imagesc-bot

This pull request has been mentioned on Image.sc Forum. There might be relevant details there:

https://forum.image.sc/t/download-vsi-files-from-omero-fails/83314/2


Successfully merging this pull request may close these issues.

Olympus .vsi download bug
9 participants