Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support omero download Fileset:ID #298

Merged
merged 6 commits into from
Sep 1, 2021

Conversation

will-moore
Copy link
Member

@will-moore will-moore commented Aug 18, 2021

See https://forum.image.sc/t/download-full-projects-with-their-datasets/56434/

This adds support for:

$ omero download Fileset:ID download_dir

Files are downloaded to the named directory. If original files have a path that is longer than the fileset template prefix,
that path will be used to create additional directories inside the named directory.

The same behaviour is used for Images (this fixes the previous limitation of only a single-file for an Image):

$ omero download Image:ID download_dir

Other functionality to test for regressions:

$ omero download OriginalFile:ID download_filename
# assumes OriginalFile:
$ omero download ID download_filename
$ omero download FileAnnotation:ID download_filename

Copy link
Member

@joshmoore joshmoore left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for kicking this off, @will-moore. One concern I had thinking about this yesterday is if this won't end up becoming a full omero-downloader. One way to balance that that I was considering was not to introduce Project & Dataset explicitly, but to let "filename" become the target download. Users would produce a list or mapping of project/dataset paths to image ids:

# Legacy: download the single file for an image (this makes things tricky)
omero download Image:1 output.tif

# Download single or multiple file to directory if it exists
omero download Image:1 output/

# Usage for Project/Dataset hierarchies, creating directories
omero download Image:1 --parent my-project/my-dataset/

One thing we would probably want to re-open would be Colin's work on printing these hierarchies: ome/openmicroscopy#4777


Edit: One other issue that these strategies will have is that they may download the same original files multiple times. I suggest we ignore that.

src/omero/plugins/download.py Outdated Show resolved Hide resolved
src/omero/plugins/download.py Outdated Show resolved Hide resolved
@will-moore
Copy link
Member Author

Doing:
$ omero download Image:1 --parent my-project/my-dataset/
isn't going to be very nice for the user above who "would like to download multiple (more than 100) whole projects while maintaining their dataset structure".

Is the main objection to $ omero download Project:ID that it might accidentally overwrite existing files (which is easy enough to fix) or that the logic doesn't handle edge-cases: e.g.

  • 2 Datasets named the same, or 2 images in the same Dataset have original files with identical names?
  • Projects or Datasets with / in their name
  • Download files multiple times if the Images are in multiple Datasets
  • No limit on download size (could be many GB and very many files)

If the code as it is will work for the user who needs it, but we don't want to add it and maintain it here, then it could be a separate user script (gist) instead?

@will-moore
Copy link
Member Author

As discussed this morning, added support for:

$ omero download Fileset:ID download_dir

This is now used by:

$ omero download Image:ID download_dir

PR description updated

@will-moore will-moore changed the title Support omero download Project:ID and Dataset:ID Support omero download Fileset:ID Aug 18, 2021
@joshmoore
Copy link
Member

joshmoore commented Aug 18, 2021

Initially testing looks good on the Fileset: usage. 👍 If we end up sending more people to this tool, I imagine something like the ProgressBars from glencoesoftware/bioformats2raw#83 might be useful, which might end up driving us to a framework like https://click.palletsprojects.com/en/8.0.x/

One other nice to have might be to clean up the file on cancellation and/or support restart, but that's getting a bit ahead of ourselves for now.

Edit: https://github.com/willmcgugan/rich was the other library for nicer CLIs (see #217)

@will-moore
Copy link
Member Author

This is failing a bunch of tests for various reasons:

  • Using - output doesn't output to stdout - Need to fix this PR
  • Images without filesets are not handled - Need to fix in this PR
  • Images with multiples files (e.g. companion files) are now supported. - Update Tests
  • omero download Image:ID test previously saved a single file named test. Now it downloads the Image file(s) with their original names into a test/ directory. - Update Tests

@pwalczysko
Copy link
Member

Could the -h mesage of the command be updated for the new features too ?

@pwalczysko
Copy link
Member

Tested images:

omero download Image:ID download_dir

Download only
lif:
https://nightshade.openmicroscopy.org/webclient/?show=image-4471884

ndpi
https://outreach.openmicroscopy.org/webclient/?show=image-51066

Download and reimport
DICOM
(the below link points to the reimported image, as it is actually performing better than the original which I successfully downloaded)
https://outreach.openmicroscopy.org/webclient/?show=image-59527

vsi:
https://demo.openmicroscopy.org/webclient/?show=image-149042
reimported as https://demo.openmicroscopy.org/webclient/?show=image-182901 - is okay, except for thumbs still not generated after long time

All went fine

@pwalczysko
Copy link
Member

Bug:
the
omero download Image:4409956 bread command fails because that image has no fileset, as it is coming from OMERO.figure (figure saved as a new image).
Is it possible to let this fail more gracefully ?

Traceback (most recent call last):
  File "/Users/pwalczysko/miniconda3/envs/test-will/bin/omero", line 8, in <module>
    sys.exit(main())
  File "/Users/pwalczysko/miniconda3/envs/test-will/lib/python3.6/site-packages/omero/main.py", line 125, in main
    rv = omero.cli.argv()
  File "/Users/pwalczysko/miniconda3/envs/test-will/lib/python3.6/site-packages/omero/cli.py", line 1784, in argv
    cli.invoke(args[1:])
  File "/Users/pwalczysko/miniconda3/envs/test-will/lib/python3.6/site-packages/omero/cli.py", line 1222, in invoke
    stop = self.onecmd(line, previous_args)
  File "/Users/pwalczysko/miniconda3/envs/test-will/lib/python3.6/site-packages/omero/cli.py", line 1299, in onecmd
    self.execute(line, previous_args)
  File "/Users/pwalczysko/miniconda3/envs/test-will/lib/python3.6/site-packages/omero/cli.py", line 1381, in execute
    args.func(args)
  File "/Users/pwalczysko/miniconda3/envs/test-will/lib/python3.6/site-packages/omero/plugins/download.py", line 83, in __call__
    self.download_fileset(conn, image.getFileset(), args.filename)
  File "/Users/pwalczysko/miniconda3/envs/test-will/lib/python3.6/site-packages/omero/plugins/download.py", line 91, in download_fileset
    self.ctx.out(f"Fileset: {fileset.id}")
AttributeError: 'NoneType' object has no attribute 'id

@will-moore
Copy link
Member Author

@pwalczysko That "bug" should be fixed by eeed465 to fail as it did before, which is a bit more graceful.

@pwalczysko
Copy link
Member

@pwalczysko That "bug" should be fixed by eeed465 to fail as it did before, which is a bit more graceful.

thanks, I will rebuild and retry

@pwalczysko
Copy link
Member

Indeed, now the failure is much nicer

omero download Image:4409956 bread
Using session for pwalczysko@nightshade.openmicroscopy.org:4064. Idle timeout: 10 min. Current group: Swedlow Lab
Input image has no associated Fileset

@pwalczysko
Copy link
Member

In summary, tested all 5 commands in the header of this PR on different file formats. All went okay.
Ready to merge fmpov.

@will-moore
Copy link
Member Author

will-moore commented Aug 20, 2021

@will-moore
Copy link
Member Author

Tests passing (I added one to test download of Fileset).
I assume we don't need to wait on user at https://forum.image.sc/t/download-full-projects-with-their-datasets/56434/ before merging this?

@sbesson
Copy link
Member

sbesson commented Aug 25, 2021

Thanks @will-moore, overall I am very much in favor of releasing features added in this PR as they lift a long-standing limitation of the plugin when downloading original files associated with images. Allowing complete fileset download has the classical pitfalls associated large download that we have seen previously. I assume this feature respects the policy restrictions set up server-side?

As suggested by the description of this PR and is more clearly exposed in the accompanying integration tests, this PR effectively changes the semantics for a single file image download. Previously omero download Image:1 foo would create a file called foo with the content of the original file. With this PR, a foo folder will be created and contain the original file(s).

I have been pondering about the breaking nature or not of this API change and I am increasingly inclined to qualify the previous behavior as incorrect and prone to data corruption. This is related to the fact that the command gives full control over the target filename which is for instance at odds with the fact that the original file extension can be a requirement for Bio-Formats and other translational libraries. Taking the FITS format as a representative example , with the current implementation, the following round-tripping will fail with the current version of omero-py plugin but and the current implementation will fix it:

omero import test.fits
omero download Image:1 /tmp/foo
omero import /tmp/foo

Happy to see this released as a minor version increment under these terms.

@joshmoore
Copy link
Member

I assume this feature respects the policy restrictions set up server-side?

It has no choice 😉

@will-moore
Copy link
Member Author

@sbesson There are tests for the download policy which are testing the Image download - see #298 (comment)

Copy link
Member

@sbesson sbesson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Merging and releasing given no objection has been raised to the release proposal above.

@sbesson sbesson merged commit 89fa8b2 into ome:master Sep 1, 2021
@imagesc-bot
Copy link

This pull request has been mentioned on Image.sc Forum. There might be relevant details there:

https://forum.image.sc/t/download-full-projects-with-their-datasets/56434/16

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants