Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multiple features on file search and download #243

Closed
jjkoehorst opened this issue Aug 15, 2024 · 15 comments
Closed

Multiple features on file search and download #243

jjkoehorst opened this issue Aug 15, 2024 · 15 comments
Labels
discussion documentation Improvements or additions to documentation

Comments

@jjkoehorst
Copy link

The application works great and we try to incorporate it in Jupyter notebooks however there are some small things that we would like to see added if possible.

  1. The search function when searching for a file we have to add a %collection%/%.txt / so ibridges knows I am searching for a file now right? Is it possible to have a collection / data variable instead of only path?

  2. Skip if exists, currently it throws an error if a file already exists ... (we know please ignore and skip? or a file size check?) ignore error is a bit overkill as this is not really an error?

  3. The folder does not exists where you want to download the file to... can ibridges please create this? (param mkdir=true?)

@jjkoehorst
Copy link
Author

jjkoehorst commented Aug 15, 2024

This is currently my procedure to retrieve the data

for i, d in enumerate(data):
    irods_path = d['COLL_NAME'] + "/" + d['DATA_NAME']
    local_path = "./" + irods_path
    # Create the folder structure
    os.makedirs(os.path.dirname(local_path), exist_ok=True)
    print("Downloading: " + str(i) + " of " + str(len(data)) + " " + local_path, end="\r")
    if not os.path.exists(local_path):
        download(session, irods_path=irods_path, local_path=local_path)

@qubixes
Copy link
Collaborator

qubixes commented Aug 15, 2024

Hi @jjkoehorst

Thanks for the feedback!

  1. We are currently in the process of completely reworking the search Make the iBridges search more easy to use #239. The PR is basically finished, so if you want to have a look and see if that provides what you want that would be great.
  2. You can use overwrite==True. If the checksums are the same, the data won't actually be transferred, but checksums do need to be computed (locally). Is that solution good enough for you?
  3. Hmmm, I/we would need to think about that. I can understand it might be a nice convenience. On the other hand it would introduce one more argument to the download/upload/sync function, where it only saves a single line of code. You can always wrap this functionality in your own function perhaps?

@jjkoehorst
Copy link
Author

jjkoehorst commented Aug 16, 2024

  1. I was curious why it keeps overwriting the file so by adding a ton of prints to the download function I noticed

Conflicting checksums for unlock/home/wur.fdp/stu_hiv-composition-gut/obs_o_samd00665758/sam_samd00665758/metagenomic_amplicon_illumina/asy_drr519722/NGTAX_Silva138.1-picrust_100_DRR519722/provenance.ttl.68bb1fbf513a3703bfcdc03e692a79fc sha2:DS9/8OmoTZoqYgNvinWluzwgio06KRB/xmycWV1zqVk=

That I think you calculate the sha2? and irods is calculating the md5?

I don't "own" / manage the irods instance so I cannot easily change the checksum settings on irods...

@chStaiger
Copy link
Member

MD5 will be taken into account: #248

For more verbosity on uploads, downloads and synchronisations you can use the dry-run option:
Here an example for uploads

from ibridges.path import IrodsPath
from pathlib import Path
from ibridges.interactive import interactive_auth
from ibridges import upload

local_path = Path.home().joinpath("Downloads", "my_books")
irods_path = IrodsPath(session, "demo")

ops = upload(session, local_path, irods_path, dry_run=True, overwrite=True)
ops.print_summary()

You can then either execute all operations with ops.execute(session) or you can execute the single parts.
https://ibridges.readthedocs.io/en/stable/api/full_reference.html#module-ibridges.executor

@chStaiger chStaiger added documentation Improvements or additions to documentation discussion labels Aug 16, 2024
@chStaiger
Copy link
Member

ad 3. I am a bit hesitant to introduce that to our data transfers. It is convenient if you just want to create the direct folder. However, to make it generic we would have to check if someone wants to add a full collection/folder subtree and that should not happen automatically but done consciously by the programmer.

@jjkoehorst
Copy link
Author

True, @chStaiger makes sense and its just one line of code extra.

Another topic regarding downloads... We often need to download a few thousand files. In a Jupyter notebook you often start all the steps and then it needs to do checksum compute again right? Is it possible to do a size check only?

@qubixes
Copy link
Collaborator

qubixes commented Aug 21, 2024

@jjkoehorst There are no size checks available. What you can do with the new PR #254 is skip the file/data object if it exists. Perhaps for now that suffices for you?

@bartns
Copy link

bartns commented Aug 22, 2024

That could lead to thousands of warnings which migth also not be very convenient... Could add a "ignore_warning" or "silent" mode ?

@chStaiger
Copy link
Member

In the next release we just opted for the warnings:

overwrite==False, ignore_err=False -> FileExistError
overwrite==True, ignore_err=False -> checksum -> copy
overwrite==False, ignore_err=True -> Skip and warn
overwrite==True, ignore_err=True -> checksum -> copy

Mainly because our current main user group are less experienced users and not informing them in any way they might get the wrong impression of the state of the data.
But we have it on our radar and will see what we can do in later releases.

@bartns
Copy link

bartns commented Aug 22, 2024

I understand and agree :) My suggestion was not to change it but to add something to be able to silence it.

@chStaiger
Copy link
Member

I will release the next version without that feature and we will discuss internally whether we want to do a quickfix for that in a bug-fix version or directly work on a more thorough solution #255

@chStaiger
Copy link
Member

WIth that, can we close this issue, or did I overlook something that needs to go into another ticket to pick it up in the development?

@qubixes
Copy link
Collaborator

qubixes commented Aug 22, 2024

I understand and agree :) My suggestion was not to change it but to add something to be able to silence it.

You can suppress it using the warnings library, see for example:

https://stackoverflow.com/questions/14463277/how-to-disable-python-warnings

@chStaiger
Copy link
Member

@qubixes good point! I will remove the issue.

@qubixes
Copy link
Collaborator

qubixes commented Aug 23, 2024

I think the feature requests have been addressed, so I'm closing this issue. If there is a new feature request that is related to the ones in this issue, just open a new issue (one per feature ideally).

@qubixes qubixes closed this as completed Aug 23, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
discussion documentation Improvements or additions to documentation
Projects
None yet
Development

No branches or pull requests

4 participants