Multiple features on file search and download #243

jjkoehorst · 2024-08-15T14:05:34Z

The application works great and we are trying to incorporate it into Jupyter notebooks. However, there are a few small things that we would like to see added, if possible.

1. Search: when searching for a file, we have to use a pattern such as %collection%/%.txt so that iBridges knows we are searching for a file, right? Is it possible to have separate collection and data arguments instead of only a path?
2. Skip if exists: currently an error is thrown if a file already exists. Could the file be skipped instead, possibly with a file size check? Ignoring errors is a bit overkill, as this is not really an error.
3. Missing target folder: if the folder you want to download the file to does not exist, can iBridges create it (for example with a parameter mkdir=True)?


jjkoehorst · 2024-08-15T14:10:35Z

This is currently my procedure to retrieve the data:

import os

from ibridges import download

# `session` is an authenticated iBridges session, `data` holds the search results
for i, d in enumerate(data):
    irods_path = d['COLL_NAME'] + "/" + d['DATA_NAME']
    local_path = "./" + irods_path
    # Create the folder structure
    os.makedirs(os.path.dirname(local_path), exist_ok=True)
    print(f"Downloading: {i} of {len(data)} {local_path}", end="\r")
    # Only download files that are not present locally yet
    if not os.path.exists(local_path):
        download(session, irods_path=irods_path, local_path=local_path)

qubixes · 2024-08-15T16:35:53Z

Hi @jjkoehorst

Thanks for the feedback!

1. We are currently completely reworking the search in "Make the iBridges search more easy to use" (#239). The PR is basically finished, so it would be great if you could have a look and see whether it provides what you want.
2. You can use overwrite=True. If the checksums are the same, the data won't actually be transferred, but the checksums do need to be computed (locally). Is that solution good enough for you?
3. Hmmm, I/we would need to think about that. I can understand that it might be a nice convenience. On the other hand, it would introduce one more argument to the download/upload/sync functions while saving only a single line of code. Perhaps you can wrap this functionality in your own function?
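
A wrapper of the kind suggested in point 3 could look roughly like the sketch below. It is not part of iBridges: `download_missing` and its `fetch` argument are hypothetical names, and `fetch` stands in for whatever performs the transfer, for example `lambda ip, lp: download(session, irods_path=ip, local_path=lp)`.

```python
from pathlib import Path
from typing import Callable, Iterable


def download_missing(
    fetch: Callable[[str, Path], None],
    irods_paths: Iterable[str],
    local_root: Path,
) -> list[Path]:
    """Fetch every path that is not yet present below local_root.

    Hypothetical helper, not an iBridges function. The caller supplies
    fetch(irods_path, local_path), which does the actual transfer.
    """
    fetched = []
    for irods_path in irods_paths:
        # Mirror the iRODS path below local_root.
        local_path = local_root / irods_path.lstrip("/")
        if local_path.exists():
            continue  # skip quietly instead of raising an error
        # Create the missing parent folders before transferring.
        local_path.parent.mkdir(parents=True, exist_ok=True)
        fetch(irods_path, local_path)
        fetched.append(local_path)
    return fetched
```

Passing the transfer in as a callable keeps the skip and mkdir logic independent of the session handling, so the same helper can be tried out without an iRODS connection.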

jjkoehorst · 2024-08-16T04:51:39Z

I was curious why it keeps overwriting the file, so I added a ton of prints to the download function and noticed:

Conflicting checksums for unlock/home/wur.fdp/stu_hiv-composition-gut/obs_o_samd00665758/sam_samd00665758/metagenomic_amplicon_illumina/asy_drr519722/NGTAX_Silva138.1-picrust_100_DRR519722/provenance.ttl.68bb1fbf513a3703bfcdc03e692a79fc sha2:DS9/8OmoTZoqYgNvinWluzwgio06KRB/xmycWV1zqVk=

So I think you calculate the sha2 checksum while iRODS calculates the md5?

I don't "own" / manage the iRODS instance, so I cannot easily change its checksum settings...
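
For reference, the two values in the message above differ in format as well as in algorithm: the 32-character hex string is an MD5 digest, while iRODS writes SHA-256 checksums as `sha2:` followed by the base64-encoded digest. A minimal sketch of a comparison that picks the algorithm from the remote value is shown below; `local_checksum` and `matches_remote` are hypothetical helpers, not the iBridges implementation.

```python
import base64
import hashlib
from pathlib import Path


def local_checksum(path: Path, algorithm: str) -> str:
    """Return the checksum of a local file in the iRODS text format."""
    hasher = hashlib.md5() if algorithm == "md5" else hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            hasher.update(chunk)
    if algorithm == "md5":
        return hasher.hexdigest()
    return "sha2:" + base64.b64encode(hasher.digest()).decode("ascii")


def matches_remote(path: Path, remote_checksum: str) -> bool:
    """Compare using the algorithm implied by the remote checksum."""
    algorithm = "sha2" if remote_checksum.startswith("sha2:") else "md5"
    return local_checksum(path, algorithm) == remote_checksum
```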

chStaiger · 2024-08-16T12:17:59Z

MD5 will be taken into account: #248

For more verbosity on uploads, downloads and synchronisations you can use the dry-run option.
Here is an example for uploads:

from pathlib import Path

from ibridges import upload
from ibridges.interactive import interactive_auth
from ibridges.path import IrodsPath

# Authenticate interactively to obtain a session
session = interactive_auth()

local_path = Path.home().joinpath("Downloads", "my_books")
irods_path = IrodsPath(session, "demo")

# With dry_run=True nothing is transferred; the planned operations are returned
ops = upload(session, local_path, irods_path, dry_run=True, overwrite=True)
ops.print_summary()

You can then either execute all operations with ops.execute(session) or execute the individual parts separately, see:
https://ibridges.readthedocs.io/en/stable/api/full_reference.html#module-ibridges.executor

chStaiger · 2024-08-16T14:36:00Z

Ad 3: I am a bit hesitant to introduce that into our data transfers. It is convenient if you just want to create the direct parent folder. However, to make it generic we would have to check whether someone wants to add a full collection/folder subtree, and that should not happen automatically but should be done consciously by the programmer.

jjkoehorst · 2024-08-19T14:10:46Z

True @chStaiger, that makes sense, and it's just one extra line of code.

Another topic regarding downloads: we often need to download a few thousand files. In a Jupyter notebook you often run all the steps from the start, and then the checksums need to be computed again, right? Is it possible to do a size check only?

qubixes · 2024-08-21T12:36:07Z

@jjkoehorst There are no size checks available. What you can do with the new PR #254 is skip the file/data object if it exists. Perhaps for now that suffices for you?
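
In the meantime, a size-only comparison can be done on the caller's side before calling download. The sketch below assumes the remote size in bytes is already known, for example from a `DATA_SIZE` column in the same query that returned `COLL_NAME` and `DATA_NAME`; that column and the helper name `needs_download` are assumptions, not something confirmed in this thread. Note that a matching size does not prove the contents are identical.

```python
from pathlib import Path


def needs_download(local_path: Path, remote_size: int) -> bool:
    """Return True if the local file is missing or its size differs.

    Hypothetical helper: remote_size must be supplied by the caller.
    """
    if not local_path.is_file():
        return True
    return local_path.stat().st_size != remote_size
```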

bartns · 2024-08-22T15:26:26Z

That could lead to thousands of warnings, which might also not be very convenient... Could you add an "ignore_warning" or "silent" mode?

chStaiger · 2024-08-22T15:45:39Z

In the next release we just opted for the warnings:

overwrite=False, ignore_err=False -> FileExistsError
overwrite=True,  ignore_err=False -> checksum -> copy
overwrite=False, ignore_err=True  -> skip and warn
overwrite=True,  ignore_err=True  -> checksum -> copy

Mainly because our current main user group consists of less experienced users; if we did not inform them in any way, they might get the wrong impression of the state of the data.
But we have it on our radar and will see what we can do in later releases.
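
The four combinations can be restated as a small decision function. This is only a model of the table above for illustration, with hypothetical names (`Action`, `decide`); it is not the iBridges source, and the case where the target does not exist yet is an assumption added for completeness.

```python
import warnings
from enum import Enum


class Action(Enum):
    COPY = "copy"
    CHECKSUM_THEN_COPY = "compare checksums, copy if they differ"
    SKIP = "skip"


def decide(target_exists: bool, overwrite: bool, ignore_err: bool) -> Action:
    """Model of the overwrite/ignore_err table; not the library's code."""
    if not target_exists:
        # Assumption: nothing to overwrite, so the transfer simply happens.
        return Action.COPY
    if overwrite:
        # ignore_err makes no difference once overwrite is set.
        return Action.CHECKSUM_THEN_COPY
    if ignore_err:
        warnings.warn("Target exists, skipping.")
        return Action.SKIP
    raise FileExistsError("Target exists and overwrite is False.")
```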

bartns · 2024-08-22T15:49:26Z

I understand and agree :) My suggestion was not to change it but to add something to be able to silence it.

chStaiger · 2024-08-22T17:05:25Z

I will release the next version without that feature, and we will discuss internally whether we want a quick fix in a bug-fix version or to work directly on a more thorough solution: #255

chStaiger · 2024-08-22T17:07:47Z

With that, can we close this issue, or did I overlook something that needs to go into another ticket so that it gets picked up in development?

qubixes · 2024-08-22T17:12:36Z

bartns wrote: "I understand and agree :) My suggestion was not to change it but to add something to be able to silence it."

You can suppress it using the warnings library, see for example:

https://stackoverflow.com/questions/14463277/how-to-disable-python-warnings
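
A minimal sketch with the standard `warnings` module is shown below. `noisy_transfer` is a hypothetical stand-in for a call that warns about skipped files; it is assumed here that the skip messages are emitted through `warnings.warn`, and the category they use is not stated in this thread.

```python
import warnings


def noisy_transfer() -> str:
    """Hypothetical stand-in for a transfer that warns about a skipped file."""
    warnings.warn("Skipping existing file.")
    return "done"


# Suppress warnings only inside this block; the previous filters
# are restored automatically when the block exits.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    result = noisy_transfer()
```

Scoping the filter with `catch_warnings()` keeps other warnings in the notebook visible outside the transfer loop.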

chStaiger · 2024-08-23T06:21:50Z

@qubixes good point! I will remove the issue.

qubixes · 2024-08-23T08:24:05Z

I think the feature requests have been addressed, so I'm closing this issue. If there is a new feature request that is related to the ones in this issue, just open a new issue (one per feature ideally).

jjkoehorst mentioned this issue Aug 15, 2024

Make the iBridges search more easy to use #239

Merged

4 tasks

chStaiger added the documentation and discussion labels Aug 16, 2024

qubixes closed this as completed Aug 23, 2024
