Getting stuck with bunch of batched git-annex processes running #56
Comments
@yarikoptic First, for the record, the "Editing repository" message means that a GitHub repository is being edited/modified, usually to update the sizes in the repository description, which is the last step in backing up an individual Dandiset. Because multiple Dandisets are backed up concurrently, it's not that unusual for the "Editing" messages to be out of order with respect to Dandiset IDs. As to the actual problem, when I look at the log messages for Dandiset 000874, the last message is "Done feeding URLs to addurl", but there is no corresponding "Done reading from addurl". In addition, the logs contain multiple messages about starting the download of various JSON files but no corresponding messages about finishing the downloads. I tried downloading one of the listed URLs manually with:
(where the value of
If this 403 had occurred when backups2datalad was downloading the file, then there should have been a message logged about it, unless
#!/bin/bash
export DATALAD_dandi_token=---REDACTED---
set -ex -o pipefail
git init foo
cd foo
git annex init
mkdir -p .datalad/providers
cat > .datalad/providers/dandi.cfg <<'EOT'
[provider:dandi]
url_re = https?://api\.dandiarchive\.org/api/.*
authentication_type = http_token
credential = dandi
[credential:dandi]
type = token
EOT
git add .datalad/providers/dandi.cfg
git commit -m 'Add dandi provider config'
git annex initremote datalad \
type=external \
externaltype=datalad \
encryption=none \
autoenable=true \
uuid=cf13d535-b47c-5df6-8590-0793cb08a90a
printf '%s %s\n' \
https://api.dandiarchive.org/api/assets/3e98c412-b4be-4e3d-8709-662e721cba30/download/ \
derivatives/OCT-pipeline/sub-SP002/micr/sub-SP002_ses-OCT_sample-01_res-20um_OCT.json \
| git annex addurl --batch --with-files --json --json-error-messages --json-progress --raw-except=datalad
which output the following at the
Based on the traceback I got when I hit Ctrl-C, it seems this last prompt came from DataLad rather than git-annex. This even happened when I redirected stdout to a pipe, which seems like bad behavior from DataLad. In addition, the stalled
Oh, thank you very much for digging this up -- I should have noticed the trailing datalad process! I thought it was my outdated token, but I didn't check explicitly and switched to a new service admin user's token... but surprise -- even though I could curl the endpoint for an embargoed dandiset, I still got this 403 for the assets. Let's see if maybe there is a quick fix... and I should look into preventing interactivity from datalad...
@yarikoptic So should the
yes, please kill, then upgrade datalad in that env to at least 1.0.3, which has the relevant fix, so we should not get stuck again (maybe first check with your script that this is indeed the case).
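Before restarting the backup, the installed version can be compared against 1.0.3. This is only a minimal sketch: `version_ge` is a hypothetical helper, it relies on GNU `sort -V`, and it assumes that `datalad --version` prints the version as the last word of its output.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if version $1 >= version $2.
# Assumption: GNU coreutils `sort -V` (version sort) is available.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

required=1.0.3

# Only run the check where datalad is actually on PATH.
if command -v datalad >/dev/null 2>&1; then
    # Assumption: the version is the last word of the first output line.
    installed=$(datalad --version 2>&1 | awk '{print $NF; exit}')
    if version_ge "$installed" "$required"; then
        echo "datalad $installed is new enough"
    else
        echo "datalad $installed is older than $required; upgrade first" >&2
    fi
fi
```

Version sort is used rather than a plain string comparison so that, for example, 1.0.10 is treated as newer than 1.0.3.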
@yarikoptic All
Thank you! Let's consider this issue addressed!
I think we had a similar case recently, which was "resolved" by killing those few git-annex processes. But looking at the process tree:
It looks to me like the process simply did not explicitly exit/kill those batched git-annex processes. Is that the right assumption, or were they supposed to exit since their pipe file descriptors were closed? (I think we had some issues like that in git-annex before.) WDYT @jwodder?
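For reference on the pipe question, here is a generic illustration (not a diagnosis of this run): a process reading requests from a pipe only sees EOF once every copy of the write end has been closed. The sketch uses plain `cat` as a stand-in for a batched process and a background `sleep` as a stand-in for some other process that inherited the descriptor; the FIFO, fd 3, and all variable names are made up for the example.

```shell
#!/bin/sh
# Sketch: a pipe reader gets EOF only after *all* writers have closed.
tmp=$(mktemp -d)
mkfifo "$tmp/in"

# Stand-in for a batched process (plain cat, not git-annex): reads until EOF.
cat "$tmp/in" > "$tmp/out" &
reader=$!

exec 3> "$tmp/in"       # parent opens the write end on fd 3
echo "one request" >&3

# Stand-in for a stray child that inherited fd 3 and keeps it open.
sleep 30 &
holder=$!

exec 3>&-               # parent closes its own copy of the write end
sleep 1                 # give the reader time to exit, if it is going to

if kill -0 "$reader" 2>/dev/null; then
    state_before=running   # no EOF yet: the holder still has the write end
else
    state_before=exited
fi

kill "$holder"          # last writer goes away, so the reader now gets EOF
wait "$reader"          # returns once the reader has exited
state_after=exited
received=$(cat "$tmp/out")
rm -rf "$tmp"
echo "before: $state_before, after: $state_after, received: $received"
```

If the reader is still running after the parent has closed fd 3, then closing the descriptors in the parent alone is not enough for such a process to exit; something else is still holding the pipe open, or the process is blocked on something other than its input.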
The full log is at
/mnt/backup/dandi/dandisets/.git/dandi/backups2datalad/2024.07.26.20.02.11Z.log
FWIW -- it seems like we did go through all dandisets, although the last one "Editing" was out of order:
and note that some were "edited" multiple times (not sure what "Editing" means here exactly, but if it is any kind of change + commit, that could well happen).
Looking at one of those sample stuck batched processes:
we see that the pipe was opened, but there is never a "Waiting for" message for it to be closed:
like it did for other dandisets:
suggesting that something led our script to get stuck (could be some other git-annex process, of course)... so I looked at all of them -- it seems all are for the same dandiset and none of them has "Waiting for":
dandiset itself is dirty:
@jwodder could you try to figure out what the script could potentially be waiting for?
I have not interrupted any process ATM so we can troubleshoot, but let's prioritize this since it delays the update of dandisets.
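One way to narrow down where a run stalled is to compare how often a "start" message and its matching "finish" message appear in the log. This is a rough sketch, assuming each message appears verbatim on its own log line; `count_unmatched` is a hypothetical helper, the default log file name is a placeholder, and the two message strings are the ones quoted in this thread.

```shell
#!/bin/sh
# Hypothetical helper: print (number of start lines - number of finish lines)
# for two fixed message strings in a log file.
count_unmatched() {
    _log=$1
    _start_msg=$2
    _done_msg=$3
    # grep -c prints 0 but exits non-zero when nothing matches, hence `|| true`.
    _starts=$(grep -cF -- "$_start_msg" "$_log" || true)
    _dones=$(grep -cF -- "$_done_msg" "$_log" || true)
    echo $((_starts - _dones))
}

# Assumption: the log path is given as the first argument; the default is a placeholder.
log=${1:-backups2datalad.log}
if [ -r "$log" ]; then
    n=$(count_unmatched "$log" "Done feeding URLs to addurl" "Done reading from addurl")
    echo "addurl feeds without a matching read: $n"
fi
```

A non-zero result only points at which step never finished; it does not say why, so the stuck step still has to be reproduced by hand.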