--mode verify detects lots of unexpected diffs in metadata without server timestamp boost #49

Open
yarikoptic opened this issue Jun 7, 2024 · 6 comments

@yarikoptic
Member

After

I manually ran the --mode verify sweep and it errored out quite loudly. Here is the tail of the output, which points to the full log:

    +---------------- 15 ----------------
    | Traceback (most recent call last):
    |   File "/home/dandi/miniconda3/envs/dandisets-2/lib/python3.10/site-packages/backups2datalad/asyncer.py", line 257, in process_blob
    |     raise UnexpectedChangeError(
    | backups2datalad.util.UnexpectedChangeError: Dandiset 000966: Metadata for asset sub-M230804-1/sub-M230804-1_ses-20231229T155815_ecephys.nwb was changed/added but draft timestamp was not updated on server:
    |
    | Metadata diff:
    |
    | --- old-metadata
    | +++ new-metadata
    | @@ -13,7 +13,7 @@
    |    contentSize: 247230064
    |    contentUrl:
    |    - https://api.dandiarchive.org/api/assets/acf6172c-d85f-4a22-ae19-6ba011a53e31/download/
    | -  - https://dandiarchive-embargo.s3.amazonaws.com/000966/blobs/b53/94e/b5394ed4-e80f-4fdf-bbc8-5d82717cf42a
    | +  - https://dandiarchive.s3.amazonaws.com/blobs/b53/94e/b5394ed4-e80f-4fdf-bbc8-5d82717cf42a
    |    dateModified: '2024-04-21T18:07:38.991543-04:00'
    |    digest:
    |      dandi:dandi-etag: 8fa0a66dc8ae41e2f124bf036cfc6594-4
    | @@ -77,6 +77,6 @@
    |        schemaKey: Software
    |        url: https://github.com/dandi/dandi-cli
    |        version: 0.61.2
    | -modified: '2024-04-21T22:07:46.774560Z'
    | +modified: '2024-04-29T19:35:16.321230Z'
    |  path: sub-M230804-1/sub-M230804-1_ses-20231229T155815_ecephys.nwb
    |  size: 247230064
    |
    |
    +---------------- ... ----------------
    | and 3 more exceptions
    +------------------------------------
2024-06-06T20:28:53-0400 [ERROR   ] backups2datalad: An error occurred:
Traceback (most recent call last):
  File "/home/dandi/miniconda3/envs/dandisets-2/lib/python3.10/site-packages/backups2datalad/__main__.py", line 119, in wrapped
    await f(datasetter, *args, **kwargs)
  File "/home/dandi/miniconda3/envs/dandisets-2/lib/python3.10/site-packages/backups2datalad/__main__.py", line 228, in update_from_backup
    await datasetter.update_from_backup(dandisets, exclude=exclude)
  File "/home/dandi/miniconda3/envs/dandisets-2/lib/python3.10/site-packages/backups2datalad/datasetter.py", line 94, in update_from_backup
    raise RuntimeError(
RuntimeError: Backups for 162 Dandisets failed
Logs saved to /mnt/backup/dandi/dandisets/.git/dandi/backups2datalad/2024.06.07.00.22.30Z.log
action summary:
  publish (notneeded: 2)

From this, it looks like unembargoing may be forgetting to update the modified timestamp?
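
The contentUrl change in the diff above (embargo bucket to public bucket) is what points at unembargoing. A minimal Python sketch of that check follows. The function names are hypothetical and not part of backups2datalad; the two hostnames are taken from the diff in the log.

```python
from urllib.parse import urlparse

# Hostnames as they appear in the metadata diff in the log.
EMBARGO_HOST = "dandiarchive-embargo.s3.amazonaws.com"
PUBLIC_HOST = "dandiarchive.s3.amazonaws.com"


def s3_hosts(content_urls):
    """Return the set of S3 hostnames among an asset's contentUrl entries."""
    hosts = {urlparse(url).netloc for url in content_urls}
    return {host for host in hosts if host.endswith(".s3.amazonaws.com")}


def looks_unembargoed(old_urls, new_urls):
    """True if the blob moved from the embargo bucket to the public bucket."""
    old_hosts = s3_hosts(old_urls)
    new_hosts = s3_hosts(new_urls)
    return (
        EMBARGO_HOST in old_hosts
        and PUBLIC_HOST in new_hosts
        and EMBARGO_HOST not in new_hosts
    )
```

Applied to the old and new contentUrl lists of the asset above, looks_unembargoed would return True, which matches the hypothesis but does not by itself explain failures in Dandisets that were never unembargoed.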

@jwodder
Member

jwodder commented Jun 7, 2024

@yarikoptic The error message seems pretty clear to me:

backups2datalad.util.UnexpectedChangeError: Dandiset 000966: Metadata for asset sub-M230804-1/sub-M230804-1_ses-20231229T155815_ecephys.nwb was changed/added but draft timestamp was not updated on server

This is the Archive's fault for not updating the Dandiset's draft version's modified timestamp upon unembargoing. Running the backup command with --mode force should get rid of the error.
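
The check behind this error can be pictured roughly as follows. This is a simplified Python sketch, not the actual backups2datalad code: the function check_asset, its parameters, and the local UnexpectedChangeError class are hypothetical stand-ins, and it assumes that verify mode compares the draft version's modified timestamp against the time of the last backup.

```python
from datetime import datetime, timezone


class UnexpectedChangeError(Exception):
    """Hypothetical stand-in for backups2datalad.util.UnexpectedChangeError."""


def check_asset(path, old_meta, new_meta, draft_modified, last_backup):
    """Raise if asset metadata changed while the draft timestamp did not advance.

    Assumption: a legitimate change should move the draft version's
    modified timestamp past the time of the last backup.
    """
    if old_meta == new_meta:
        return  # nothing changed, nothing to verify
    if draft_modified <= last_backup:
        raise UnexpectedChangeError(
            f"Metadata for asset {path} was changed/added "
            "but draft timestamp was not updated on server"
        )


# Hypothetical timestamps, for illustration only.
last_backup = datetime(2024, 4, 25, tzinfo=timezone.utc)
stale_draft = datetime(2024, 4, 22, tzinfo=timezone.utc)

try:
    check_asset(
        "sub-M230804-1/sub-M230804-1_ses-20231229T155815_ecephys.nwb",
        {"modified": "2024-04-21T22:07:46.774560Z"},
        {"modified": "2024-04-29T19:35:16.321230Z"},
        draft_modified=stale_draft,
        last_backup=last_backup,
    )
except UnexpectedChangeError as exc:
    print(f"verify failed: {exc}")
```

Under this reading, the asset-level modified value did change, but the Dandiset-level draft timestamp stayed behind, so the change is reported as unexpected.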

@yarikoptic
Member Author

But the final error is RuntimeError: Backups for 162 Dandisets failed. Are there really already that many Dandisets that have been unembargoed? (Very unlikely.)

@jwodder
Member

jwodder commented Jul 26, 2024

@yarikoptic Based on the script below, there are only 6 Dandisets that have been unembargoed (000253, 000408, 000773, 000774, 000897, and 000935).

Is the problem described in the original comment still an issue?

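#!/bin/bash
# List Dandisets whose backup is currently OPEN but whose .datalad/config
# history mentions EMBARGOED, i.e. Dandisets that have been unembargoed.
set -eu -o pipefail

dandiset_root=/mnt/backup/dandi/dandisets

cd "$dandiset_root"
for ds in 0*
do
    # Current embargo status; a missing key is treated as OPEN.
    embargo_status="$(git -C "$ds" config --file .datalad/config --default OPEN --get dandi.dandiset.embargo-status)"
    # "git log -S" finds commits that added or removed the string EMBARGOED.
    if [ "$embargo_status" = OPEN ] \
        && git -C "$ds" log -S EMBARGOED -n1 -- .datalad/config | grep -q .
    then echo "$ds"
    fi
done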

@jwodder
Member

jwodder commented Aug 6, 2024

@yarikoptic Ping.

@yarikoptic
Member Author

Blocked by #56 at the moment. Please just rerun that command with --mode verify whenever no backup process is running.

@jwodder
Member

jwodder commented Aug 12, 2024

@yarikoptic This problem is still occurring, but seeing as it's affecting Dandisets that are still embargoed, the problem seems to be solely with Dandi Archive. I have filed dandi/dandi-archive#2002.
