asset metadata not appearing on DLP #1734

Closed
bendichter opened this issue Nov 3, 2023 · 22 comments · Fixed by #1803

@bendichter
Member

https://dandiarchive.org/dandiset/000691/draft

Does anyone know why asset metadata might not be appearing here?

@bendichter
Member Author

cc @weiglszonja

@satra
Member

satra commented Nov 3, 2023

i suspect the assets summary generator is failing, but the archive is ignoring that failure.

on the archive side, two things should happen:

  • publish should be disabled as a result of the failure
  • a validation report should contain the failure

could one of you try generating the asset summary directly in python using the dandischema function (i believe it's in the metadata module)? it takes a list of asset metadata as input.
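A minimal sketch of the two archive-side behaviors listed above, assuming a simplified in-memory model. `DraftVersion`, `refresh_assets_summary` and `AGG_ERROR` are hypothetical names for illustration, not dandi-archive code. The aggregator is passed in, so any function (for example dandischema's `aggregate_assets_summary`) could be plugged in. The point is that an aggregation failure is recorded in the validation report rather than swallowed, and a recorded failure blocks publishing.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Iterable, List, Optional

# Hypothetical prefix used to recognise aggregation failures in the report.
AGG_ERROR = "assetsSummary aggregation failed"


@dataclass
class DraftVersion:
    """Hypothetical stand-in for a draft Dandiset version."""

    assets_summary: Optional[Dict[str, Any]] = None
    validation_errors: List[str] = field(default_factory=list)

    @property
    def publishable(self) -> bool:
        # Publishing is allowed only while the validation report is empty.
        return not self.validation_errors


def refresh_assets_summary(
    version: DraftVersion,
    asset_metadata: Iterable[Dict[str, Any]],
    aggregate: Callable[[List[Dict[str, Any]]], Dict[str, Any]],
) -> None:
    """Recompute the summary, reporting a failure instead of ignoring it."""
    try:
        summary = aggregate(list(asset_metadata))
    except Exception as exc:  # any failure goes into the validation report
        version.validation_errors.append(f"{AGG_ERROR}: {exc}")
        return
    version.assets_summary = summary
    # A successful run clears earlier aggregation failures (other errors stay).
    version.validation_errors = [
        e for e in version.validation_errors if not e.startswith(AGG_ERROR)
    ]
```

Keeping the failure in the same report that gates publishing means a broken summary cannot silently reach a published version.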

@bendichter
Member Author

@weiglszonja, could you give this a shot?

@weiglszonja

weiglszonja commented Nov 7, 2023

@satra can you tell me how to generate the asset metadata? I'm not sure I found what you were referring to.

from dandi.metadata import nwb2asset

asset_md = nwb2asset("000691/sub-4mm-mouse-1/sub-4mm-mouse-1_image+ophys.nwb")

This returns:

{
    "id": None,
    "schemaKey": "Asset",
    "schemaVersion": "0.6.4",
    "name": None,
    "description": None,
    "contributor": None,
    "about": None,
    "studyTarget": None,
    "license": None,
    "protocol": None,
    "ethicsApproval": None,
    "keywords": None,
    "acknowledgement": None,
    "access": [
        {
            "id": None,
            "schemaKey": "AccessRequirements",
            "status": "dandi:OpenAccess",
            "contactPoint": None,
            "description": None,
            "embargoedUntil": None
        }
    ],
    "url": None,
    "repository": None,
    "relatedResource": None,
    "wasGeneratedBy": [
        {
            "id": None,
            "schemaKey": "Session",
            "identifier": "2021-07-26T13-50-50",
            "name": "2021-07-26T13-50-50",
            "description": "This session includes calcium imaging recorded from a head-mounted microscope in a freely moving mouse while simultaneously recording more than a thousand neurons in cortex.",
            "startDate": "2021-07-26T13-50-50",
            "endDate": None,
            "wasAssociatedWith": None,
            "used": None
        },
        {
            "id": "urn:uuid:21285aea-6c45-4d5e-a05b-5c8f9a541045",
            "schemaKey": "Activity",
            "identifier": None,
            "name": "Metadata generation",
            "description": "Metadata generated by DANDI cli",
            "startDate": "2023-11-07T21:46:06.850514+0100",
            "endDate": "2023-11-07T21:46:36.501100+0100",
            "wasAssociatedWith": [
                {
                    "id": None,
                    "schemaKey": "Software",
                    "identifier": "RRID:SCR_019009",
                    "name": "DANDI Command Line Interface",
                    "version": "0.55.1",
                    "url": "https://github.com/dandi/dandi-cli"
                }
            ],
            "used": None
        }
    ],
    "contentSize": 15232760735,
    "encodingFormat": "application/x-nwb",
    "digest": {},
    "path": "/Volumes/t7-ssd/fee-lab-to-nwb/ophys/final/000691/sub-4mm-mouse-1/sub-4mm-mouse-1_image+ophys.nwb",
    "dateModified": "2023-11-07T21:46:36.501290+0100",
    "blobDateModified": "2023-11-02T17:54:28+0100",
    "dataType": None,
    "sameAs": None,
    "approach": [
        {
            "id": None,
            "schemaKey": "ApproachType",
            "identifier": None,
            "name": "microscopy approach; cell population imaging"
        }
    ],
    "measurementTechnique": [
        {
            "id": None,
            "schemaKey": "MeasurementTechniqueType",
            "identifier": None,
            "name": "surgical technique"
        },
        {
            "id": None,
            "schemaKey": "MeasurementTechniqueType",
            "identifier": None,
            "name": "two-photon microscopy technique"
        },
        {
            "id": None,
            "schemaKey": "MeasurementTechniqueType",
            "identifier": None,
            "name": "analytical technique"
        }
    ],
    "variableMeasured": [
        {
            "id": None,
            "schemaKey": "PropertyValue",
            "maxValue": None,
            "minValue": None,
            "unitText": None,
            "value": "TwoPhotonSeries",
            "valueReference": None,
            "propertyID": None
        },
        {
            "id": None,
            "schemaKey": "PropertyValue",
            "maxValue": None,
            "minValue": None,
            "unitText": None,
            "value": "ImagingPlane",
            "valueReference": None,
            "propertyID": None
        },
        {
            "id": None,
            "schemaKey": "PropertyValue",
            "maxValue": None,
            "minValue": None,
            "unitText": None,
            "value": "ProcessingModule",
            "valueReference": None,
            "propertyID": None
        },
        {
            "id": None,
            "schemaKey": "PropertyValue",
            "maxValue": None,
            "minValue": None,
            "unitText": None,
            "value": "PlaneSegmentation",
            "valueReference": None,
            "propertyID": None
        },
        {
            "id": None,
            "schemaKey": "PropertyValue",
            "maxValue": None,
            "minValue": None,
            "unitText": None,
            "value": "OpticalChannel",
            "valueReference": None,
            "propertyID": None
        }
    ],
    "wasDerivedFrom": None,
    "wasAttributedTo": [
        {
            "id": None,
            "schemaKey": "Participant",
            "identifier": "4mm-mouse-1",
            "altName": None,
            "strain": None,
            "cellLine": None,
            "vendor": None,
            "age": {
                "id": None,
                "schemaKey": "PropertyValue",
                "maxValue": None,
                "minValue": None,
                "unitText": "ISO-8601 duration",
                "value": "P3M",
                "valueReference": {
                    "id": None,
                    "schemaKey": "PropertyValue",
                    "maxValue": None,
                    "minValue": None,
                    "unitText": None,
                    "value": "dandi:BirthReference",
                    "valueReference": None,
                    "propertyID": None
                },
                "propertyID": None
            },
            "sex": {
                "id": None,
                "schemaKey": "SexType",
                "identifier": "http://purl.obolibrary.org/obo/PATO_0000384",
                "name": "Male"
            },
            "genotype": "C57/B6",
            "species": {
                "id": None,
                "schemaKey": "SpeciesType",
                "identifier": "http://purl.obolibrary.org/obo/NCBITaxon_10090",
                "name": "Mus musculus - House mouse"
            },
            "disorder": None,
            "relatedParticipant": None,
            "sameAs": None
        }
    ]
}
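For readers wondering how records like the one above could turn into a dandiset-level summary, here is a toy roll-up over a few of its fields (`contentSize`, `approach`, `measurementTechnique`, `variableMeasured`, `wasAttributedTo[].species`). This is an illustrative assumption, not dandischema's actual aggregation logic, and `toy_summary` is a hypothetical name.

```python
from typing import Any, Dict, Iterable, List, Optional


def _names(entries: Optional[List[Dict[str, Any]]]) -> List[str]:
    """Collect the non-empty `name` of each entry; a missing list counts as empty."""
    return [e["name"] for e in (entries or []) if e.get("name")]


def toy_summary(records: Iterable[Dict[str, Any]]) -> Dict[str, Any]:
    """Roll a few per-asset fields up into one dandiset-level dict (toy version)."""
    approaches, techniques, measured, species = set(), set(), set(), set()
    n_files = n_bytes = 0
    for md in records:
        n_files += 1
        n_bytes += md.get("contentSize") or 0
        approaches.update(_names(md.get("approach")))
        techniques.update(_names(md.get("measurementTechnique")))
        measured.update(
            v["value"] for v in (md.get("variableMeasured") or []) if v.get("value")
        )
        for participant in md.get("wasAttributedTo") or []:
            sp = participant.get("species") or {}
            if sp.get("name"):
                species.add(sp["name"])
    return {
        "numberOfFiles": n_files,
        "numberOfBytes": n_bytes,
        "approach": sorted(approaches),
        "measurementTechnique": sorted(techniques),
        "variableMeasured": sorted(measured),
        "species": sorted(species),
    }
```

Records that lack these fields (such as a bare video asset) simply contribute to the file and byte counts.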

@satra
Member

satra commented Nov 7, 2023

@weiglszonja - here you go:

from dandi.dandiapi import DandiAPIClient
from dandischema.metadata import aggregate_assets_summary

api = DandiAPIClient()
ds = api.get_dandiset("000691")
aggregate_assets_summary(ds.get_assets())

it seems that the avi file metadata is missing schemaVersion

@weiglszonja

Thank you @satra, I see the error now. Do you have any suggestions on how to fix it, and do you know why this happened?

@satra
Member

satra commented Nov 8, 2023

@AlmightyYakob - do you know how a metadata field without a schemaVersion gets stored without raising an invalid asset message?

@satra
Member

satra commented Nov 8, 2023

@weiglszonja - was the avi file uploaded by skipping around validation? and if not, @yarikoptic, is there a reason why you think the asset did not get schemaVersion? that field is baked into the models.

@AlmightyYakob - can we check on the database side how many assets in which dandisets are missing schemaVersion and when that asset was uploaded?

@weiglszonja

@satra I followed these steps in the terminal. I used --update-external-file-paths; maybe that is the culprit here?

cd 000691/
dandi organize -f move --update-external-file-paths
dandi validate .
dandi upload

@satra
Member

satra commented Nov 8, 2023

thanks @weiglszonja - that by itself should not result in a lack of schemaVersion since the schema is posted during the upload step and should (in theory) be the same for any asset uploaded.

@yarikoptic
Member

I do not see any .avi among the assets, and all of the .nwb assets I see have schemaVersion:
❯ curl --silent -X 'GET' 'https://api.dandiarchive.org/api/dandisets/000631/versions/draft/assets/?metadata=true' -H 'accept: application/json' | jq '.results | .[] | { path: .path, schemaVersion: .metadata.schemaVersion}'
{
  "path": "sub-600ns-4kV-0,8MHz-BP-8-24-21-BPAE-20/sub-600ns-4kV-0,8MHz-BP-8-24-21-BPAE-20_ses-600ns-4kV-0-8MHz paired pulse trains_image.nwb",
  "schemaVersion": "0.6.4"
}
{
  "path": "sub-600ns-4kV-0,8MHz-BP-8-24-21-BPAE-13/sub-600ns-4kV-0,8MHz-BP-8-24-21-BPAE-13_ses-BPAE-5xBipolar-0,83MHz_image.nwb",
  "schemaVersion": "0.6.4"
}
{
  "path": "sub-600ns-4kV-0,8MHz-BP-8-24-21-BPAE-15/sub-600ns-4kV-0,8MHz-BP-8-24-21-BPAE-15_ses-600ns-4kV-0-8MHz paired pulse trains_image.nwb",
  "schemaVersion": "0.6.4"
}
{
  "path": "sub-600ns-4kV-0,8MHz-UP-10-5-21-BPAE-06/sub-600ns-4kV-0,8MHz-UP-10-5-21-BPAE-06_ses-600ns-4kV-0-8MHz single pulse trains_image.nwb",
  "schemaVersion": "0.6.4"
}
{
  "path": "sub-600ns-4kV-0,8MHz-UP-10-5-21-BPAE-05/sub-600ns-4kV-0,8MHz-UP-10-5-21-BPAE-05_ses-600ns-4kV-0-8MHz single pulse trains_image.nwb",
  "schemaVersion": "0.6.4"
}
{
  "path": "sub-600ns-5kV-1HzBP-8-9-21-BPAE-13/sub-600ns-5kV-1HzBP-8-9-21-BPAE-13_ses-600ns-5kV-1Hz paired pulse trains_image.nwb",
  "schemaVersion": "0.6.4"
}
{
  "path": "sub-600ns-5kV-1HzBP-8-9-21-BPAE-15/sub-600ns-5kV-1HzBP-8-9-21-BPAE-15_ses-600ns-5kV-1Hz paired pulse trains_image.nwb",
  "schemaVersion": "0.6.4"
}
{
  "path": "sub-600ns-4kV-0,8MHz-UP-10-5-21-BPAE-10/sub-600ns-4kV-0,8MHz-UP-10-5-21-BPAE-10_ses-600ns-4kV-0-8MHz single pulse trains_image.nwb",
  "schemaVersion": "0.6.4"
}
{
  "path": "sub-600ns-5kV-1HzUP-7-31-21-BPAE-03/sub-600ns-5kV-1HzUP-7-31-21-BPAE-03_ses-600ns-5kV-1Hz single pulse trains_image.nwb",
  "schemaVersion": "0.6.4"
}
{
  "path": "sub-600ns-5kV-1HzBP-8-9-21-BPAE-8/sub-600ns-5kV-1HzBP-8-9-21-BPAE-8_ses-600ns-5kV-1Hz paired pulse trains_image.nwb",
  "schemaVersion": "0.6.4"
}
{
  "path": "sub-600ns-5kV-1HzUP-7-31-21-BPAE-10/sub-600ns-5kV-1HzUP-7-31-21-BPAE-10_ses-600ns-5kV-1Hz single pulse trains_image.nwb",
  "schemaVersion": "0.6.4"
}
{
  "path": "sub-600ns-5kV-1HzUP-7-31-21-BPAE-18/sub-600ns-5kV-1HzUP-7-31-21-BPAE-18_ses-600ns-5kV-1Hz single pulse trains_image.nwb",
  "schemaVersion": "0.6.4"
}
{
  "path": "sub-600ns-5kV-1HzUP-7-31-21-BPAE-5/sub-600ns-5kV-1HzUP-7-31-21-BPAE-5_ses-600ns-5kV-1Hz single pulse trains_image.nwb",
  "schemaVersion": "0.6.4"
}
{
  "path": "sub-600ns-5kV-1HzUP-8-9-21-BPAE-14/sub-600ns-5kV-1HzUP-8-9-21-BPAE-14_ses-600ns-5kV-1Hz single pulse trains_image.nwb",
  "schemaVersion": "0.6.4"
}
{
  "path": "sub-600ns-5kV-1HzUP-8-9-21-BPAE-10/sub-600ns-5kV-1HzUP-8-9-21-BPAE-10_ses-600ns-5kV-1Hz single pulse trains_image.nwb",
  "schemaVersion": "0.6.4"
}
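The same projection as the jq filter can be done in Python on a decoded JSON page (for example the result of `json.loads` on the response body). This is a sketch; `schema_versions` and `missing_schema_version` are hypothetical names. Like the `curl | jq` command above, it only inspects the single page of `results` it is given, while the endpoint presumably paginates larger listings.

```python
from typing import Any, Dict, List


def schema_versions(page: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Mirror the jq filter: one {path, schemaVersion} record per asset in `results`."""
    return [
        {
            "path": a.get("path"),
            "schemaVersion": (a.get("metadata") or {}).get("schemaVersion"),
        }
        for a in page.get("results", [])
    ]


def missing_schema_version(page: Dict[str, Any]) -> List[str]:
    """Paths whose metadata lacks schemaVersion (where jq would print null)."""
    return [r["path"] for r in schema_versions(page) if r["schemaVersion"] is None]
```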

@satra
Member

satra commented Nov 8, 2023

@yarikoptic - you are looking at 631 not 691.

also in dandiset 631 that you looked at, there are commas (,) in the names - how did those pass validation?

@yarikoptic
Member

also in dandiset 631 that you looked at, there are commas (,) in the names

I noted the spaces as well. For both spaces and commas I submitted dandi/dandi-cli#1351.

how did those pass validation?

I don't know ;)
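A hypothetical sketch of the kind of check dandi-cli#1351 asks for: flag any path component containing a space or a comma. The regex and the function name are assumptions for illustration, not the rule dandi-cli actually implements.

```python
import re
from pathlib import PurePosixPath
from typing import List

# Hypothetical rule for illustration: spaces and commas are suspicious in names.
SUSPICIOUS = re.compile(r"[ ,]")


def suspicious_components(path: str) -> List[str]:
    """Return the components of a POSIX-style asset path containing a space or comma."""
    return [part for part in PurePosixPath(path).parts if SUSPICIOUS.search(part)]
```

Checking per component makes it clear whether the subject directory, the file name, or both need renaming.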

@yarikoptic
Member

@yarikoptic - you are looking at 631 not 691.

❯ curl --silent -X 'GET' 'https://api.dandiarchive.org/api/dandisets/000691/versions/draft/assets/?metadata=true' -H 'accept: application/json' | jq '.results | .[] | { path: .path, schemaVersion: .metadata.schemaVersion}'
{
  "path": "sub-4mm-mouse-1/sub-4mm-mouse-1_image+ophys/60d277d6-24fd-4817-bb02-41382db59172_external_file_0.avi",
  "schemaVersion": "0.6.4"
}
{
  "path": "sub-4mm-mouse-1/sub-4mm-mouse-1_image+ophys.nwb",
  "schemaVersion": "0.6.4"
}

so what are we missing?

@jjnesbitt
Member

@AlmightyYakob - do you know how a metadata field without a schemaVersion gets stored without raising an invalid asset message?

It shouldn't be possible to store an asset with a missing schemaVersion field, as there's a postgres check constraint that will prevent record insertion if that field is missing.

@AlmightyYakob - can we check on the database side how many assets in which dandisets are missing schemaVersion and when that asset was uploaded?

Indeed, querying for assets that have a missing schemaVersion metadata field returns no results.
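To illustrate the guarantee described above, here is a stand-in that uses SQLite's JSON functions instead of Postgres (assumption: the Python build's SQLite includes `json_extract`, as modern builds do). The `asset` table, its CHECK constraint and the helper names are hypothetical and only mimic the idea; they are not the dandi-archive schema. `count_missing` mirrors the audit query that found no assets missing schemaVersion.

```python
import json
import sqlite3


def make_db() -> sqlite3.Connection:
    """In-memory table whose CHECK constraint rejects metadata without schemaVersion."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE asset (
            id INTEGER PRIMARY KEY,
            metadata TEXT NOT NULL,
            CHECK (json_extract(metadata, '$.schemaVersion') IS NOT NULL)
        )
        """
    )
    return conn


def insert_asset(conn: sqlite3.Connection, metadata: dict) -> None:
    """Store metadata as JSON text; a violating row raises sqlite3.IntegrityError."""
    conn.execute("INSERT INTO asset (metadata) VALUES (?)", (json.dumps(metadata),))


def count_missing(conn: sqlite3.Connection) -> int:
    """Audit query: how many stored rows lack schemaVersion."""
    (n,) = conn.execute(
        "SELECT COUNT(*) FROM asset "
        "WHERE json_extract(metadata, '$.schemaVersion') IS NULL"
    ).fetchone()
    return n
```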

@jjnesbitt
Member

There's been a lot of discussion here around schemaVersion and other things, which don't seem related to the core issue.

@bendichter It appears the primary reason the assets summary is missing on 000691 is that the assets summary aggregation ran before the most recent data was uploaded. That dandiset does have an assetsSummary field with some data, but not much, and re-running the aggregation produces the expected result. We can investigate the core issue, and fix it generally. @bendichter If it's high priority for the asset summary to be populated on this dandiset, I can manually run that aggregation so it shows up.
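One cheap way to spot a summary that was computed before the latest uploads is to compare its totals with the assets currently present. A sketch, assuming the stored `assetsSummary` carries `numberOfFiles` and `numberOfBytes` and that each asset's metadata carries `contentSize`; `summary_looks_stale` is a hypothetical name, and the real aggregation may count things differently, so treat a mismatch as a hint rather than proof.

```python
from typing import Any, Dict, List


def summary_looks_stale(
    stored: Dict[str, Any], asset_metadata: List[Dict[str, Any]]
) -> bool:
    """Heuristic: the stored totals disagree with the assets currently present."""
    actual_files = len(asset_metadata)
    actual_bytes = sum(md.get("contentSize") or 0 for md in asset_metadata)
    return (
        stored.get("numberOfFiles", 0) != actual_files
        or stored.get("numberOfBytes", 0) != actual_bytes
    )
```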

@satra
Member

satra commented Nov 8, 2023

the discussion was based on a returned error here #1734 (comment), but that code is incorrect because that function doesn't accept an iterable of dandi RemoteAssets.

instead it should have been:

aggregate_assets_summary([asset.get_raw_metadata() for asset in ds.get_assets()])

indeed, once this is done the schema version error doesn't show up.
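A toy reproduction of why the first snippet failed while this one works. We assume, consistent with the error reported in this thread but not confirmed here, that the aggregator checks each record for a `schemaVersion` key. A `RemoteAsset`-like object is not such a mapping, so the check fails even though the asset's metadata does carry a schema version. `FakeRemoteAsset` and `toy_aggregate` are hypothetical stand-ins, not dandi or dandischema code.

```python
from collections.abc import Mapping
from typing import Any, Dict, Iterable


class FakeRemoteAsset:
    """Hypothetical stand-in for a RemoteAsset: its metadata sits behind a method."""

    def __init__(self, metadata: Dict[str, Any]):
        self._metadata = metadata

    def get_raw_metadata(self) -> Dict[str, Any]:
        return self._metadata


def toy_aggregate(records: Iterable[Any]) -> int:
    """Count records, rejecting anything that is not a mapping with schemaVersion."""
    count = 0
    for md in records:
        if not isinstance(md, Mapping) or "schemaVersion" not in md:
            raise ValueError("record has no schemaVersion")
        count += 1
    return count
```

Passing the asset objects raises, while passing `[a.get_raw_metadata() for a in assets]` succeeds, matching the behavior described above.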

@satra
Member

satra commented Nov 8, 2023

@AlmightyYakob - it also doesn't show up on dandiset 000026.

@bendichter changed the title from "asset metadata not appearing on DLC" to "asset metadata not appearing on DLP" on Nov 9, 2023
@sheriferson

I came looking for reports on the same issue which I'm facing with two datasets currently in draft status:

Both were uploaded via the CLI using the same commands @weiglszonja reported:

dandi organize -f move --update-external-file-paths
dandi validate .
dandi upload

A very similar dataset we validated and uploaded earlier this year shows Asset Summary information with no issues:

Any suggestions for how to proceed? Can we publish the datasets and have the asset summary generated afterwards?

@yarikoptic
Member

We can investigate the core issue, and fix it generally. @bendichter If it's high priority for the asset summary to be populated on this dandiset, I can manually run that aggregation so it shows up.

@jjnesbitt could you please try running reaggregation on the originally reported 000691 and then on 000768 and 000769, while keeping an eye on whether any errors get triggered?

@jjnesbitt
Member

@jjnesbitt could you please try running reaggregation on the originally reported 000691 and then on 000768 and 000769, while keeping an eye on whether any errors get triggered?

I've re-run the aggregation and all three of these dandisets now show a proper assets summary.
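The re-run requested in this thread (process several dandisets while watching for errors) can be sketched as a loop that keeps going and records failures per dandiset. `reaggregate_all` is a hypothetical helper, and `reaggregate` is a hypothetical callable standing in for whatever admin operation actually recomputes the summary; it is not a documented dandi API.

```python
from typing import Callable, Dict, Iterable


def reaggregate_all(
    dandiset_ids: Iterable[str], reaggregate: Callable[[str], None]
) -> Dict[str, str]:
    """Run `reaggregate` for each dandiset; return {id: error} for the ones that failed."""
    errors: Dict[str, str] = {}
    for did in dandiset_ids:
        try:
            reaggregate(did)
        except Exception as exc:  # keep going, but record what broke and where
            errors[did] = f"{type(exc).__name__}: {exc}"
    return errors
```

An empty result means every dandiset was re-aggregated without raising.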

@sheriferson

@jjnesbitt and @yarikoptic Thank you for helping out so quickly, it's much appreciated.


6 participants