Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug] .get_metadata() from embargoed dandiset #1363

Closed
CodyCBakerPhD opened this issue Nov 21, 2023 · 13 comments · Fixed by #1398
Closed

[Bug] .get_metadata() from embargoed dandiset #1363

CodyCBakerPhD opened this issue Nov 21, 2023 · 13 comments · Fixed by #1398
Labels

Comments

@CodyCBakerPhD
Copy link
Contributor

CodyCBakerPhD commented Nov 21, 2023

from dandi.dandiapi import DandiAPIClient

client = DandiAPIClient(token="my token")  # need token for embargo

dandiset_620 = client.get_dandiset(dandiset_id="000620")  # embargoed

dandiset_620_metadata = dandiset_620.get_metadata()

> ValidationError: 1 validation error for Dandiset
> access -> 0 -> __root__
>   An embargo end date is required for NIH awards to be in compliance with NIH resource sharing policy. (type=value_error)

I'd be happy to set this value on the dandiset, but I don't know how to and it wasn't required to create the dandiset either

@yarikoptic
Copy link
Member

at the lower API we seems by default to allow for errors with ignore_errors: bool = True,
❯ git grep -A4 'def get_metadata'
dandi/cli/cmd_ls.py:def get_metadata_ls(
dandi/cli/cmd_ls.py-    path, keys, errors, flatten=False, schema=None, use_fake_digest=False
dandi/cli/cmd_ls.py-):
dandi/cli/cmd_ls.py-    from ..dandiset import Dandiset
dandi/cli/cmd_ls.py-    from ..metadata.nwb import get_metadata, nwb2asset
--
dandi/dandiapi.py:    def get_metadata(self) -> models.Dandiset:
dandi/dandiapi.py-        """
dandi/dandiapi.py-        Fetch the metadata for this version of the Dandiset as a
dandi/dandiapi.py-        `dandischema.models.Dandiset` instance
dandi/dandiapi.py-        """
--
dandi/dandiapi.py:    def get_metadata(self) -> models.Asset:
dandi/dandiapi.py-        """
dandi/dandiapi.py-        Fetch the metadata for the asset as a `dandischema.models.Asset`
dandi/dandiapi.py-        instance
dandi/dandiapi.py-        """
--
dandi/files/bases.py:    def get_metadata(
dandi/files/bases.py-        self,
dandi/files/bases.py-        digest: Digest | None = None,
dandi/files/bases.py-        ignore_errors: bool = True,
dandi/files/bases.py-    ) -> CommonModel:
--
dandi/files/bases.py:    def get_metadata(
dandi/files/bases.py-        self,
dandi/files/bases.py-        digest: Digest | None = None,
dandi/files/bases.py-        ignore_errors: bool = True,
dandi/files/bases.py-    ) -> DandisetMeta:
--
dandi/files/bases.py:    def get_metadata(
dandi/files/bases.py-        self,
dandi/files/bases.py-        digest: Digest | None = None,
dandi/files/bases.py-        ignore_errors: bool = True,
dandi/files/bases.py-    ) -> BareAsset:
--
dandi/files/bases.py:    def get_metadata(
dandi/files/bases.py-        self,
dandi/files/bases.py-        digest: Digest | None = None,
dandi/files/bases.py-        ignore_errors: bool = True,
dandi/files/bases.py-    ) -> BareAsset:
--
dandi/files/bases.py:    def get_metadata(
dandi/files/bases.py-        self,
dandi/files/bases.py-        digest: Digest | None = None,
dandi/files/bases.py-        ignore_errors: bool = True,
dandi/files/bases.py-    ) -> BareAsset:
--
dandi/files/bids.py:    def get_metadata(
dandi/files/bids.py-        self,
dandi/files/bids.py-        digest: Digest | None = None,
dandi/files/bids.py-        ignore_errors: bool = True,
dandi/files/bids.py-    ) -> BareAsset:
--
dandi/files/bids.py:    def get_metadata(
dandi/files/bids.py-        self,
dandi/files/bids.py-        digest: Digest | None = None,
dandi/files/bids.py-        ignore_errors: bool = True,
dandi/files/bids.py-    ) -> BareAsset:
--
dandi/files/bids.py:    def get_metadata(
dandi/files/bids.py-        self,
dandi/files/bids.py-        digest: Digest | None = None,
dandi/files/bids.py-        ignore_errors: bool = True,
dandi/files/bids.py-    ) -> BareAsset:
--
dandi/files/zarr.py:    def get_metadata(
dandi/files/zarr.py-        self,
dandi/files/zarr.py-        digest: Digest | None = None,
dandi/files/zarr.py-        ignore_errors: bool = True,
dandi/files/zarr.py-    ) -> BareAsset:
--
dandi/metadata/nwb.py:def get_metadata(
dandi/metadata/nwb.py-    path: str | Path | Readable, digest: Digest | None = None
dandi/metadata/nwb.py-) -> dict | None:
dandi/metadata/nwb.py-    """
dandi/metadata/nwb.py-    Get "flatdata" from a .nwb file

but here we do not give such an option. Shouldn't we make it uniform and return unvalidated ones here too, wdyt @satra @jwodder ? I feel that we had related discussion at some point but I fail to find it.

meanwhile

  • if you are just interested in record as a dict -- use get_raw_metadata, and you can convert it to unvalidated model "manually", e.g.
> ... python cody-1363.py
id='DANDI:000620/draft' schemaKey='Dandiset' schemaVersion='0.6.4' name='JazLab Processing' description='Work in progress dataset for the Jazayeri lab while prototyping the conversion and intermediate data processing such as spike sorting' contributor=[{'name': 'National Institutes of Health (NIH)', 'roleName': ['dcite:Funder'], 'schemaKey': 'Organization', 'awardNumber': '5 R01 NS 119519-03'}, {'name': 'Baker, Cody', 'email': 'cody.c.baker.phd@gmail.com', 'roleName': ['dcite:DataCurator', 'dcite:ContactPerson'], 'schemaKey': 'Person', 'affiliation': [], 'includeInCitation': True}] about=[] studyTarget=[] license=['spdx:CC-BY-4.0'] protocol=[] ethicsApproval=[] keywords=[] acknowledgement=None access=[{'status': 'dandi:EmbargoedAccess', 'schemaKey': 'AccessRequirements'}] url='https://dandiarchive.org/dandiset/000620/draft' repository='https://dandiarchive.org' relatedResource=[] wasGeneratedBy=[] identifier='DANDI:000620' dateCreated='2023-08-09T18:02:50.024110+00:00' dateModified=None citation='Baker, Cody (2023) JazLab Processing (Version draft) [Data set]. DANDI archive. https://dandiarchive.org/dandiset/000620/draft' assetsSummary={'species': [{'name': 'Macaca mulatta - Rhesus monkey', 'schemaKey': 'SpeciesType', 'identifier': 'http://purl.obolibrary.org/obo/NCBITaxon_9544'}], 'approach': [{'name': 'electrophysiological approach', 'schemaKey': 'ApproachType'}, {'name': 'behavioral approach', 'schemaKey': 'ApproachType'}], 'schemaKey': 'AssetsSummary', 'dataStandard': [{'name': 'Neurodata Without Borders (NWB)', 'schemaKey': 'StandardsType', 'identifier': 'RRID:SCR_015242'}], 'numberOfBytes': 289917799769, 'numberOfFiles': 5, 'numberOfSubjects': 3, 'variableMeasured': ['ProcessingModule', 'Units', 'ElectrodeGroup', 'SpatialSeries', 'ElectricalSeries'], 'measurementTechnique': [{'name': 'spike sorting technique', 'schemaKey': 'MeasurementTechniqueType'}, {'name': 'behavioral technique', 'schemaKey': 'MeasurementTechniqueType'}, {'name': 'analytical technique', 'schemaKey': 'MeasurementTechniqueType'}, {'name': 'surgical technique', 'schemaKey': 'MeasurementTechniqueType'}, {'name': 'multi electrode extracellular electrophysiology recording technique', 'schemaKey': 'MeasurementTechniqueType'}]} manifestLocation=['https://api.dandiarchive.org/api/dandisets/000620/versions/draft/assets/'] version='draft' @context='https://raw.githubusercontent.com/dandi/schema/master/releases/0.6.4/context.json'
❯ cat cody-1363.py
from dandi.dandiapi import DandiAPIClient
import os
from dandischema import models

client = DandiAPIClient(token=os.environ.get('DANDI_API_KEY'))  # need token for embargo

dandiset_620 = client.get_dandiset(dandiset_id="000620")  # embargoed

dandiset_620_metadata = models.Dandiset.unvalidated(**dandiset_620.get_raw_metadata())
print(dandiset_620_metadata)

@CodyCBakerPhD
Copy link
Contributor Author

Great, thanks for the advice!

@jwodder
Copy link
Member

jwodder commented Nov 21, 2023

@yarikoptic That ignore_errors option is for use when extracting data from an NWB file, not for validating API responses.

@satra
Copy link
Member

satra commented Nov 21, 2023

@yarikoptic - i would agree with @jwodder that we make all fields optional so that the models can be passed. and we add a separate requirements validation that needs to pass before assets/dandisets can be published. i.e. pull validation into a separate function that enforces the requirements. that would help in situations like this. however, this brings up a point that a user may have to check if a data object is valid to make assumptions about what exists in it. i would tune this either with the pydantic 2 layer or right after it.

@yarikoptic
Copy link
Member

I also agree with the fact @jwodder clarified ;) the question is either we should simplify people getting access to non-validated API responses (through a parameter to a call) or not (and just instruct to create those models "manually" out of raw responses, with v2 pydantic or before)

@yarikoptic
Copy link
Member

yarikoptic commented Nov 22, 2023

ah, here is the issue which @sneakers-the-rat reminded me about which points to the same problematic UX: #1205 -- people expect to just get metadata (valid or not) but get an error and get stuck, we need to come back and explain the vision that draft versions might still be not fulfilling the model etc.
So may be indeed it is just a matter of documentation: echoing the question in #1361, we should make our example(s) clear/er and outline

  • how to get only already released dandisets to have a guarantee of not crashing models
  • that using get_raw_metadata they can get records (not unvalidated pydantic models) on any dandisets
  • how to create unvalidated model

and/or alternatively: guide user by exception better, e.g. instead of just

ValidationError: 1 validation error for Dandiset
license
  ensure this value has at least 1 items (type=value_error.list.min_items; limit_value=1)

make it

ValidationError: 1 validation error for Dandiset
license
  ensure this value has at least 1 items (type=value_error.list.min_items; limit_value=1)

Note: that only published Dandisets and their assets models are expected to be valid.
Consider using 'get_raw_metadata' instead to get a metadata dictionary, and wrapping it as Dandiset.unvalidated(**obj.get_raw_metadata()) happen you want unvalidated model

or simply adding ignore_errors=False (to retain behavior but that would make it inconsistent with aforementioned use while loading from files) and advising to use it in that Note: or ignore_errors=True and thus not even needing any explanation in the exception.

WDYT?

edit: we could even make it ignore_errors=None and then if not set to explicit to a bool , in case of errors we issue a warning with the content of ValidationError and return unvalidated model. This way we would in effect inform user about problem, but do not crash and possibly give them a hint to skip such dandisets/assets or adjust ignore_errors to his desired bool value.

@jwodder
Copy link
Member

jwodder commented Nov 22, 2023

@yarikoptic

we need to come back and explain the vision that draft versions might still be not fulfilling the model

Long-term, once the overhaul of dandischema's models takes place, draft versions will fulfill a model.

@jwodder
Copy link
Member

jwodder commented Nov 22, 2023

@yarikoptic Regarding adding ignore_errors, what exactly should happen if ignore_errors is True but the data doesn't pass validation? Is the method supposed to return objects that don't even satisfy their own type annotations? That would be bad, and I would categorically oppose it.

@yarikoptic
Copy link
Member

I would like to get back to this one. Why do you think it would be that bad @jwodder ?
We now do have a bad situation that users are left without any clear way forward while simply asking for metadata -- get_metadata sounds like a safe thing to do/ask. But it blows into their face because of the reasons out of their control. That complicates writing scripts to consume such metadata records, in particular in cases where they might not even care about those fields which fail to pass validation. So why we cannot give them a way to overcome the hurdle in a consistent manner? it will be up to them to set ignore_errors=True and step on that "thin ice".

@satra
Copy link
Member

satra commented Jan 31, 2024

@yarikoptic - let's address dandi/dandi-schema#205 this now that pydantic has been upgraded. it will solve a bunch of issues, but we do need coordinated effort to make this work. i was hoping we have cli and archive updated with dandi-schema release before this happened.

@jwodder
Copy link
Member

jwodder commented Jan 31, 2024

@yarikoptic

  • There's already a clear way forwards for getting metadata: get_raw_metadata()
  • If a class (like one of our dandischema models) has type annotations, and a function returns an instance of that class that doesn't conform to those annotations, then the function is buggy & broken, plain & simple.

@yarikoptic
Copy link
Member

  • There's already a clear way forwards for getting metadata: get_raw_metadata()

ok, then let's at least make that exception more informative to guide user like I outlined above. Agree?

yarikoptic added a commit that referenced this issue Feb 1, 2024
Add notes to docs about `get_metadata()` vs. `get_raw_metadata()`
Copy link

github-actions bot commented Feb 2, 2024

🚀 Issue was released in 0.59.1 🚀

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants