Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix invalid contributors causing crash #1771

Merged
merged 2 commits into from
Dec 1, 2023

Conversation

mvandenburgh
Copy link
Member

@mvandenburgh mvandenburgh commented Dec 1, 2023

Currently, we assume that Version.metadata['contributors'] is a list of JSON objects. This holds true for any metadata uploaded by the DANDI CLI, but there's nothing preventing users from uploading arbitrary JSON. Invalid metadata will be marked as invalid, but there's some places in the code where we assume it's valid anyway which results in a crash. This PR adds some additional checks to those places to avoid outright crashes.

Closes #1769

Currently, we assume that `Version.metadata['contributors']` is a list
of JSON objects. This holds true for any metadata uploaded by the DANDI
CLI, but there's nothing preventing users from uploading arbitrary JSON.
Invalid JSON will be marked as invalid, but there's some places in the
code where we assume it's valid anyway which results in a crash.
@mvandenburgh mvandenburgh added patch Increment the patch version when merged release Create a release when this pr is merged labels Dec 1, 2023
Copy link
Member

@jjnesbitt jjnesbitt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This all looks good to me and makes sense. My only question is, is the only place in our application where these assumptions are made? If you feel confident the answer is yes then I think this is good. If you/we are unsure about that, I still think it's worth merging this now to fix the current critical bug, but we'd want to make sure to follow up.

@mvandenburgh
Copy link
Member Author

My only question is, is the only place in our application where these assumptions are made? If you feel confident the answer is yes then I think this is good.

I'm not confident that this is the only place. I initially only noticed the issue with extract_contact_person, and the test revealed the citation issue. So yeah we should definitely follow up on this.

@mvandenburgh mvandenburgh merged commit 85d8b36 into master Dec 1, 2023
10 checks passed
@mvandenburgh mvandenburgh deleted the fix-contact-person-parsing branch December 1, 2023 18:15
@dandibot
Copy link
Member

dandibot commented Dec 1, 2023

🚀 PR was released in v0.3.66 🚀

@dandibot dandibot added the released This issue/pull request has been released. label Dec 1, 2023
@waxlamp
Copy link
Member

waxlamp commented Dec 5, 2023

Dropping a thought here so it's recorded somewhere: this seems like further evidence for a need for different schemata for "draft" and "published" phases of things. In this case, we choose not to validate anything at all until we're ready for publishing, but in reality we shouldn't allowed broken formats for things like collaborators, while we should allow, e.g., missing values for certain required-at-publishing-time values (like, e.g., license). This is something that should be codified in different schemata that are applied at different times. Then, we might be able to catch people sending bad metadata etc.

@bendichter
Copy link
Member

I second this. I think all contributions to the archive should be validated against an (appropriately weakened) schema and a simple error like using a dict instead of a list for the contributors field should result in a error response with an appropriate error message. It will be way easier for us if we have some semblance of order for data we receive instead of trying to handle all edge cases in query and display logic.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
patch Increment the patch version when merged release Create a release when this pr is merged released This issue/pull request has been released.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Incorrect Metadata Specification Broke dandiset/my Request
5 participants