Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JSON Schema creator and validator #10109

Merged
merged 45 commits into from
Dec 5, 2023
Merged

Conversation

sekmiller
Copy link
Contributor

@sekmiller sekmiller commented Nov 9, 2023

What this PR does / why we need it: Creates a custom json schema for a given Dataverse Collection that can be used to validate a dataset.json file prior to uploading to create the dataset

Which issue(s) this PR closes:

Closes #9464 Dataset json validation update code to use schema
Closes #9465 Create api.endpoint for Dataset json validation.
Closes #6978 Query Dataverse for mandatory metadata fields via API

Special notes for your reviewer: The api returns a json string that includes the escape characters. wasn't sure if that that is OK or if I should strip them out before returning

Suggestions on how to test this:

Does this PR introduce a user interface change? If mockups are available, please link/include them here:

Is there a release notes update needed for this change?: Release notes are included

Additional documentation: doc updated to include the new endpoints

https://dataverse-guide--10109.org.readthedocs.build/en/10109/api/native-api.html#retrieve-a-dataset-json-schema-for-a-collection

@sekmiller sekmiller added this to the 6.1 milestone Nov 9, 2023
@sekmiller sekmiller marked this pull request as draft November 9, 2023 14:58
@sekmiller sekmiller self-assigned this Nov 9, 2023

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

@sekmiller sekmiller marked this pull request as ready for review November 13, 2023 14:58
@sekmiller sekmiller removed their assignment Nov 13, 2023

This comment has been minimized.

@landreev landreev self-requested a review November 13, 2023 16:32
@landreev landreev self-assigned this Nov 13, 2023
@RequiredPermissions(Permission.AddDataset)
public class ValidateDatasetJsonCommand extends AbstractCommand<String> {

private static final Logger logger = Logger.getLogger(GetDatasetSchemaCommand.class.getCanonicalName());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy-and-paste from the other command, probably.

@landreev
Copy link
Contributor

I'm not seeing this discussed, in the pr or the linked issues - sorry if I missed it - but was there a specific reason to only use this validation code in the dedicated "validate" api? - and not always use it in the "create" api as well, before attempting to import the json uploaded by the user?
(are we just assuming that the attempt to import is validation enough, so there's no reason to spend extra cycles pre-validating? ... and if the user needs to know more details about why it failed, then they can run the validate api? - I may have answered my own question already, but please let me know what you think)

This comment has been minimized.

@JR-1991
Copy link
Contributor

JR-1991 commented Nov 30, 2023

I tested the build and endpoint, and everything went smoothly during installation/testing. The JSON schema I received was helpful, and I included a screenshot of the request and response for you to check out.

I noticed that controlled vocabularies are missing. For instance, the subject field in Citation has one, but it is missing here. It would be great to make this field an enum field to ensure consistent and accurate data entry. What do you think?

This comment has been minimized.

@pdurbin
Copy link
Member

pdurbin commented Nov 30, 2023

@JR-1991 are you thinking the values themselves (Chemistry, Law, etc.) should be returned in the JSON Schema?

That would certainly be more convenient than hitting another API to get them. Thanks again for adding that in this PR:

Also, I'm wondering if this PR closes the following issue:

It sounds so similar: "If this list [of five required fields] never changes, then RSpace could develop a solution where it reads a list of mandatory fields from a configuration file. But if it does change from time to time, it would be great if there was an API method in Dataverse to get a list of mandatory metadata fields .Then, a client could programmatically generate input fields for these properties so that the end-user could make a valid submission."

I'll go ask. 😄

This comment has been minimized.

@johannes-darms
Copy link
Contributor

I'm not sure if we should address this in this MR. However, IMHO its crucial that the json schema encodes the correct datatypes as defined within the metadatablock and the API returns data that is valid wrt. to the schema. So it would be nice if these two issues #9494. #9495 were somehow addressed.

@pdurbin
Copy link
Member

pdurbin commented Dec 1, 2023

crucial that the json schema encodes the correct datatypes as defined within the metadatablock

A good example of this is that the geospatial block expects floats for north, south, east, west (latitude and longitude). See this related pull request by @stevenwinship and @landreev merged yesterday:

This comment has been minimized.

1 similar comment

This comment has been minimized.

This comment has been minimized.

@JR-1991
Copy link
Contributor

JR-1991 commented Dec 5, 2023

@JR-1991 are you thinking the values themselves (Chemistry, Law, etc.) should be returned in the JSON Schema?

Sorry for missing your comment earlier. I believe this addition would be valuable and allow for client-side controlled vocabulary validation.

This comment has been minimized.

1 similar comment

This comment has been minimized.

Copy link

github-actions bot commented Dec 5, 2023

📦 Pushed preview images as

ghcr.io/gdcc/dataverse:9464-schema-creator-validator
ghcr.io/gdcc/configbaker:9464-schema-creator-validator

🚢 See on GHCR. Use by referencing with full name as printed above, mind the registry name.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Size: 10 A percentage of a sprint. 7 hours.
Projects
None yet
8 participants