Check for references to core IDs that do not exist #1246

Open
kbraak opened this issue Dec 15, 2015 · 13 comments
Comments

@kbraak
Contributor

kbraak commented Dec 15, 2015

The IPT should validate that every core ID used in the extension(s) references a core ID that exists.

Please note, this check can currently be performed by http://tools.gbif.org/dwca-validator/
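
For illustration, the requested check could look roughly like the sketch below. It is not IPT code: the function name `find_orphan_core_ids`, the column names `id` and `coreid`, and the sample rows are all assumptions made up for this example. It collects the core IDs into a set and reports every extension row whose core ID is not in that set.

```python
import csv
import io


def find_orphan_core_ids(core_rows, extension_rows,
                         core_id_column="id", ext_id_column="coreid"):
    """Return (line_number, core_id) pairs for extension rows whose
    core ID has no matching record in the core file.

    The column names are hypothetical defaults; in a real archive the
    ID columns are declared in meta.xml.
    """
    core_ids = {row[core_id_column] for row in core_rows}
    orphans = []
    # Data rows start at line 2 because line 1 is the header.
    for line_number, row in enumerate(extension_rows, start=2):
        if row[ext_id_column] not in core_ids:
            orphans.append((line_number, row[ext_id_column]))
    return orphans


# Made-up sample data: the second extension row points at a missing core ID.
core_csv = "id\tscientificName\nocc-1\tPuma concolor\nocc-2\tLynx lynx\n"
ext_csv = ("coreid\tidentifier\n"
           "occ-1\thttp://example.org/a.jpg\n"
           "occ-3\thttp://example.org/b.jpg\n")

core = csv.DictReader(io.StringIO(core_csv), delimiter="\t")
ext = csv.DictReader(io.StringIO(ext_csv), delimiter="\t")
orphans = find_orphan_core_ids(core, ext)
print(orphans)  # [(3, 'occ-3')]
```

Holding all core IDs in memory keeps the lookup simple, but it also means the cost grows with the size of the core file.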

@kbraak
Contributor Author

kbraak commented Apr 14, 2016

Related issue in DwC-A validator: http://dev.gbif.org/issues/browse/TOOL-7

@kbraak
Contributor Author

kbraak commented Mar 13, 2017

@cgendreau I anticipate the IPT will do referential integrity checks on DwC-As by making external calls to the GBIF Data Validator API.

For large datasets, it may take hours for the data validator to finish. Therefore, instead of making the user wait for the results, how about sending them directly to the user's email? Of course, they would have to provide an email address in their API request for this to work. Thanks.

@cgendreau

There is no plan to send a response by e-mail at the moment. But if we were to do that, it is very likely that we would use the GBIF login instead.

Running the validation on a large dataset won't take hours if we do not interpret all records.

@CecSve
Contributor

CecSve commented Dec 22, 2022

Relevant issue on portal feedback.

@mike-podolskiy90
Contributor

@CecSve Thank you for the comment.
That's going to be a pretty expensive check. We can probably consider validating references for relatively small datasets.
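
One way to picture the "small datasets only" idea is a simple size gate in front of the check. This is a sketch under stated assumptions, not an IPT feature: the threshold `MAX_RECORDS_FOR_REFERENCE_CHECK`, its value, and the function `reference_check_mode` are hypothetical, and the thread does not name a cut-off.

```python
# Hypothetical cut-off; no number is given anywhere in this thread.
MAX_RECORDS_FOR_REFERENCE_CHECK = 100_000


def reference_check_mode(core_records, extension_records,
                         limit=MAX_RECORDS_FOR_REFERENCE_CHECK):
    """Return "full" when the combined record count is within the limit,
    otherwise "skipped"."""
    total = core_records + extension_records
    return "full" if total <= limit else "skipped"


print(reference_check_mode(5_000, 20_000))         # small dataset
print(reference_check_mode(2_000_000, 9_000_000))  # large dataset
```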

@CecSve
Contributor

CecSve commented Dec 23, 2022

@mike-podolskiy90 it seems like it is within the scope of the new data model, though (point 5)?

#1736 (comment)

@mike-podolskiy90
Contributor

Yes, but that is a Frictionless Data Package, and those checks will be performed by the Frictionless library itself.

@CecSve
Contributor

CecSve commented Dec 23, 2022

> Yes, but that is a Frictionless Data Package, and those checks will be performed by the Frictionless library itself.

Would that mean that the publisher would not get any notification similar to the messages they receive when publishing currently?

@mike-podolskiy90
Contributor

No. The data package would not be generated, and the validation errors would be displayed.

@CecSve
Contributor

CecSve commented Mar 9, 2023

> No. The data package would not be generated, and the validation errors would be displayed.

Would the checks and validation errors only be for publishers using the frictionless packages? Or is it planned to also have such checks for regular DwC archives?

@mike-podolskiy90
Contributor

It is not planned.

@CecSve
Contributor

CecSve commented Mar 9, 2023

OK. I will not open a new issue, as this original issue already captures what I would suggest.

Ideally, the IPT should validate the referential integrity of DwC-As to catch mismappings, and potentially stop the generation of an archive if the publisher does not fix the issues. Relevant issues for the inclusion of referential integrity checks are:

gbif/portal-feedback#4522
gbif/portal-feedback#4491
gbif/portal-feedback#3766

@ManonGros please add to this if I am missing something

@ManonGros
Contributor

Another issue related to referential integrity: gbif/portal-feedback#5359 (comment)

@mike-podolskiy90 mike-podolskiy90 modified the milestones: 2.x, 3.x Nov 7, 2024

6 participants