TG2-VALIDATION_GENUS_NOTEMPTY #214

Closed
Tasilee opened this issue Jan 28, 2024 · 22 comments
Labels
Completeness NAME Supplementary Tests supplementary to the core test suite. These are tests that the team regarded as not CORE. Test Tests created by TG2, either CORE, Supplementary or DO NOT IMPLEMENT TG2 Validation

Comments

@Tasilee
Collaborator

Tasilee commented Jan 28, 2024

| TestField | Value |
|---|---|
| GUID | d02c1ffd-af28-49bd-9c9c-e8e23a8b7258 |
| Label | VALIDATION_GENUS_NOTEMPTY |
| Description | Is there a value in dwc:genus? |
| TestType | Validation |
| Darwin Core Class | dwc:Taxon |
| Information Elements ActedUpon | dwc:genus |
| Information Elements Consulted | dwc:taxonRank |
| Expected Response | INTERNAL_PREREQUISITES_NOT_MET if dwc:genus is bdq:Empty and dwc:taxonRank contains a value that is not interpretable as a taxon rank; COMPLIANT if dwc:genus is bdq:NotEmpty, or dwc:genus is bdq:Empty and the value in dwc:taxonRank is higher than genus; otherwise NOT_COMPLIANT. |
| Data Quality Dimension | Completeness |
| Term-Actions | GENUS_NOTEMPTY |
| Parameter(s) | |
| Source Authority | |
| Specification Last Updated | 2024-06-05 |
| Examples | [dwc:genus="genus": Response.status=RUN_HAS_RESULT, Response.result=COMPLIANT, Response.comment="dwc:genus is bdq:NotEmpty"]<br>[dwc:genus="": Response.status=RUN_HAS_RESULT, Response.result=NOT_COMPLIANT, Response.comment="dwc:genus is bdq:Empty"] |
| Source | TG2 |
| References | |
| Example Implementations (Mechanisms) | Kurator/FilteredPush sci_name_qc Library |
| Link to Specification Source Code | https://github.com/FilteredPush/sci_name_qc/blob/v1.1.2/src/main/java/org/filteredpush/qc/sciname/DwCSciNameDQ.java#L3558 |
| Notes | Genus is expected to be bdq:Empty when an identification is only to the level of a taxon higher than Genus. This test is not regarded as CORE (cf. bdq:CORE). |
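
A minimal sketch of the Expected Response in the table above, written in Python for illustration only; it is not the sci_name_qc implementation. The rank list `RANK_ORDER`, the function names, and the treatment of whitespace-only strings as bdq:Empty are assumptions introduced here, not part of the specification.

```python
from typing import Optional, Tuple

# Hypothetical, partial ordering of ranks from highest to lowest.
# A real implementation would need a fuller rank vocabulary.
RANK_ORDER = [
    "kingdom", "phylum", "class", "order", "family",
    "subfamily", "tribe", "genus", "subgenus", "species", "subspecies",
]


def is_empty(value: Optional[str]) -> bool:
    """Assumption: None and whitespace-only strings count as bdq:Empty."""
    return value is None or value.strip() == ""


def validation_genus_notempty(
    genus: Optional[str], taxon_rank: Optional[str]
) -> Tuple[str, Optional[str]]:
    """Return (Response.status, Response.result) for one record."""
    if not is_empty(genus):
        return ("RUN_HAS_RESULT", "COMPLIANT")
    # From here on dwc:genus is empty.
    if is_empty(taxon_rank):
        return ("RUN_HAS_RESULT", "NOT_COMPLIANT")
    rank = taxon_rank.strip().lower()
    if rank not in RANK_ORDER:
        # taxonRank holds a value that cannot be interpreted as a rank.
        return ("INTERNAL_PREREQUISITES_NOT_MET", None)
    if RANK_ORDER.index(rank) < RANK_ORDER.index("genus"):
        # Identification is above genus, so an empty genus is expected.
        return ("RUN_HAS_RESULT", "COMPLIANT")
    return ("RUN_HAS_RESULT", "NOT_COMPLIANT")
```

Under these assumptions, a record with an empty dwc:genus and dwc:taxonRank="family" is COMPLIANT, while the same record with dwc:taxonRank="species" is NOT_COMPLIANT.
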
@Tasilee Tasilee added TG2 NAME Test Tests created by TG2, either CORE, Supplementary or DO NOT IMPLEMENT Supplementary Tests supplementary to the core test suite. These are tests that the team regarded as not CORE. Completeness Validation labels Jan 28, 2024
@Tasilee Tasilee closed this as completed Feb 4, 2024
@chicoreus chicoreus changed the title TG2-VALIDATION-GENUS-NOTEMPTY TG2-VALIDATION_GENUS_NOTEMPTY Feb 15, 2024
@chicoreus
Collaborator

chicoreus commented Feb 15, 2024

All of #206, #207, #208, #213, #214, #215, #217, #218, #219, and #220 need to take into account whether the identification is at a rank above that of the term under test; if so, the term is correctly empty.

Suggest for all of these:

(1) add dwc:taxonRank as an information element consulted.

(2) rewrite the test specifications in the form:

COMPLIANT if the value in dwc:taxonRank is of a rank higher than genus or if dwc:genus is not EMPTY; otherwise NOT_COMPLIANT

Without such a change, these tests have limited power to identify data that have quality, and the problem gets worse the lower the rank of the term under test.

@chicoreus chicoreus reopened this Feb 15, 2024
@chicoreus
Collaborator

Removed #216 from the list, dwc:kingdom doesn't need examination of another term.

@chicoreus
Collaborator

Noting that #265 is a similar, but more complex problem.

@Tasilee
Collaborator Author

Tasilee commented Feb 19, 2024

Changing the ER to

COMPLIANT if the value in dwc:taxonRank is of a rank higher than genus or if dwc:genus is not EMPTY; otherwise NOT_COMPLIANT

is similar to #256 in that IF dwc:taxonRank is 'higher than genus', then dwc:genus is effectively ignored as the Information Element Acted Upon, and you would be implying dwc:genus is not EMPTY, regardless.

If this test is considered useful in some context (use case), then I would suggest maybe (not sure this is right taxonomically)

POTENTIAL_ISSUE if dwc:genus is EMPTY and dwc:taxonRank is lower than "family"; otherwise NOT_ISSUE

?

@ArthurChapman
Collaborator

@Tasilee - I would keep it Supplementary with your top wording. An ISSUE test would be a separate test and I don't think worth considering at this time.

@ArthurChapman
Collaborator

@chicoreus is the wording given by @Tasilee for an ISSUE worth making a test for this?

@chicoreus
Collaborator

@ArthurChapman given that there are Darwin Core terms for ranks lower than family and higher than genus, and taxonomic ranks that fall between the two, I think that @Tasilee's proposed ISSUE test would be difficult to specify and implement.
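
One possible illustration of the difficulty, as a Python sketch: the proposed ISSUE rule (POTENTIAL_ISSUE if dwc:genus is EMPTY and dwc:taxonRank is lower than "family") flags records identified to an intermediate rank such as tribe, where an empty dwc:genus is expected. The rank list and function names are hypothetical and deliberately incomplete.

```python
# Hypothetical, partial ordering of ranks from highest to lowest.
RANK_ORDER = ["family", "subfamily", "tribe", "subtribe", "genus", "species"]


def is_lower(rank: str, reference: str) -> bool:
    """True if `rank` sits below `reference` in RANK_ORDER.

    Raises ValueError for ranks missing from the list.
    """
    return RANK_ORDER.index(rank) > RANK_ORDER.index(reference)


def potential_issue_as_proposed(genus: str, taxon_rank: str) -> str:
    """Literal reading of the proposed rule; not an adopted test."""
    genus_empty = genus is None or genus.strip() == ""
    if genus_empty and is_lower(taxon_rank, "family"):
        return "POTENTIAL_ISSUE"
    return "NOT_ISSUE"
```

With this reading, `potential_issue_as_proposed("", "tribe")` returns POTENTIAL_ISSUE although a tribe-level identification has no genus to report.
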

Current phrasing looks good.

@Tasilee Tasilee added DO NOT IMPLEMENT A potential test that it is not recommended be implemented and removed NEEDS WORK Supplementary Tests supplementary to the core test suite. These are tests that the team regarded as not CORE. labels Apr 15, 2024
@ArthurChapman
Collaborator

Made DO NOT IMPLEMENT as this test doesn't imply an aspect of Data Quality: it is redundant when compared with dwc:scientificName. It would probably be better to test dwc:genericName rather than dwc:genus, i.e. VALIDATION_GENERICNAME_NOTEMPTY.

@chicoreus
Collaborator

DO NOT IMPLEMENT, as this test is an artifact of our thinking prior to the formulation of dwc:genericName. Its development came from dwc:genus being widely used (incorrectly) as a parse of the generic portion of the scientific name, and thus it was being considered part of the suite of tests related to scientific name parsing. With the clearer separation between dwc:genus as part of the classification and dwc:genericName as a parse of the scientific name, this thinking is no longer relevant. So the current classification at the generic level of an occurrence has very little utility for assessing the data quality of the occurrence data. It may have some value in very narrow cases for the evaluation of taxonomic data sets, but even then entries may be of higher taxa and be expected not to have a classification at the level of genus. Without a clear explication of a use case and the potential pitfalls of implementation, we are recommending this as DO NOT IMPLEMENT.

@chicoreus
Collaborator

Our treating this as DO NOT IMPLEMENT was based on the incorrect belief that it stood in isolation. It is one of a family of supplemental tests that examine the emptiness of higher classification terms. With the current inclusion of dwc:taxonRank as an information element, it is actually a good representative for that entire family. It should probably be considered supplemental, and the set of supplemental tests listed above brought into conformance with it.

@chicoreus chicoreus reopened this Apr 16, 2024
@Tasilee
Collaborator Author

Tasilee commented Apr 16, 2024

OK, but is dwc:genus a special case (as @ArthurChapman suggested in the Zoom 16th April 2024)?

@chicoreus
Collaborator

chicoreus commented Apr 16, 2024 via email

@ArthurChapman
Collaborator

@Tasilee and I have been looking at this test and the associated tests #206, #207, #208, which are all "FOUND" tests, and #213, #215, #216, #217, #218, #219 and #220, which are all "NOTEMPTY" tests. We think that this test should be kept simple, like #213 etc., as a plain NOTEMPTY/EMPTY test, and that we shouldn't make these tests more complicated by adding dwc:taxonRank as a consulted Information Element. Doing so would require altering the Expected Response to include EXTERNAL_PREREQUISITES_NOT_MET and INTERNAL_PREREQUISITES_NOT_MET, adding Source Authorities, and so on. It would also mean having to alter all of the #213-#220 tests. They should remain SUPPLEMENTARY.

When thinking about how people may use these types of tests if they wish to implement them - most will just want to know if the field is EMPTY or not. If we make the tests much more complicated then people probably won't implement them. If we wanted to go more complicated, I think we would need to keep the "NOTEMPTY" tests, but add a new set of "FOUND" tests, something I think would be unnecessary use of our time at this stage.

With respect to this test - I think it should be altered to the same as #213, etc. and keep it simple and SUPPLEMENTARY

@chicoreus
Collaborator

chicoreus commented May 13, 2024 via email

@ArthurChapman
Collaborator

Okay @chicoreus following your logic, does the following satisfy your requirements (of course we would need to add something in sourceAuthority).

EXTERNAL_PREREQUISITES_NOT_MET if the bdq:sourceAuthority is not available; INTERNAL_PREREQUISITES_NOT_MET if dwc:taxonRank is EMPTY or is at a higher rank than Genus; COMPLIANT if dwc:genus is not EMPTY; otherwise NOT_COMPLIANT

The current wording, "COMPLIANT if the value in dwc:taxonRank is higher than genus or if dwc:genus is not EMPTY; otherwise NOT_COMPLIANT", is not logical: if dwc:taxonRank is Family and dwc:genus is EMPTY, the test returns COMPLIANT even though dwc:genus is EMPTY, which makes no sense for a NOTEMPTY test. The wording I have suggested above makes logical sense.
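
The difference between the two wordings can be sketched in Python as follows. This is an illustration under stated assumptions: the set `HIGHER_THAN_GENUS` and the function names are hypothetical, whitespace-only strings are treated as EMPTY, and the bdq:sourceAuthority availability check from the suggested wording is left out.

```python
# Hypothetical rank vocabulary; a real implementation would need a fuller list.
HIGHER_THAN_GENUS = {"kingdom", "phylum", "class", "order", "family"}


def is_empty(value):
    """Assumption: None and whitespace-only strings count as EMPTY."""
    return value is None or value.strip() == ""


def wording_under_discussion(genus, taxon_rank):
    """COMPLIANT if taxonRank is higher than genus or genus is not EMPTY."""
    rank = (taxon_rank or "").strip().lower()
    if rank in HIGHER_THAN_GENUS or not is_empty(genus):
        return "COMPLIANT"
    return "NOT_COMPLIANT"


def suggested_alternative(genus, taxon_rank):
    """INTERNAL_PREREQUISITES_NOT_MET if taxonRank is EMPTY or higher than
    genus; COMPLIANT if genus is not EMPTY; otherwise NOT_COMPLIANT.
    The source authority check is omitted from this sketch."""
    rank = (taxon_rank or "").strip().lower()
    if is_empty(taxon_rank) or rank in HIGHER_THAN_GENUS:
        return "INTERNAL_PREREQUISITES_NOT_MET"
    if not is_empty(genus):
        return "COMPLIANT"
    return "NOT_COMPLIANT"
```

For a record with an empty dwc:genus and dwc:taxonRank="family", the first function returns COMPLIANT and the second returns INTERNAL_PREREQUISITES_NOT_MET. Read literally, the suggested wording also returns INTERNAL_PREREQUISITES_NOT_MET whenever dwc:taxonRank is EMPTY, even if dwc:genus has a value.
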

@Tasilee
Collaborator Author

Tasilee commented May 19, 2024

@chicoreus ??

@chicoreus
Collaborator

@ArthurChapman I think the family of NOT_EMPTY tests for higher taxon rank terms (#213, #215, #216, #217, #218, #219, #220 and this one) should follow the same pattern:

COMPLIANT if the value in dwc:taxonRank is higher than genus or if dwc:genus is not EMPTY; INTERNAL_PREREQUISITES_NOT_MET if dwc:genus is EMPTY, dwc:taxonRank is NOT_EMPTY, and dwc:taxonRank contains a value that is not interpretable as a taxon rank; otherwise NOT_COMPLIANT.

This asserts that the data have quality if dwc:genus contains a value, or if dwc:genus correctly does not contain a value. It handles the case where dwc:genus does not contain a value and it isn't possible to tell whether it should, and it marks data where dwc:genus incorrectly lacks a value as not having quality.

I don't think a reference to a source authority is needed, as taxonRank can be assessed without reference to a source authority for the purposes of this test. If this isn't the case, then:

COMPLIANT if the value in dwc:taxonRank is higher than genus or if dwc:genus is not EMPTY; INTERNAL_PREREQUISITES_NOT_MET if dwc:genus is EMPTY, dwc:taxonRank is NOT_EMPTY, and dwc:taxonRank contains a value that is not interpretable as a taxon rank; EXTERNAL_PREREQUISITES_NOT_MET if dwc:genus does not contain a value, dwc:taxonRank contains a value and the sourceAuthority is needed and not available to interpret whether dwc:taxonRank has a rank higher than genus; otherwise NOT_COMPLIANT.

The key point is that data can have quality, and be COMPLIANT, even if dwc:genus does not contain a value, in those cases when dwc:genus should not contain a value. This isn't a simple family of tests for emptiness.

@Tasilee
Collaborator Author

Tasilee commented May 30, 2024

Thanks @chicoreus. I agree about Source Authority but I'm inclined to align with the structure we have been using and simplifying it -

INTERNAL_PREREQUISITES_NOT_MET if dwc:genus is EMPTY or dwc:taxonRank contains a value that is not interpretable as a taxon rank; COMPLIANT if the value in dwc:taxonRank is higher than genus or if dwc:genus is not EMPTY; otherwise NOT_COMPLIANT.

Is that ok?

@Tasilee
Collaborator Author

Tasilee commented Jun 3, 2024

Changed Expected Response from

COMPLIANT if the value in dwc:taxonRank is of a rank higher than genus or if dwc:genus is not EMPTY; otherwise NOT_COMPLIANT

to

INTERNAL_PREREQUISITES_NOT_MET if dwc:genus is EMPTY or dwc:taxonRank contains a value that is not interpretable as a taxon rank; COMPLIANT if the value in dwc:taxonRank is higher than genus or if dwc:genus is not EMPTY; otherwise NOT_COMPLIANT.

@Tasilee Tasilee added Supplementary Tests supplementary to the core test suite. These are tests that the team regarded as not CORE. and removed NEEDS WORK DO NOT IMPLEMENT A potential test that it is not recommended be implemented labels Jun 3, 2024
@chicoreus
Collaborator

chicoreus commented Jun 3, 2024 via email

@ArthurChapman
Collaborator

That seems to work @chicoreus

@Tasilee
Collaborator Author

Tasilee commented Jun 4, 2024

OK, thanks @chicoreus. Changed Expected Response from

INTERNAL_PREREQUISITES_NOT_MET if dwc:genus is EMPTY or dwc:taxonRank contains a value that is not interpretable as a taxon rank; COMPLIANT if the value in dwc:taxonRank is higher than genus or if dwc:genus is not EMPTY; otherwise NOT_COMPLIANT.

to

INTERNAL_PREREQUISITES_NOT_MET if dwc:genus is EMPTY and dwc:taxonRank contains a value that is not interpretable as a taxon rank; COMPLIANT if dwc:genus is not EMPTY, or dwc:genus is EMPTY and the value in dwc:taxonRank is higher than genus; otherwise NOT_COMPLIANT.
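
A minimal Python sketch of why the change from "or" to "and" matters, read literally from the two wordings. The rank sets and function names are hypothetical, and whitespace-only strings are assumed to be EMPTY. Under the earlier wording every record with an empty dwc:genus stops at INTERNAL_PREREQUISITES_NOT_MET, so NOT_COMPLIANT can never be returned.

```python
# Hypothetical rank vocabulary; a real implementation would need a fuller list.
KNOWN_RANKS = {"kingdom", "phylum", "class", "order", "family", "genus", "species"}
HIGHER_THAN_GENUS = {"kingdom", "phylum", "class", "order", "family"}


def is_empty(value):
    """Assumption: None and whitespace-only strings count as EMPTY."""
    return value is None or value.strip() == ""


def normalise(taxon_rank):
    return (taxon_rank or "").strip().lower()


def uninterpretable(taxon_rank):
    """taxonRank contains a value, but not one in the known rank list."""
    return not is_empty(taxon_rank) and normalise(taxon_rank) not in KNOWN_RANKS


def earlier_wording(genus, taxon_rank):
    # "genus is EMPTY or taxonRank ... not interpretable"
    if is_empty(genus) or uninterpretable(taxon_rank):
        return "INTERNAL_PREREQUISITES_NOT_MET"
    if normalise(taxon_rank) in HIGHER_THAN_GENUS or not is_empty(genus):
        return "COMPLIANT"
    return "NOT_COMPLIANT"  # unreachable: genus is never empty here


def revised_wording(genus, taxon_rank):
    # "genus is EMPTY and taxonRank ... not interpretable"
    if is_empty(genus) and uninterpretable(taxon_rank):
        return "INTERNAL_PREREQUISITES_NOT_MET"
    if not is_empty(genus) or normalise(taxon_rank) in HIGHER_THAN_GENUS:
        return "COMPLIANT"
    return "NOT_COMPLIANT"
```

For an empty dwc:genus with dwc:taxonRank="family", the earlier wording gives INTERNAL_PREREQUISITES_NOT_MET and the revised wording gives COMPLIANT; with both terms empty, the revised wording gives NOT_COMPLIANT.
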

chicoreus added a commit to FilteredPush/sci_name_qc that referenced this issue Jul 17, 2024
…tdwg/bdq#215 tdwg/bdq#217 and tdwg/bdq#218 not empty tests for higher ranks below kingdom, including utility method to evaluate ordering of pairs of rank values and unit tests.

3 participants