TG2-VALIDATION_POLYNOMIAL_CONSISTENT #101

Open
iDigBioBot opened this issue Jan 5, 2018 · 32 comments
Labels
Consistency CORE TG2 CORE tests NAME Test Tests created by TG2, either CORE, Supplementary or DO NOT IMPLEMENT TG2 Validation

Comments

@iDigBioBot
Collaborator

iDigBioBot commented Jan 5, 2018

| TestField | Value |
|---|---|
| GUID | 17f03f1f-f74d-40c0-8071-2927cfc9487b |
| Label | VALIDATION_POLYNOMIAL_CONSISTENT |
| Description | Is the polynomial represented in dwc:scientificName consistent with the equivalent values in dwc:genericName, dwc:specificEpithet, dwc:infraspecificEpithet? |
| TestType | Validation |
| Darwin Core Class | dwc:Taxon |
| Information Elements ActedUpon | dwc:scientificName, dwc:genericName, dwc:specificEpithet, dwc:infraspecificEpithet |
| Information Elements Consulted | |
| Expected Response | INTERNAL_PREREQUISITES_NOT_MET if dwc:scientificName is bdq:Empty, or all of dwc:genericName, dwc:specificEpithet and dwc:infraspecificEpithet are bdq:Empty; COMPLIANT if the polynomial, as represented in dwc:scientificName, is consistent with bdq:NotEmpty values of dwc:genericName, dwc:specificEpithet, dwc:infraspecificEpithet; otherwise NOT_COMPLIANT. |
| Data Quality Dimension | Consistency |
| Term-Actions | POLYNOMIAL_CONSISTENT |
| Parameter(s) | |
| Source Authority | |
| Specification Last Updated | 2023-09-18 |
| Examples | [dwc:scientificName="Hakea decurrens ssp. physocarpa", dwc:genericName="", dwc:specificEpithet="decurrens", dwc:infraspecificEpithet="physocarpa": Response.status=RUN_HAS_RESULT, Response.result=COMPLIANT, Response.comment="Values of all non-empty atomic terms are found in the polynomial"] [dwc:scientificName="Hakea decurrens", dwc:genericName="Hakea", dwc:specificEpithet="decurrens", dwc:infraspecificEpithet="physocarpa": Response.status=RUN_HAS_RESULT, Response.result=NOT_COMPLIANT, Response.comment="dwc:scientificName is inconsistent with atomic parts (dwc:genericName, dwc:specificEpithet and dwc:infraspecificEpithet)"] |
| Source | Paula Zermoglio |
| References | |
| Example Implementations (Mechanisms) | Kurator/FilteredPush sci_name_qc Library, FP-Akka |
| Link to Specification Source Code | https://github.com/FilteredPush/sci_name_qc/blob/v1.1.2/src/main/java/org/filteredpush/qc/sciname/DwCSciNameDQ.java#L1554 |
| Notes | If dwc:genericName is populated, this test expects that the value of dwc:genericName is the first word of the value of dwc:scientificName. If dwc:specificEpithet is populated, this test expects that the value of dwc:specificEpithet is the first or species epithet of the scientificName. If dwc:infraspecificEpithet is populated, this test expects that the value of dwc:infraspecificEpithet is the lowest or terminal infraspecific epithet of the scientificName, excluding any rank designation. |
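
The Expected Response and Notes can be read as a short decision procedure. The following is a minimal sketch of that reading in Python, not the sci_name_qc implementation. The function name `polynomial_consistent`, the helper `is_empty`, and the `RANK_MARKERS` list are assumptions made for illustration, and the whitespace tokenization ignores authorship, hybrids and parenthetical names, which a real name parser would have to handle.

```python
# Illustrative, incomplete set of rank designations to skip (an assumption).
RANK_MARKERS = {"ssp.", "subsp.", "var.", "subvar.", "f.", "forma"}


def is_empty(value):
    """Approximate bdq:Empty as None or a whitespace-only string."""
    return value is None or value.strip() == ""


def polynomial_consistent(scientific_name, generic_name="",
                          specific_epithet="", infraspecific_epithet=""):
    """Return (Response.status, Response.result) for a simplified check."""
    atomic = (generic_name, specific_epithet, infraspecific_epithet)
    if is_empty(scientific_name) or all(is_empty(v) for v in atomic):
        return ("INTERNAL_PREREQUISITES_NOT_MET", None)

    # Naive tokenization: split on whitespace and drop rank designations.
    words = [w for w in scientific_name.split() if w not in RANK_MARKERS]

    # Only non-empty atomic terms are compared against the polynomial.
    checks = []
    if not is_empty(generic_name):
        checks.append(len(words) > 0 and words[0] == generic_name.strip())
    if not is_empty(specific_epithet):
        checks.append(len(words) > 1 and words[1] == specific_epithet.strip())
    if not is_empty(infraspecific_epithet):
        # The terminal epithet must lie beyond the genus and species words.
        checks.append(len(words) > 2
                      and words[-1] == infraspecific_epithet.strip())

    result = "COMPLIANT" if all(checks) else "NOT_COMPLIANT"
    return ("RUN_HAS_RESULT", result)
```

Under these assumptions, the two rows in Examples come out as COMPLIANT and NOT_COMPLIANT respectively, and an empty dwc:scientificName yields INTERNAL_PREREQUISITES_NOT_MET regardless of the atomic terms.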
@ArthurChapman
Collaborator

See Positive description - do we need to add "scientificNameAuthorship" to fields?

@iDigBioBot
Collaborator Author

Comment by Arthur Chapman (@ArthurChapman) migrated from spreadsheet:
Variable name would need changing as this relates to the Positive side of the test rather than the negative. Also the Description appears for the (test - PASS) column (currently hidden)

@iDigBioBot
Collaborator Author

Comment by Paul Morris (@chicoreus) migrated from spreadsheet:
@ac: Variable name is fine. The other validation variable names need to change. We must specify all of them as positive, not negative.

@iDigBioBot
Collaborator Author

Comment by Arthur Chapman (@ArthurChapman) migrated from spreadsheet:
Whatever we do we need to be consistent

@ArthurChapman ArthurChapman changed the title TG2-VALIDATION_SCIENTIFICNAME_INCONSISTENT TG2-VALIDATION_POLYNOMIAL_INCONSISTENT Jan 18, 2018
@godfoder
Contributor

[Image attachment: img_20180118_150932]

@ArthurChapman ArthurChapman added the Test Tests created by TG2, either CORE, Supplementary or DO NOT IMPLEMENT label Jan 18, 2018
@chicoreus
Collaborator

We haven't addressed the point from @ArthurChapman that the authorship needs to be included, as scientificNameAuthorship may (incorrectly) differ from the authorship parsed out of scientificName.

@chicoreus
Collaborator

This test shares a name with #45 and #46, but this test looks for consistency in the parts of the name in their various darwin core fields, while the other two tests currently only compare scientificName with a source authority.

@ArthurChapman
Collaborator

This is another one that was originally called (pre-Gainesville) "TG2-VALIDATION_SCIENTIFICNAME_INCONSISTENT". I can't see my discussion on including Authorship, @chicoreus. I believe Authorship may complicate things (because of the many different spellings and inconsistencies). I think that may be why we changed the naming of these three from SCIENTIFICNAME to POLYNOMIAL, i.e. to exclude authorship from the Scientific Name.

@tucotuco
Member

tucotuco commented Jun 25, 2020 via email

@ArthurChapman
Collaborator

Trying to look at a test dataset for this test

At present we say "INTERNAL_PREREQUISITES_NOT_MET if all of the component terms are EMPTY"

but surely INTERNAL_PREREQUISITES_NOT_MET if dwc:scientificName is EMPTY and/or if dwc:genus is EMPTY.

dwc:infraspecificEpithet or specificEpithet on their own are not sufficient to be able to compare Scientific Name against genus, species, infraspecies.

If you have a scientificName and a genus (but no specificEpithet or infraspecificEpithet) then you can still compare

ArthurChapman added a commit that referenced this issue Oct 6, 2020
In accord with #189 added test data file for POLYNOMIAL_INCONSISTENT #101
@ArthurChapman
Collaborator

ArthurChapman commented Oct 6, 2020

As a followup from my last comment - you may like to look at the DRAFT test data file I have created on my interpretation

https://github.com/tdwg/bdq/blob/master/tg2/core/testdata/testdata_POLYNOMIAL_INCOSISTENT_%23101.csv

@ArthurChapman
Collaborator

Looking at #82, SCIENTIFICNAME_EMPTY overlaps with this one. If one was using a workflow and #82 was run first and failed, then #101 would not need to be run. We seem to have a little redundancy here, but I am not sure how to fix it. I see no problem in having both.

@chicoreus
Collaborator

@ArthurChapman see the description of the logic in the notes. dwc:genus can be empty and dwc:specificEpithet can still be checked against dwc:scientificName for consistency.

See the note in #82: these tests are along different axes in the framework, and test order is not specified, so some overlap is expected, especially in complex sets of interrelated terms like these.
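
The point that an empty genus does not block the other comparisons can be illustrated with a small per-term report. This is only a sketch: `check_populated_terms` and its report strings are hypothetical, and the word positions (first word for the generic name, second word for the specific epithet) follow the Notes and assume a plain binomial without authorship.

```python
def check_populated_terms(scientific_name, generic_name="", specific_epithet=""):
    """Report, per atomic term, whether it was compared and the outcome."""
    words = scientific_name.split()
    # Word index at which each term is expected in a plain binomial.
    expected_position = {"genericName": 0, "specificEpithet": 1}
    supplied = {"genericName": generic_name, "specificEpithet": specific_epithet}

    report = {}
    for term, value in supplied.items():
        if value is None or value.strip() == "":
            # Empty terms are skipped rather than counted as mismatches.
            report[term] = "not compared (empty)"
            continue
        index = expected_position[term]
        matches = len(words) > index and words[index] == value.strip()
        report[term] = "consistent" if matches else "inconsistent"
    return report


print(check_populated_terms("Hakea decurrens", specific_epithet="decurrens"))
# {'genericName': 'not compared (empty)', 'specificEpithet': 'consistent'}
```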

@ArthurChapman
Collaborator

@chicoreus OK - but that still means INTERNAL_PREREQUISITES_NOT_MET if dwc:scientificName is EMPTY or if all of dwc:genus, dwc:specificEpithet and dwc:infraspecificEpithet are EMPTY.

OK - I didn't read the notes. I will change my test data file to concur with the notes (once this discussion is finished). It is interesting, though, that we say the test is COMPLIANT if you have a scientific name of Aus Bus Cus, the genus and the species are empty, but you have Cus in the infraspecific epithet. Would logic not say that it is NOT_COMPLIANT, because the genus and species aren't consistent with the scientificName when they have no values?

@Tasilee
Collaborator

Tasilee commented Feb 10, 2022

In the light of recent discussions, I have added the specific dwc terms to the Expected Response.

@Tasilee
Collaborator

Tasilee commented Mar 3, 2022

Examining test data, the following would return NOT_COMPLIANT when I think it should be INTERNAL_PREREQUISITES_NOT_MET

dwc:scientificName="", dwc:genus="Hakea", dwc:specificEpithet="decurrens", dwc:infraspecificEpithet="physocarpa"

??

@tucotuco
Member

tucotuco commented Mar 3, 2022

Agreed.

@Tasilee
Collaborator

Tasilee commented Mar 3, 2022

OK, so could we have a taxon guru adapt the Expected response? These epithet things scare me. Names scare me.

@ArthurChapman
Collaborator

Agreed - probably needs rewording

INTERNAL_PREREQUISITES_NOT_MET if dwc:scientificName, and all of dwc:genus, dwc:specificEpithet and dwc:infraspecificEpithet are EMPTY; COMPLIANT if the polynomial, as represented in dwc:scientificName, is consistent with the atomic parts dwc:genus, dwc:specificEpithet, dwc:infraspecificEpithet; otherwise NOT_COMPLIANT

@Tasilee
Collaborator

Tasilee commented Mar 3, 2022

Hmm, maybe

INTERNAL_PREREQUISITES_NOT_MET if dwc:scientificName is EMPTY, or all of dwc:genus, dwc:specificEpithet and dwc:infraspecificEpithet are EMPTY; COMPLIANT if the polynomial, as represented in dwc:scientificName, is consistent with the atomic parts dwc:genus, dwc:specificEpithet, dwc:infraspecificEpithet; otherwise NOT_COMPLIANT
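
The difference between the two proposed wordings of the prerequisite clause comes down to "and" versus "or". A minimal sketch, with hypothetical function names, applied to the test data case quoted earlier in this thread (empty dwc:scientificName, populated atomic terms):

```python
def is_empty(value):
    """Approximate EMPTY as None or a whitespace-only string."""
    return value is None or value.strip() == ""


def prerequisites_not_met_and(scientific_name, atomic_terms):
    """Earlier wording: scientificName AND all atomic terms are EMPTY."""
    return is_empty(scientific_name) and all(is_empty(t) for t in atomic_terms)


def prerequisites_not_met_or(scientific_name, atomic_terms):
    """Revised wording: scientificName is EMPTY, OR all atomic terms are EMPTY."""
    return is_empty(scientific_name) or all(is_empty(t) for t in atomic_terms)


# Empty scientificName with all three atomic terms populated.
case = ("", ["Hakea", "decurrens", "physocarpa"])
print(prerequisites_not_met_and(*case))  # False: falls through to NOT_COMPLIANT
print(prerequisites_not_met_or(*case))   # True: INTERNAL_PREREQUISITES_NOT_MET
```

With the "and" form, the prerequisite clause only fires when nothing at all is populated, so a record with an empty dwc:scientificName would be evaluated and reported as NOT_COMPLIANT. The "or" form catches it before any comparison is attempted.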

@tucotuco
Member

tucotuco commented Mar 4, 2022

+1 to what @Tasilee said.

@Tasilee
Collaborator

Tasilee commented Mar 4, 2022

1 + @tucotuco is a majority :) CHANGED

@ArthurChapman
Collaborator

Changed dwc:genus to dwc:genericName throughout this test in line with recent changes to Darwin Core.

@chicoreus
Collaborator

As noted by @tucotuco, the acceptance of https://dwc.tdwg.org/terms/#dwc:genericName resolves the potential ambiguity between dwc:genus, defined as the generic placement in the taxonomy, and dwc:genericName, the first word parsed from the scientific name.

@Tasilee Tasilee changed the title TG2-VALIDATION_POLYNOMIAL_INCONSISTENT TG2-VALIDATION_POLYNOMIAL_CONSISTENT Mar 22, 2022
@ArthurChapman
Collaborator

In the Expected Response ..."COMPLIANT if the polynomial, as represented in dwc:scientificName, is consistent with the atomic parts dwc:genericName, dwc:specificEpithet, dwc:infraspecificEpithet", do we need the words "with the atomic parts"?

Would not:

"COMPLIANT if the polynomial, as represented in dwc:scientificName, is consistent with dwc:genericName, dwc:specificEpithet, dwc:infraspecificEpithet"

be sufficient?

@Tasilee
Collaborator

Tasilee commented Apr 3, 2022

Are all happy with the specifications on this one now?

@Tasilee Tasilee removed the NEEDS WORK label Apr 3, 2022
chicoreus added a commit to FilteredPush/sci_name_qc that referenced this issue Jun 9, 2022
…rrent TG2 Specifications. DESCRIPTION: Adding an implementation of VALIDATION_POLYNOMIAL_CONSISTENT along with unit test. Closing gbif parser instances to prevent resource leaks (need to move to centralized management of gbif parser for threading).
chicoreus added a commit to FilteredPush/sci_name_qc that referenced this issue Aug 23, 2022
…ntation of tdwg/bdq#101 VALIDATION_POLYNOMIAL_CONSISTENT, also adding test cases from current validation data csv file that were failing, along with commented out case that may be in error in the validation data. Conforming implementation of tdwg/bdq#121 VALIDATION_TAXONID_COMPLETE to current specification for handling empty taxonID, also adding test cases from current validation data csv file that were failing.  Fixing methods that should be static but aren't.
@Tasilee
Collaborator

Tasilee commented Sep 12, 2022

Getting this 'on the record' for all to consider: Email with @chicoreus yesterday. I suggested for the Expected Response-

INTERNAL_PREREQUISITES_NOT_MET if dwc:scientificName is EMPTY, or all of dwc:genericName, dwc:specificEpithet and dwc:infraspecificEpithet are EMPTY; COMPLIANT if the polynomial, as represented in dwc:scientificName, is consistent with NOT_EMPTY values of dwc:genericName, dwc:specificEpithet, dwc:infraspecificEpithet; otherwise NOT_COMPLIANT.

@chicoreus response: "That is more explicit than the current separate (and not formalized yet) general guidance on handling "consistent". But if we are explicit in this way here, we may need to be in other tests invoking "consistent"."

Thoughts?

@ArthurChapman
Collaborator

I like it, but not sure of other implications

@Tasilee
Collaborator

Tasilee commented Nov 7, 2022

After discussion on the Zoom today, we agreed that using the current Test Data format for examples would seem expedient. We also previously agreed that a "COMPLIANT" and "NOT_COMPLIANT" or equivalents was appropriate.

I think the examples of INTERNAL/EXTERNAL_PREREQUISITES_NOT_MET would be overkill here?

What I have added in Examples is for a check on formatting.

@Archilegt

I don't see how the test can accommodate interpolated names that are part of a polynomial dwc:scientificName.
Polynomials with interpolated names:
Aus (Bus) cus, where Bus is a subgenus
Aus (cus) dus, where cus is a superspecies
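
To make the concern concrete: a check that takes the second whitespace-separated word as the specific epithet would read "(Bus)" as the epithet in "Aus (Bus) cus". One possible pre-processing step, shown here purely as a sketch, is to separate out a single-word parenthetical that directly follows the first word before comparing. The helper `split_interpolated` and its regular expression are hypothetical, and nothing in the specification or in this thread says the test should behave this way.

```python
import re

# Matches a single-word parenthetical directly after the first word,
# e.g. the "(Bus)" in "Aus (Bus) cus". A hypothetical pattern for illustration.
INTERPOLATED = re.compile(r"^(\S+)\s+\((\w+)\)\s+")


def split_interpolated(scientific_name):
    """Return (interpolated name or None, name with the interpolation removed)."""
    match = INTERPOLATED.match(scientific_name)
    if match is None:
        return None, scientific_name
    rest = scientific_name[match.end():]
    return match.group(2), f"{match.group(1)} {rest}"


print("Aus (Bus) cus".split()[1])           # (Bus)
print(split_interpolated("Aus (Bus) cus"))  # ('Bus', 'Aus cus')
print(split_interpolated("Aus (cus) dus"))  # ('cus', 'Aus dus')
```

Because the pattern only looks immediately after the first word and only accepts a single word inside the parentheses, a trailing parenthetical such as an authorship string is left untouched. Whether the interpolated name should then be compared against any Darwin Core term is a question the sketch does not answer.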

@Tasilee
Copy link
Collaborator

Tasilee commented Nov 13, 2022

@chicoreus ? We need your expertise on this question. This may also need to be another general principle?

chicoreus added a commit to FilteredPush/sci_name_qc that referenced this issue Jun 26, 2023
…tdwg/bdq specifications. Updated metadata (added ProvidesVersion and Specification) for tdwg/bdq#77 tdwg/bdq#83 tdwg/bdq#28 tdwg/bdq#101 VALIDATION_POLYNOMIAL_CONSISTENT. Updated metadata. Removing reviewed stub method. Added test cases to confirm expected NOT_EMPTY comparison behavior.
chicoreus added a commit to FilteredPush/sci_name_qc that referenced this issue Jun 27, 2023
…ISTENT revealed from investigation of failure case for validation dataID 125. Fixing handling of comparisons with isEmpty rather than comparing to empty string, adding handling of parser returning a uninomial.
@Tasilee
Collaborator

Tasilee commented Sep 18, 2023

Splitting bdqffdq:Information Elements into "Information Elements ActedUpon" and "Information Elements Consulted".

Also changed "Field" to "TestField", "Output Type" to "TestType" and updated "Specification Last Updated"

@chicoreus chicoreus added the CORE TG2 CORE tests label Sep 18, 2023
chicoreus added a commit to FilteredPush/sci_name_qc that referenced this issue Jul 18, 2024
…h current specification, adding a couple of simpler regex cases for evaluation without invoking parser, adding unit test cases matching examples in the issue.
7 participants