TG2-VALIDATION_CLASSIFICATION_CONSISTENT #123
Comments
Discussed at TDWG 2018 DQIG meeting that there are two distinct potential causes for ambiguity. One potential cause is as in the original example, where an incorrect name somewhere in the higher classification terms throws doubt on what is correct and what is not. The other potential cause is a name (or combination) in the given values and ranks that matches more than one combination in the target authority. We are looking for an example of this. Perhaps a family-level homonym? |
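The second cause described above (a supplied name and rank matching more than one entry in the target authority) can be sketched as follows. This is only a minimal illustration: the `AUTHORITY` list, the names in it, and both helper functions are hypothetical placeholders, not real taxa and not any real authority API.

```python
# Hypothetical in-memory "authority": each entry is one accepted classification.
# All names are placeholders, not real taxa.
AUTHORITY = [
    {"kingdom": "KingdomA", "family": "Exampleidae"},
    {"kingdom": "KingdomB", "family": "Exampleidae"},  # family-level homonym
    {"kingdom": "KingdomA", "family": "Otheridae"},
]

def matching_entries(record, authority=AUTHORITY):
    """Return every authority entry that agrees with all non-empty supplied terms."""
    supplied = {rank: name for rank, name in record.items() if name}
    return [
        entry for entry in authority
        if all(entry.get(rank) == name for rank, name in supplied.items())
    ]

def is_ambiguous(record, authority=AUTHORITY):
    """A record is ambiguous when more than one authority entry matches it."""
    return len(matching_entries(record, authority)) > 1
```

With this placeholder data, supplying only the family matches two entries, while adding the kingdom narrows the match to one.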
Should we incorporate some of the @tucotuco comment above into the Notes? |
Most definitely, once we have an example of a multiple match. |
Someone at TDWG mentioned that there was only one homonym at the family level or higher. Not sure what it is - but it would surprise me if there was only one. |
@ArthurChapman it would certainly surprise me as well! Taxonomists are devious. |
You might find this fascinating...
http://www.marine.csiro.au/mirrorsearch/ir_search.list_homonyms?hlevel=family
|
Thanks John - I knew there had to be more than one.
I will find a good one to add as an example in #123.
Cheers
Arthur
|
Thanks @tucotuco. I had forgotten how useful Tony Rees' IRMNG is. |
Perhaps not the place for this, but it's the first example I looked at. Are tests that depend on an external authority and thus not computable without reference to that authority grouped in some way or identified as such? For example, given the test data testdata_VALIDATION_CLASSIFICATION_AMBIGUOUS_#123.csv I can not write code to perform (all) the tests unless I resolve a request against bdq:sourceAuthority. This seems to represent a class of tests that will change results, potentially, when the authority changes (as opposed to the data in the CSV), and thus much more difficult to implement consistently? |
@mjy Your observation is correct. There isn't currently a label for the tests that require a source authority, but that seems like a useful label to add. All of the tests that do require a source authority should have the label "Parametrized", but not all of the tests with the label "Parametrized" necessarily require a source authority. I would think that the best way forward on tests of this nature is to use values from the source authority expected to remain highly stable. |
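One way to keep authority-dependent tests reproducible is to pass the authority lookup in as a parameter, so that test data can be checked against a frozen snapshot rather than a live service. A minimal sketch, assuming a hypothetical `validate_kingdom_family` check, a hypothetical `SNAPSHOT` dictionary with placeholder names, and a made-up `UNRESOLVED` status label (none of these come from the BDQ specification):

```python
from typing import Callable, Optional

# Hypothetical frozen snapshot of an authority: family -> kingdom.
# Placeholder values, chosen only to make the example self-contained.
SNAPSHOT = {"Exampleidae": "KingdomA"}

def snapshot_lookup(family: str) -> Optional[str]:
    """Resolve a family against the frozen snapshot instead of a live service."""
    return SNAPSHOT.get(family)

def validate_kingdom_family(kingdom: str, family: str,
                            lookup: Callable[[str], Optional[str]]) -> str:
    """Report whether the supplied kingdom agrees with what the lookup returns."""
    expected = lookup(family)
    if expected is None:
        # Made-up label for "the authority could not resolve the name".
        return "UNRESOLVED"
    return "COMPLIANT" if expected == kingdom else "NOT_COMPLIANT"
```

Because the lookup is injected, the same function can run against the snapshot in test suites and against a real service elsewhere; results then change only when the snapshot is deliberately updated.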
I think we have a "Vocabulary" label - not sure we have been consistent with it, I'd have to check. |
@ArthurChapman is right most of the time in saying that when bdq:sourceAuthority is a Parameter, the VOCABULARY tag is present. There are 25 tests that have the Parameter "bdq:sourceAuthority" and all but four have a VOCABULARY tag. The four are #50, #62, #73 and #76. All of these have the tag "ISO/DCMI STANDARD" (and there are 13 tests that have that tag). In reviewing the tests, maybe we do have an anomaly or two. Take #48: it has the "ISO/DCMI STANDARD" tag, but no "bdq:sourceAuthority" as there is only ONE, so it has neither "Parameterized" nor "VOCABULARY", even though there is one. @mjy's view from a developer's perspective is less subtle than our reasoning? Is there a case for a) removing the "ISO/DCMI STANDARD" tag, b) including a "bdq:sourceAuthority" and c) if relevant, including a "VOCABULARY" tag when there is one? Or maybe just adding a new tag "EXTERNAL SOURCE" or equivalent wherever there is a need to refer to external sources? Thoughts? |
Hi Lee,
To me, ISO 3166-1-alpha-2 is a vocabulary, so it should have the Vocabulary
tag. As for the ISO/DCMI label, that is entirely for our convenience. It
might be convenient to add labels for the specific sources we are
recommending, so all of the associated tests can be called up at once. Issue
#50 and Issue #73 actually require two authorities, one for the country
codes and one for the country shapes (or the service that
reconciles between two incorporated authorities). #76 really doesn't use a
vocabulary, but the test depends on an ISO standard being followed. It
seems like Issue #48 can't avoid using an authority, so not sure why it
wouldn't say so.
Cheers,
John
|
Thanks @tucotuco. I agree with you that ISO 3166 'tests' should have the VOCABULARY tag, and I have added those tags where missing. #50 has a Parameter "bdq:sourceAuthority", and the Notes (as per our way of documenting) have "[bdq:sourceAuthority default for country shapes = spatial UNION of terrestrial boundaries from gadm.org and EEZs from marineregions.org]", but only a Reference to the codes (ISO 3166...). This does seem anomalous. Do we add a second Parameter "bdq:sourceAuthority2" and then assign its default in the Notes? Ditto #73. #76 - I agree that it doesn't use a vocabulary, but it seems this is where a Reference is appropriate and not bdq:sourceAuthority, as the test doesn't specifically look something up? #46 - I agree and have added "bdq:sourceAuthority" to Parameters. This looks like an omission. We keep finding such things :| |
Yes, #50 and #73 are interesting. We could argue
that only one bdq:sourceAuthority is required (and what a relief), because
we don't actually need to consult a countryCode source here. Whatever is in
the countryCode field will be used to make the lookup in the one
bdq:sourceAuthority, which is the spatial combination one. Does that hold
water? (marine pun, and not a very good one)
|
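The single-authority reading suggested above can be sketched as follows: the supplied countryCode is used only as a key into the spatial authority, so no separate code list is consulted. This is an illustration under loud assumptions: `SHAPES` is a hypothetical stand-in that uses made-up bounding boxes instead of real gadm.org or marineregions.org geometry, and the country codes are placeholders.

```python
# Hypothetical spatial authority keyed by country code.
# Each value is a made-up bounding box: (min_lon, min_lat, max_lon, max_lat).
SHAPES = {
    "XA": (0.0, 0.0, 10.0, 10.0),
    "XB": (20.0, -5.0, 30.0, 5.0),
}

def coordinates_in_country(country_code, lat, lon, shapes=SHAPES):
    """Return True/False for a point inside/outside the shape for the code,
    or None when the code is not present in the spatial authority."""
    box = shapes.get(country_code)
    if box is None:
        return None  # the one authority cannot resolve this code
    min_lon, min_lat, max_lon, max_lat = box
    return min_lon <= lon <= max_lon and min_lat <= lat <= max_lat
```

A code that is absent from the spatial authority yields `None` here rather than a verdict, which is one possible way to separate "unknown code" from "point outside the country".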
I believe this test requires the elements dwc:subfamily and dwc:genericName. These are new since the test was first formulated and no update included them. |
…ASSIFICATION_CONSISTENT with GBIF and WoRMS authorities. Includes minimal integration test.
…g/bdq#123 VALIDATION_CLASSIFICATION_CONSISTENT. Needs more work, not passing all validation cases. Added line number to log4j test configuration.
Thanks @tucotuco - added but is dwc:genericName classed as a "higher classification taxonomic term"? |
78640f09-8353-411a-800e-9b6d498fb1c9 duplicates #95; replacing it with 2750c040-1d4a-4149-99fe-0512785f2d5f. |
@Tasilee good catch. The information elements should include dwc:genus, but not dwc:genericName, as dwc:genericName is a parse of the generic name portion of dwc:scientificName, not the placement of the taxon in the classification. |
Yes, good catch, my bad. |
…STENT to match parentage of higher taxa in the source authority with their parentage in the presented data, including matching on synonyms. Added SciNameUtils.isSameClassificationInAuthority() to check parentage against authority, along with BooleanWithComment to carry both the result and a comment from this check. Modified SciNameUtils.sameOrSynonym to check name as synonym of otherName and otherName as synonym of name.
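The symmetric synonym check described in that commit message can be illustrated roughly as below. This is not the SciNameUtils code: it is a Python sketch with a hypothetical `SYNONYMS` table and placeholder names, showing only the idea of testing the synonym relation in both directions.

```python
# Hypothetical synonym table: accepted name -> set of its synonyms (placeholder names).
SYNONYMS = {"Accepteda": {"Oldnamea"}}

def same_or_synonym(name, other_name, synonyms=SYNONYMS):
    """True if the names are equal, or either one is listed as a synonym of the other."""
    if name == other_name:
        return True
    return (name in synonyms.get(other_name, set())
            or other_name in synonyms.get(name, set()))
```

Checking both directions means the result does not depend on which of the two names came from the data and which from the authority.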
Restructured Parameter(s) and Source authority |
Will need to include the new terms dwc:superfamily, dwc:tribe, dwc:subtribe tdwg/dwc#65 tdwg/dwc#45 tdwg/dwc#46 |
Added the terms dwc:superfamily, dwc:tribe, dwc:subtribe to the Information elements and Expected response, and updated Specification Last Updated. |
Amended Source Authority values to align with @chicoreus syntax.
From: bdq:sourceAuthority default = "GBIF Backbone Taxonomy" [https://doi.org/10.15468/39omei]
To: bdq:sourceAuthority default = "GBIF Backbone Taxonomy" {[https://doi.org/10.15468/39omei]} {API endpoint [https://api.gbif.org/v1/species?datasetKey=d7dddbf4-2cf0-4f39-9b2a-bb099caae36c&name=]} |
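Since the API endpoint in the amended default ends in `name=`, a lookup URL can be formed by appending a URL-encoded name. A minimal sketch, assuming a hypothetical helper `build_lookup_url`; it only builds the string and sends no request, and it says nothing about how the response should be interpreted.

```python
from urllib.parse import quote_plus

# Endpoint prefix copied from the bdq:sourceAuthority default; it ends in "name=".
GBIF_ENDPOINT = ("https://api.gbif.org/v1/species"
                 "?datasetKey=d7dddbf4-2cf0-4f39-9b2a-bb099caae36c&name=")

def build_lookup_url(name, endpoint=GBIF_ENDPOINT):
    """Append a URL-encoded name to the endpoint prefix; no request is sent."""
    return endpoint + quote_plus(name.strip())
```

Encoding the name matters because scientific names routinely contain spaces and, with authorship, characters such as `&` and parentheses.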
Minor update to specification, changed one instance of genericName to be the expected classification term genus. |
…ifications. Addressed tdwg/bdq#123 VALIDATION_CLASSIFICATION_CONSISTENT. Adding superfamily, tribe, subtribe as parameters. Adding support for checking these along with subfamily. Updating GBIF api to current version to obtain support for superfamily, tribe, subtribe, adding these to local NameUsage class. Updating GBIF name parser to current version, adding handling for new threading exception thrown from parse methods. Removed checked stub method.
…aining tests tdwg/bdq#70 VALIDATION_TAXON_UNAMBIGUOUS and tdwg/bdq#123 VALIDATION_CLASSIFICATION_CONSISTENT. Metadata, including source authority values, updated. Some cleanup of other comments, and consistency of comments in defaults class.
Added to the Notes "Note: that for this test to work, the lowest ranking element must be present and the higher ranking elements be consistent with it." Do we need to reword the Expected Response? This follows implementation tests by @chicoreus where: kingdom="Animalia"; is COMPLIANT, but kingdom="Animalia"; is NOT_COMPLIANT It was agreed through email discussion that this is what we want to happen. |
Expected Response changed (following Zoom of 2023-08-29) and Specification Date updated: "..... are consistent with the lowest ranking matched element in the bdq:sourceAuthority". The last added part of the Notes was deleted. |
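The phrase "consistent with the lowest ranking matched element" can be read as: find the lowest-ranking supplied term that the authority recognises, then compare every supplied higher rank with that entry's classification. The sketch below follows that reading under stated assumptions: `AUTHORITY` is a hypothetical table of placeholder names, `NO_MATCH` is a made-up label, synonyms are ignored, and a supplied higher rank that the authority entry lacks is treated as a mismatch.

```python
# Ranks from highest to lowest, using Darwin Core classification terms.
RANKS = ["kingdom", "phylum", "class", "order", "superfamily",
         "family", "subfamily", "tribe", "subtribe", "genus"]

# Hypothetical authority: (rank, name) -> classification down to that rank.
# All names are placeholders.
AUTHORITY = {
    ("family", "Exampleidae"): {"kingdom": "KingdomA", "order": "Exampleales",
                                "family": "Exampleidae"},
    ("kingdom", "KingdomA"): {"kingdom": "KingdomA"},
    ("kingdom", "KingdomB"): {"kingdom": "KingdomB"},
}

def classification_consistent(record, authority=AUTHORITY):
    """Compare supplied higher ranks with the lowest-ranking matched element."""
    for i in range(len(RANKS) - 1, -1, -1):  # walk from the lowest rank upward
        rank = RANKS[i]
        name = record.get(rank)
        if name and (rank, name) in authority:
            anchor = authority[(rank, name)]
            for higher in RANKS[:i]:
                supplied = record.get(higher)
                # A supplied term that differs from (or is absent in) the
                # anchor's classification counts as inconsistent here.
                if supplied and supplied != anchor.get(higher):
                    return "NOT_COMPLIANT"
            return "COMPLIANT"
    return "NO_MATCH"  # made-up label: nothing supplied matched the authority
```

Note that when the lowest supplied term is unknown to the authority, this sketch falls back to the next matched rank above it, so a record can pass on its kingdom alone.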
This test should have Data Quality Dimension "Consistency" rather than "Conformance". Edited. |
Perhaps not. The test evaluates whether the higher classification is
consistent with a classification in a source authority; what is being
evaluated is conformance with the source authority in a way that is
internally consistent.
|
Thanks @chicoreus. However, this would be the only Test with a Warning Type of "Inconsistent" that had a Data Quality Dimension of "Consistency". Given that the one-to-one mapping of Data Quality Dimension to Warning Type strongly suggests removing Warning Type, this would be the one outlier. Retaining Warning Type under the circumstances would seem highly inefficient, at best. |
Splitting bdqffdq:Information Elements into "Information Elements ActedUpon" and "Information Elements Consulted". Also changed "Field" to "TestField", "Output Type" to "TestType" and updated "Specification Last Updated" |
…adding to unit test to confirm that 'consistent with the lowest ranking matched element' is handled as specified, and fixing some cases where superfamily from previous test case was passed forward to next.