taxonomy relationships #735

Closed
GoogleCodeExporter opened this issue Aug 25, 2015 · 11 comments
Labels
Function-Taxonomy/Identification Priority-High (Needed for work) High because this is causing a delay in important collection work..

Comments

@GoogleCodeExporter

Searching "everything taxonomy" does not perform very well, in part because we 
do not have reciprocal taxonomy relationships and so must perform an additional 
expensive join to fully consider relationships.

To implement reciprocity, 
http://arctos.database.museum/info/ctDocumentation.cfm?table=CTTAXON_RELATION 
needs to have a "reciprocal_relationship NOT NULL" column (or some functional 
equivalent) added and populated.

One potential (probably minor?) complication may be in ICBN vs. ICZN usage - 
Wikipedia says "synonym" (botany) and "junior synonym" (zoology) are the same 
thing, for example.

Example:

http://arctos.database.museum/name/Nemata --> synonym of --> http://arctos.database.museum/name/Nematoda
http://arctos.database.museum/name/Nematoda --> (no relationships)

would (automatically, from the revised code table) update to

http://arctos.database.museum/name/Nemata --> synonym of --> http://arctos.database.museum/name/Nematoda
http://arctos.database.museum/name/Nematoda --> {whatever the reciprocal of "synonym of" is} --> http://arctos.database.museum/name/Nemata

This would also solve 
https://groups.google.com/d/msg/arctos-ac/DfZe3kxADlY/RN7H3q3JSzEJ (formatting 
taxonomy relationships).
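
A minimal sketch of the reciprocal code-table idea, using Python's built-in sqlite3 rather than the Arctos database. Only CTTAXON_RELATION and reciprocal_relationship come from this issue; the taxon_relations table, its column names, and the reciprocal term "has synonym" are assumptions invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Code table: CTTAXON_RELATION plus the proposed NOT NULL column.
    CREATE TABLE cttaxon_relation (
        taxon_relationship      TEXT PRIMARY KEY,
        reciprocal_relationship TEXT NOT NULL
    );
    -- Hypothetical relationship table: one row per directed assertion.
    CREATE TABLE taxon_relations (
        taxon_name         TEXT NOT NULL,
        taxon_relationship TEXT NOT NULL REFERENCES cttaxon_relation,
        related_taxon_name TEXT NOT NULL,
        UNIQUE (taxon_name, taxon_relationship, related_taxon_name)
    );
""")

# "has synonym" is a placeholder; the issue leaves the reciprocal term open.
conn.executemany(
    "INSERT INTO cttaxon_relation VALUES (?, ?)",
    [("synonym of", "has synonym"), ("has synonym", "synonym of")],
)
conn.execute(
    "INSERT INTO taxon_relations VALUES ('Nemata', 'synonym of', 'Nematoda')"
)


def add_reciprocals(conn):
    """Insert the mirrored row for every relationship that lacks one."""
    conn.execute("""
        INSERT OR IGNORE INTO taxon_relations
        SELECT r.related_taxon_name, c.reciprocal_relationship, r.taxon_name
        FROM taxon_relations r
        JOIN cttaxon_relation c USING (taxon_relationship)
    """)


def relations_of(conn, name):
    """With mirrored rows present, a lookup on taxon_name alone sees both directions."""
    return conn.execute(
        "SELECT taxon_relationship, related_taxon_name "
        "FROM taxon_relations WHERE taxon_name = ? ORDER BY 1, 2",
        (name,),
    ).fetchall()


add_reciprocals(conn)
```

After add_reciprocals runs, relations_of(conn, "Nematoda") no longer comes back empty, and a search never has to check the related_taxon_name side. Running add_reciprocals again adds nothing, because the UNIQUE constraint makes the insert idempotent.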

Ref: https://code.google.com/p/arctos/issues/detail?id=734

Original issue reported on code.google.com by dust...@gmail.com on 13 Jul 2015 at 8:54

@GoogleCodeExporter GoogleCodeExporter added auto-migrated Priority-High (Needed for work) High because this is causing a delay in important collection work.. labels Aug 25, 2015
@dustymc dustymc added this to the Needs Discussion milestone Aug 27, 2015
@dustymc
Contributor

dustymc commented Apr 7, 2016

Possible improvement: Move ALL relationships to classifications, something like we get from GlobalNames. Example:

http://arctos.database.museum/name/Arhopalus%20cervinus#ITIS

taxon name: Arhopalus cervinus
species (root term in hierarchical terms, rank doesn't seem important): Arhopalus foveicollis
interpretation: "ITIS says Arhopalus foveicollis is favored over Arhopalus cervinus" (or something like that...)

This is "correct" from a data standpoint; our current data (http://arctos.database.museum/name/Echidna%20russellii) ~assert "Echidna (all uses) is a bad spelling of Bitis (vipers)," which isn't correct; Echidna remains a "good" name for eels and a "bad" synonym for some other stuff (pointy mammals, moths).

We would lose the precision available under http://arctos.database.museum/info/ctDocumentation.cfm?table=CTTAXON_TERM, BUT we know many of those data are garbage anyway (see email "backwards synonyms" @DerekSikes 1 Apr 2016), and many of them intentionally avoid precision (e.g., DLM uses "synonym of" to mean "sameish thing" with no ICxN intentions). I see little evidence that we're capable of usefully maintaining those data, and I see no way of determining what's trustworthy.

It is currently very easy to delete classifications; it should probably be more difficult (require confirmation, or something similar) to delete a "synonym bearing" classification.
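
One way the delete guard could look, as a rough sketch only. Nothing here is existing Arctos code: the Classification class, the synonym_bearing flag, and delete_classification are all hypothetical names, and how a classification gets marked as synonym-bearing is deliberately left open.

```python
from dataclasses import dataclass, field


@dataclass
class Classification:
    classification_id: str
    source: str
    # (term_type, term) pairs, e.g. ("species", "Arhopalus foveicollis")
    terms: list = field(default_factory=list)
    # Hypothetical flag: True when this classification is what records a synonymy.
    synonym_bearing: bool = False


class ConfirmationRequired(Exception):
    """Raised when a guarded delete is attempted without confirmation."""


def delete_classification(store, classification_id, confirm=False):
    """Remove a classification from `store`, a dict keyed by classification_id.

    Ordinary classifications are deleted immediately. Synonym-bearing ones
    are deleted only when the caller passes confirm=True.
    """
    classification = store[classification_id]
    if classification.synonym_bearing and not confirm:
        raise ConfirmationRequired(
            f"{classification_id} carries a synonymy; pass confirm=True to delete"
        )
    del store[classification_id]
```

The point of raising instead of silently refusing is that the UI can catch the exception and turn it into an "are you sure?" prompt.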

It's not exactly clear how we'll avoid synonym-bearing classifications in things like updating FLAT; all code dealing with "the collection's classification" would need to be reviewed.

All "any taxa" queries would need to be rewritten, but performance should improve (we'd need to tune only one thing, albeit one very large thing).

Question (possibly for GlobalNames): Under eg http://arctos.database.museum/name/Echidna%20russellii#CatalogueofLife (and many other examples) the query was for a "species" (binomial) and various sources return a monomial (genus). What exactly is the assertion?

@sharpphyl

Just curious if we've considered assigning a number to each use of a taxon name, the same way we do to a locality. Could those specific numbers then be used in each classification (and in search, etc.)? Would that keep them straight and link the correct ones? Numbers would be unique. Names aren't, and adding the author doesn't seem to be a huge improvement overall.

> Question (possibly for GlobalNames): Under eg http://arctos.database.museum/name/Echidna%20russellii#CatalogueofLife (and many other examples) the query was for a "species" (binomial) and various sources return a monomial (genus). What exactly is the assertion?

I find this happens frequently. If the species isn't found in these sources, the result is returned only to the genus level. Also, if the species is invalid, WoRMS returns the valid species. Not sure if this will happen in WoRMS (via Arctos) or not.

@dustymc
Contributor

dustymc commented Dec 10, 2018

> assigning a number

We do - names have taxon_name_id and classifications have classification_id.

> same way we do to a locality

...and just like localities, the ID isn't stable - they get replaced rather than updated when it's convenient, etc. Localities have 'locality_name' which IS stable - easy enough to add that to something like classifications, but (like locality_name) that would affect how the data may be managed.

@campmlc

campmlc commented Dec 10, 2018 via email

@dustymc
Contributor

dustymc commented Dec 10, 2018

taxon_name_id uniquely identifies NAMES. Names also uniquely identify names - we have a unique index.

Classification_id uniquely identifies classifications. We replace those every time we clone-edit-delete instead of editing, and every time we use the classification bulkloader.
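
To make the stability problem concrete, here is a toy sketch of clone-edit-delete. It is not how Arctos stores anything: the dict layout, the helper names, and the stable_name field (imagined by analogy with locality_name) are all assumptions.

```python
import itertools

# Hypothetical ID generator standing in for a database sequence.
_next_id = itertools.count(1)


def new_classification(terms, stable_name=None):
    """Build a classification record with a freshly issued classification_id."""
    return {
        "classification_id": next(_next_id),
        "stable_name": stable_name,  # hypothetical analogue of locality_name
        "terms": dict(terms),
    }


def clone_edit_delete(store, classification_id, **changes):
    """Replace a classification instead of editing it in place.

    The old record is removed and a clone with the changes applied is stored
    under a new classification_id; only stable_name carries over unchanged.
    """
    old = store.pop(classification_id)
    clone = new_classification({**old["terms"], **changes}, old["stable_name"])
    store[clone["classification_id"]] = clone
    return clone
```

Anything that remembered the old classification_id is left pointing at nothing after clone_edit_delete, while a lookup by stable_name still finds the record. That is the trade-off: a stable key is easy to add, but it only stays meaningful if replacing records this freely stops being routine.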

@campmlc

campmlc commented Dec 10, 2018 via email

@dustymc
Contributor

dustymc commented Dec 10, 2018

Sure - we just don't allow them to change. "Don't allow certain data to change" seems like a critical component of managing taxon concepts anyway. I don't think that's any sort of deal-breaker, but it's absolutely a big change in how we view and manage classification data.

We currently treat taxon names as "data" - eg, you can't change them once they're used. Classifications are treated like "metadata" - you can delete them or replace them (to make family consistent, or because it's easier than editing, or because someone left some junk behind, or whatever). Moving to taxon concepts - even if the "concept" is just name+name-author+year - would elevate classifications to "data" - they'd become things you pick (presumably for reasons) rather than things you inherit (eg, from collection preferences). Allowing you to pick specific "concepts" and allowing those concepts to arbitrarily change would be pointless, so we'd have to lock some things down. Keeping an identifier stable in that context should not be a problem.
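
A rough sketch of what "lock some things down" might mean in practice, using Python's sqlite3 and triggers. The table layout is an assumption (the concept is reduced to name, author, and year), and the trigger names and the try_sql helper are hypothetical. The rule shown is the same one described above for names: once a classification is used by an identification, it can no longer be changed or deleted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE classification (
        classification_id INTEGER PRIMARY KEY,
        scientific_name   TEXT NOT NULL,
        author_text       TEXT,
        name_year         INTEGER
    );
    CREATE TABLE identification_taxonomy (
        identification_id INTEGER NOT NULL,
        classification_id INTEGER NOT NULL REFERENCES classification
    );

    -- Refuse edits to a classification that some identification has picked.
    CREATE TRIGGER classification_no_update
    BEFORE UPDATE ON classification
    WHEN EXISTS (SELECT 1 FROM identification_taxonomy it
                 WHERE it.classification_id = OLD.classification_id)
    BEGIN
        SELECT RAISE(ABORT, 'classification is in use and cannot be changed');
    END;

    -- Refuse deletes under the same condition.
    CREATE TRIGGER classification_no_delete
    BEFORE DELETE ON classification
    WHEN EXISTS (SELECT 1 FROM identification_taxonomy it
                 WHERE it.classification_id = OLD.classification_id)
    BEGIN
        SELECT RAISE(ABORT, 'classification is in use and cannot be deleted');
    END;
""")


def try_sql(conn, sql):
    """Run one statement; return True on success, False if the database rejected it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.DatabaseError:
        return False
```

Unused classifications stay as freely editable as they are today; only the ones somebody has actually picked become "data".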

@campmlc

campmlc commented Dec 11, 2018 via email

@Jegelewicz
Member

Agree. I have been wondering how the current model of "name as data and classification as metadata" came about. It seems like we are creating a lot of our own problems with the two layers of identification. What would we need to do to transition to such a model? And what am I missing about the current model that makes it more useful/appropriate?

@dustymc
Contributor

dustymc commented Dec 11, 2018

> what am I missing

normalization

> What would we need to do to transition to such a model?

In that model (as I see it), normalization is even more critical. The only significant structural change would be identification_taxonomy.taxon_name_id becoming identification_taxonomy.classification_id. (That sort of modularity is another benefit of normalization.)

That should just leave the usability issues to deal with.
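
For illustration, a sketch of that one structural change in Python's sqlite3. Only identification_taxonomy, taxon_name_id, and classification_id are taken from the comment above; the other columns, the identification_taxonomy_v2 table, and the migrate function with its preferred_source parameter are assumptions. In particular, resolving each name through a single preferred source is just one possible rule for picking a classification.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE taxon_name (
        taxon_name_id   INTEGER PRIMARY KEY,
        scientific_name TEXT NOT NULL UNIQUE
    );
    CREATE TABLE classification (
        classification_id INTEGER PRIMARY KEY,
        taxon_name_id     INTEGER NOT NULL REFERENCES taxon_name,
        source            TEXT NOT NULL
    );
    -- Current model: an identification points at a name.
    CREATE TABLE identification_taxonomy (
        identification_id INTEGER NOT NULL,
        taxon_name_id     INTEGER NOT NULL REFERENCES taxon_name
    );
    -- Proposed model: an identification points at one classification.
    CREATE TABLE identification_taxonomy_v2 (
        identification_id INTEGER NOT NULL,
        classification_id INTEGER NOT NULL REFERENCES classification
    );
""")


def migrate(conn, preferred_source):
    """Copy identifications into the proposed table.

    Each name is resolved to its classification from `preferred_source`.
    Returns the identification_ids that had no such classification and
    were therefore not copied.
    """
    conn.execute(
        """
        INSERT INTO identification_taxonomy_v2 (identification_id, classification_id)
        SELECT it.identification_id, c.classification_id
        FROM identification_taxonomy it
        JOIN classification c
          ON c.taxon_name_id = it.taxon_name_id AND c.source = ?
        """,
        (preferred_source,),
    )
    rows = conn.execute(
        """
        SELECT identification_id FROM identification_taxonomy
        WHERE identification_id NOT IN
              (SELECT identification_id FROM identification_taxonomy_v2)
        ORDER BY identification_id
        """
    ).fetchall()
    return [row[0] for row in rows]
```

The list that migrate returns is where the usability issues show up: every identification whose name has no classification in the preferred source needs a person (or another rule) to choose one. A name with several classifications from the same source would also produce duplicate rows here, which a real migration would have to handle.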

@Jegelewicz
Member

Closing to consolidate issues; see #1136.
