taxonomy relationships #735

GoogleCodeExporter · 2015-08-25T00:41:08Z

Searching "everything taxonomy" does not perform very well, in part because we 
do not have reciprocal taxonomy relationships and so must perform an additional 
expensive join to fully consider relationships.

To implement reciprocity, 
http://arctos.database.museum/info/ctDocumentation.cfm?table=CTTAXON_RELATION 
needs to have a "reciprocal_relationship NOT NULL" column (or some functional 
equivalent) added and populated.

One potential (probably minor?) complication may be in ICBN vs. ICZN usage - 
Wikipedia says "synonym" (botany) and "junior synonym" (zoology) are the same 
thing, for example.

Example:

http://arctos.database.museum/name/Nemata-->synonym 
of-->http://arctos.database.museum/name/Nematoda
http://arctos.database.museum/name/Nematoda--->(no relationships)

would (automatically, from the revised code table) update to

http://arctos.database.museum/name/Nemata-->synonym 
of-->http://arctos.database.museum/name/Nematoda
http://arctos.database.museum/name/Nematoda--->{whatever the reciprocal of 
"synonym of" is}--->http://arctos.database.museum/name/Nemata
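A minimal sketch of how the reciprocal row could be derived from the code table, using Python's built-in sqlite3 rather than the Arctos database. The table layout (`cttaxon_relation`, `taxon_relations`), the helper names, and the placeholder label are assumptions for illustration only; the real reciprocal of "synonym of" is still undecided above.

```python
import sqlite3

# Placeholder only: the thread does not say what the reciprocal label is.
PLACEHOLDER_RECIPROCAL = "<reciprocal of synonym of>"

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical code table: every relationship names its reciprocal.
    CREATE TABLE cttaxon_relation (
        taxon_relationship      TEXT PRIMARY KEY,
        reciprocal_relationship TEXT NOT NULL
    );
    -- Hypothetical relationship table: one stored row per direction.
    CREATE TABLE taxon_relations (
        taxon_name         TEXT NOT NULL,
        taxon_relationship TEXT NOT NULL,
        related_name       TEXT NOT NULL
    );
""")
conn.executemany(
    "INSERT INTO cttaxon_relation VALUES (?, ?)",
    [("synonym of", PLACEHOLDER_RECIPROCAL), (PLACEHOLDER_RECIPROCAL, "synonym of")],
)
conn.execute(
    "INSERT INTO taxon_relations VALUES (?, ?, ?)",
    ("Nemata", "synonym of", "Nematoda"),
)

def add_reciprocals(conn):
    """Insert the missing reverse row for every stored relationship."""
    conn.execute("""
        INSERT INTO taxon_relations (taxon_name, taxon_relationship, related_name)
        SELECT r.related_name, c.reciprocal_relationship, r.taxon_name
        FROM taxon_relations r
        JOIN cttaxon_relation c ON c.taxon_relationship = r.taxon_relationship
        WHERE NOT EXISTS (
            SELECT 1 FROM taxon_relations x
            WHERE x.taxon_name = r.related_name
              AND x.taxon_relationship = c.reciprocal_relationship
              AND x.related_name = r.taxon_name
        )
    """)

def relationships_for(conn, name):
    """Single-table lookup; no second join once reciprocals are stored."""
    return conn.execute(
        "SELECT taxon_relationship, related_name FROM taxon_relations "
        "WHERE taxon_name = ? ORDER BY related_name",
        (name,),
    ).fetchall()

add_reciprocals(conn)
```

Running `add_reciprocals` a second time inserts nothing, so it could be re-run after the code table changes without duplicating rows.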

This would also solve 
https://groups.google.com/d/msg/arctos-ac/DfZe3kxADlY/RN7H3q3JSzEJ (formatting 
taxonomy relationships).

Ref: https://code.google.com/p/arctos/issues/detail?id=734

Original issue reported on code.google.com by dust...@gmail.com on 13 Jul 2015 at 8:54


dustymc · 2016-04-07T15:02:02Z

Possible improvement: Move ALL relationships to classifications, something like we get from GlobalNames. Example:

http://arctos.database.museum/name/Arhopalus%20cervinus#ITIS

taxon name: Arhopalus cervinus
species (root term in hierarchical terms, rank doesn't seem important): Arhopalus foveicollis
interpretation: "ITIS says Arhopalus foveicollis is favored over Arhopalus cervinus" (or something like that...)

This is "correct" from a data standpoint; our current data (http://arctos.database.museum/name/Echidna%20russellii) ~assert "Echidna (all uses) is a bad spelling of Bitis (vipers)," which isn't correct; Echidna remains a "good" name for eels and a "bad" synonym for some other stuff (pointy mammals, moths).

We would lose the precision available under http://arctos.database.museum/info/ctDocumentation.cfm?table=CTTAXON_TERM, BUT we know many of those data are garbage anyway (see email "backwards synonyms" @DerekSikes 1 Apr 2016), and many of them intentionally avoid precision (e.g., DLM uses "synonym of" to mean "sameish thing" with no ICxN intentions). I see little evidence that we're capable of usefully maintaining those data, and see no way of determining what's trustworthy.

It is currently very easy to delete classifications; it should probably be more difficult (require confirmation, or something similar) to delete a "synonym-bearing" classification.

It's not exactly clear how we'll avoid synonym-bearing classifications in things like updating FLAT; all code dealing with "the collection's classification" would need to be reviewed.

All "any taxa" queries would need to be rewritten, but performance should improve (we'd need to tune only one thing, albeit one very large thing).

Question (possibly for GlobalNames): Under eg http://arctos.database.museum/name/Echidna%20russellii#CatalogueofLife (and many other examples) the query was for a "species" (binomial) and various sources return a monomial (genus). What exactly is the assertion?

sharpphyl · 2018-12-10T21:50:12Z

Just curious if we've considered assigning a number to each use of a taxon name the same way we do to a locality. Then could specific numbers be in each classification (and search, etc.)? Would that keep them straight and link the correct ones? Numbers would be unique. Names aren't, and adding the author doesn't seem to be a huge improvement overall.

> Question (possibly for GlobalNames): Under eg http://arctos.database.museum/name/Echidna%20russellii#CatalogueofLife (and many other examples) the query was for a "species" (binomial) and various sources return a monomial (genus). What exactly is the assertion?

I find this happens frequently. If the species isn't found in these sources, they are returned only to the genus level. Also, if the species is invalid, WoRMS returns the valid species. Not sure if this will happen in WoRMS (via Arctos) or not.

dustymc · 2018-12-10T21:56:18Z

> assigning a number

We do - names have taxon_name_id and classifications have classification_id.

> same way we do to a locality

...and just like localities, the ID isn't stable - they get replaced rather than updated when it's convenient, etc. Localities have 'locality_name' which IS stable - easy enough to add that to something like classifications, but (like locality_name) that would affect how the data may be managed.

campmlc · 2018-12-10T22:07:13Z

Taxon IDs sound like a promising approach for dealing with the issue of linking a name to an authority, date, and classification.


dustymc · 2018-12-10T22:24:34Z

taxon_name_id uniquely identifies NAMES. Names also uniquely identify names - we have a unique index.

Classification_id uniquely identifies classifications. We replace those every time we clone-edit-delete instead of editing, or use the classification bulkloader.
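To make the difference concrete, here is a small sketch, assuming a simplified two-table layout in sqlite3 (the columns `source` and `family`, the placeholder family values, and the `clone_edit_delete` helper are hypothetical, not the Arctos schema). It shows the name keeping its ID while the classification's ID is replaced.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE taxon_name (
        taxon_name_id   INTEGER PRIMARY KEY,
        scientific_name TEXT NOT NULL UNIQUE  -- the unique index on names
    );
    CREATE TABLE classification (
        classification_id INTEGER PRIMARY KEY AUTOINCREMENT,
        taxon_name_id     INTEGER NOT NULL REFERENCES taxon_name,
        source            TEXT NOT NULL,
        family            TEXT
    );
""")
name_id = conn.execute(
    "INSERT INTO taxon_name (scientific_name) VALUES (?)",
    ("Arhopalus cervinus",),
).lastrowid
old_id = conn.execute(
    "INSERT INTO classification (taxon_name_id, source, family) VALUES (?, ?, ?)",
    (name_id, "ITIS", "placeholder family"),
).lastrowid

def clone_edit_delete(conn, classification_id, new_family):
    """Copy a classification with one edited value, then delete the original."""
    taxon_name_id, source = conn.execute(
        "SELECT taxon_name_id, source FROM classification WHERE classification_id = ?",
        (classification_id,),
    ).fetchone()
    new_row_id = conn.execute(
        "INSERT INTO classification (taxon_name_id, source, family) VALUES (?, ?, ?)",
        (taxon_name_id, source, new_family),
    ).lastrowid
    conn.execute(
        "DELETE FROM classification WHERE classification_id = ?",
        (classification_id,),
    )
    return new_row_id

new_id = clone_edit_delete(conn, old_id, "corrected placeholder family")
```

Anything that had stored `old_id` would now point at nothing, while `name_id` is untouched.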

campmlc · 2018-12-10T22:44:43Z

Is it possible to have a stable classification (e.g. classification+taxon name) "name" or ID?


dustymc · 2018-12-10T23:01:29Z

Sure - we just don't allow them to change. "Don't allow certain data to change" seems like a critical component of managing taxon concepts anyway. I don't think that's any sort of deal-breaker, but it's absolutely a big change in how we view and manage classification data.

We currently treat taxon names as "data" - eg, you can't change them once they're used. Classifications are treated like "metadata" - you can delete them or replace them (to make family consistent, or because it's easier than editing, or because someone left some junk behind, or whatever). Moving to taxon concepts - even if the "concept" is just name+name-author+year - would elevate classifications to "data" - they'd become things you pick (presumably for reasons) rather than things you inherit (eg, from collection preferences). Allowing you to pick specific "concepts" and allowing those concepts to arbitrarily change would be pointless, so we'd have to lock some things down. Keeping an identifier stable in that context should not be a problem.
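One way "lock some things down" might look, sketched in sqlite3 with triggers: a classification that is referenced by an identification can no longer be updated or deleted. The table layout, trigger names, sample rows, and the `try_rename` helper are all assumptions for illustration; how Arctos would actually enforce this is not decided in this thread.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE classification (
        classification_id INTEGER PRIMARY KEY,
        scientific_name   TEXT NOT NULL,
        name_author       TEXT,
        name_year         INTEGER
    );
    CREATE TABLE identification_taxonomy (
        identification_id INTEGER NOT NULL,
        classification_id INTEGER NOT NULL REFERENCES classification
    );
    -- Block edits to any classification that an identification has picked.
    CREATE TRIGGER classification_no_update
    BEFORE UPDATE ON classification
    WHEN EXISTS (SELECT 1 FROM identification_taxonomy i
                 WHERE i.classification_id = OLD.classification_id)
    BEGIN
        SELECT RAISE(ABORT, 'classification is in use and cannot change');
    END;
    -- Block deletes for the same reason.
    CREATE TRIGGER classification_no_delete
    BEFORE DELETE ON classification
    WHEN EXISTS (SELECT 1 FROM identification_taxonomy i
                 WHERE i.classification_id = OLD.classification_id)
    BEGIN
        SELECT RAISE(ABORT, 'classification is in use and cannot be deleted');
    END;
""")
# Made-up rows: classification 1 is used by an identification, 2 is not.
conn.execute("INSERT INTO classification VALUES (1, 'Used name', 'Example Author', 1900)")
conn.execute("INSERT INTO classification VALUES (2, 'Unused name', NULL, NULL)")
conn.execute("INSERT INTO identification_taxonomy VALUES (100, 1)")

def try_rename(conn, classification_id, new_name):
    """Return True if the update was allowed, False if a trigger blocked it."""
    try:
        conn.execute(
            "UPDATE classification SET scientific_name = ? WHERE classification_id = ?",
            (new_name, classification_id),
        )
        return True
    except sqlite3.DatabaseError:
        return False
```

Unused classifications stay editable in this sketch, which keeps the current "metadata" behavior until something actually picks them.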

campmlc · 2018-12-11T00:37:49Z

That sounds like a very promising approach to solving some of our issues with choosing particular name+classification combos for a given collection or specimen, and dealing with homonyms?


Jegelewicz · 2018-12-11T17:47:37Z

Agree. I have been wondering how the current model of "name as data and classification as metadata" came about. It seems like we are creating a lot of our own problems with the two layers of identification. What would we need to do to transition to such a model? And what am I missing about the current model that makes it more useful/appropriate?

dustymc · 2018-12-11T18:11:01Z

> what am I missing

normalization

> What would we need to do to transition to such a model?

In that model (as I see it), normalization is even more critical. The only significant structural change would be identification_taxonomy.taxon_name_id becoming identification_taxonomy.classification_id. (That sort of modularity is another benefit of normalization.)

That should just leave the usability issues to deal with.
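A rough sketch of that structural change in sqlite3, under two loud assumptions: each identification is repointed to the classification from a single preferred source, and there is at most one classification per name per source. The `source` column, the sample rows, and `repoint_to_classifications` are hypothetical; only the two column names come from the comment above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE classification (
        classification_id INTEGER PRIMARY KEY,
        taxon_name_id     INTEGER NOT NULL,
        source            TEXT NOT NULL
    );
    CREATE TABLE identification_taxonomy (
        identification_id INTEGER PRIMARY KEY,
        taxon_name_id     INTEGER NOT NULL
    );
    -- Made-up rows: name 1 has two classifications, name 3 has none.
    INSERT INTO classification VALUES
        (10, 1, 'Source A'), (11, 1, 'Source B'), (12, 2, 'Source A');
    INSERT INTO identification_taxonomy VALUES (100, 1), (101, 2), (102, 3);
""")

def repoint_to_classifications(conn, preferred_source):
    """Add classification_id and fill it from the preferred source.

    Returns the identification_ids that could not be resolved, so they
    can be reviewed by hand. The old taxon_name_id column is left in place.
    """
    conn.execute(
        "ALTER TABLE identification_taxonomy "
        "ADD COLUMN classification_id INTEGER REFERENCES classification"
    )
    conn.execute(
        """
        UPDATE identification_taxonomy
        SET classification_id = (
            SELECT c.classification_id
            FROM classification c
            WHERE c.taxon_name_id = identification_taxonomy.taxon_name_id
              AND c.source = ?
        )
        """,
        (preferred_source,),
    )
    rows = conn.execute(
        "SELECT identification_id FROM identification_taxonomy "
        "WHERE classification_id IS NULL ORDER BY identification_id"
    ).fetchall()
    return [r[0] for r in rows]

unresolved = repoint_to_classifications(conn, "Source A")
```

The unresolved list is where the usability questions would show up: identifications whose name has no classification in the chosen source need a decision from someone.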

Jegelewicz · 2018-12-13T20:43:04Z

Closing to consolidate issues; see #1136

GoogleCodeExporter added auto-migrated and Priority-High (Needed for work; high because this is causing a delay in important collection work) labels Aug 25, 2015

mkoo added enhancement and removed Type-Enhancement labels Aug 25, 2015

dustymc removed auto-migrated labels Aug 27, 2015

dustymc added this to the Needs Discussion milestone Aug 27, 2015

dustymc mentioned this issue Sep 22, 2015

prefer taxon name #756

Closed

ccicero added the Function-Taxonomy/Identification label Feb 20, 2016

dustymc mentioned this issue Jul 18, 2016

taxonomy - "invalid" names #912

Closed

dustymc mentioned this issue Oct 17, 2016

Delete Tridacna crocca #953

Closed

dustymc mentioned this issue Oct 28, 2016

default ID/taxonomy specimensearch #965

Closed

dustymc mentioned this issue Nov 8, 2016

magic taxonomy relationships #757

Closed

dustymc mentioned this issue Nov 18, 2016

taxonomy relationships #983

Closed

dustymc mentioned this issue Apr 3, 2017

taxon relationships (was: Delete Saxidomus nuttalli purpuratus) #1079

Closed

dustymc mentioned this issue Apr 11, 2017

Taxonomy: the big picture #1094

Closed

dustymc mentioned this issue Dec 10, 2018

worms refresh: test request #1841

Closed

Jegelewicz closed this as completed Dec 13, 2018

Jegelewicz mentioned this issue Dec 13, 2018

Taxon Concepts as a data model #1852

Closed
