Taxa without classification - follow-up to #1761 and #1936 #2098

Closed
sharpphyl opened this issue May 28, 2019 · 16 comments
Labels
Function-Taxonomy/Identification Priority-Low (Wish list) I don't want to forget this, but it doesn't need to be done immediately
Comments

@sharpphyl

Issue #1761 hasn't been updated recently and #1936 has been closed, so it may be time to start a new and possibly smaller issue.

In #1761 I noted that I had run an arbitrary 999 of the taxon names without classification against WoRMS and found that 672 of them had an aphiaID. Today I looked at four of them and found that I only needed to refresh the aphiaID to get the complete classification in WoRMS (via Arctos). None of these had an Arctos classification, though they did have an Arctos Relationship classification. So, as requested in #1936, we'd like to clone the WoRMS (via Arctos) classification into an Arctos classification.

Dusty, can you refresh everything in WoRMS (via Arctos) that needs it, rather than me manually doing these 672 taxon names? Same for creating the Arctos classification.

Maybe this will make some dent in the taxa without classifications. Thanks.

(https://github.com/ArctosDB/arctos/files/3228219/WoRMS.match.of.999.arbitrary.taxa.without.kingdom.xlsx)

@dustymc
Contributor

dustymc commented May 28, 2019

We could definitely use some consolidation/cleanup.

#1761 looks like it can probably be closed. We've got the easy stuff; I don't think anyone will (or should) go look up kingdom for 140K names. That's out of date anyway - #1641 (comment) made another 290,861 'bare' names....

If #1936 was not a one-time thing, then it needs to be a new issue. And FWIW I'd probably oppose simply replicating data across Source borders - we did what we could for existing specimens, someone intending to catalog stuff in one source can bring data over from another (or request that be done), and continually flinging garbage around - #2074 (comment) - without some solid reason to do so just doesn't seem like a good idea to me.

Screen Shot 2019-05-28 at 8 24 03 AM

is actively looking for the gaps that matter - those with specimens.

What exactly are you wanting me to do with the attached? And please resend as CSV - I'm not anxious to find new ways Excel can mangle data....

@dustymc dustymc added this to the Need More Information milestone May 28, 2019
@sharpphyl
Author

Specifically, could you refresh the taxon names in the CSV? They have aphiaIDs, and the few that I checked just needed to be refreshed. They were on a list of taxa without classification. There may be a lot more; I only checked 1000 (the WoRMS limit for a match) and found these. Is there any way to check for WoRMS (via Arctos) taxa that need refreshing?

Also, for the Dashboard in issue #1894, how would I adjust the SQL to get all the WoRMS (via Arctos) names that lack a higher classification, regardless of whether or not DMNS:Inv is using them? That might turn up more that need refreshing.

WoRMS (via Arctos) taxa that need refresh.csv.zip

@dustymc
Contributor

dustymc commented May 29, 2019

I'm still lost. A bunch of the aphiaids don't exist - there's nothing to refresh. Where did these data come from??

taxa that need refreshing?

I'm not sure what this means. There were obviously a few problems with the initial import, or maybe some stuff changed in WoRMS before we were monitoring. Now changes in WoRMS should just appear in Arctos.

WoRMS (via Arctos) names that lack a higher classification

That's not really possible - if they're "WoRMS (via Arctos)" then they have a classification. If you mean some specific missing rank or something I can get that, but I'm not sure if that would be useful??

@sharpphyl
Author

These came out of a file from months ago listing taxa without a kingdom. I took 999 of them, ran them against the WoRMS taxon match, and got matches on 670 or so of them - not all perfect matches, but something. So today I started to look at them in WoRMS (via Arctos). For example:

Example #1 - Row 634 - Acalia erythraea

Acalia erythraea | 368961 | exact | 378679 | 9718 | alternate representation | Acalia erythraea | Linckia (Acalia) erythraea | Animalia | Echinodermata | Asteroidea | Valvatida | Ophidiasteridae | Acalia |   | erythraea |  

In WoRMS via Arctos, this is what I find for Acalia erythraea

Screen Shot 2019-05-29 at 2 18 30 PM

So there's no classification right now, but if I refresh I get this

Screen Shot 2019-05-29 at 2 23 57 PM

Example #2 - Row 647 Craticula submolesta

Craticula submolesta | 617377 | exact | 661379 | 44002 | alternate representation | Craticula submolesta | Navicula submolesta | Chromista | Ochrophyta | Bacillariophyceae | Naviculales | Stauroneidaceae | Craticula |   | submolesta |  

This one doesn't have a WoRMS (via Arctos) or an Arctos entry, but it does have a World Register of Marine Species entry, so I'm not sure why we didn't get it.

Screen Shot 2019-05-29 at 2 54 53 PM

Example #3 - Row 487 Satiellina jamairiensis

Satiellina jamairiensis | 795271 | exact | 795271 | 0 | accepted | Satiellina jamairiensis | Satiellina jamairiensis | Animalia | Arthropoda | Ostracoda | Palaeocopida | Satiellina |   | jamairiensis |  

As I found it

Screen Shot 2019-05-29 at 2 38 48 PM

After refreshing

Screen Shot 2019-05-29 at 2 39 50 PM

Does that help? Rather than my manually going through these 600+ taxa, I'm wondering if you can refresh them.

A bunch of the aphiaids don't exist - there's nothing to refresh. Where did these data come from??

Can you tell me which ones don't exist? I've only checked about five so far. It would seem that they should all get refreshed with your regular updates, but somehow they aren't, and perhaps others that show as "no kingdom" have a similar problem.

I'm not using any of these; just trying to keep WoRMS (via Arctos) complete and tidy.

@sharpphyl
Author

Dusty, while we're working on the WoRMS Source: is the subgenus the reason this entry wasn't uploaded to WoRMS (via Arctos)? I just added it and refreshed via the aphiaID, but it wasn't there before.

https://arctos.database.museum/name/Schistoloma%20alta#WoRMSviaArctos

It's invalid so I won't use it but wanted to link it to the valid species.

@dustymc
Contributor

dustymc commented Jun 4, 2019

#1704

@sharpphyl
Author

Yes, I remember that we can't have it both with and without the subgenus, as they are the same species. I've deleted the subgenus row and the subgenus from the species name Schistoloma alta, but won't a refresh put it back in again, since it doesn't exist in WoRMS without the subgenus? And it didn't download at all (with or without the subgenus), which I think means that everything in WoRMS that exists only with a subgenus isn't in WoRMS (via Arctos). Correct?

This particular one is invalid, but without it I didn't have the usual link to the preferred name. In general, what should I do?
Not add any taxon name from WoRMS with a subgenus?
Add it, but delete the subgenus row and delete the subgenus from the species?
Add the aphiaID, or not add the aphiaID so it doesn't refresh?

Do any of these approaches mess up the taxon record and the taxon name search functionality?

@dustymc
Contributor

dustymc commented Jun 4, 2019

deleted the subgenus row

That's not necessary - I don't (much) care what's in CLASSIFICATIONS, I just want clean names.

I refreshed, and it does in fact bring the subgenus back in. You can just remove the aphiaID (= remove the link to WoRMS) if you don't want that - otherwise someone changing something on WoRMS will cause a refresh in Arctos.

We didn't automagically find names with subgenera - there's no namestring to share, that's one of the many problems with 'traditional' taxonomy.

Once the aphiaid is in Arctos there's a link and it will pull from it no matter what's on the other end - you could update https://arctos.database.museum/name/Schistoloma%20alta#WoRMSviaArctos to use http://www.marinespecies.org/aphia.php?p=taxdetails&id=255011 if you want (but please don't!).

You can't mess up the "any taxon" search - it hits the name itself (why I need clean names), and the worms classification (now) contains Schistoloma (Schistoloma) alta so that'll find specimens as well. You can definitely make your data (the 'family' and etc. search fields) inaccessible by providing inconsistent data, but those are more or less inconsistent by definition, and this one doesn't seem like something you'd use in an accepted ID anyway.

@sharpphyl
Author

Back to the original purpose of this issue. There are still taxa in WoRMS (via Arctos) without a classification. It would seem that one advantage of using that Source would be that everything would have a classification unless it's a (non-WoRMS) taxon that I manually added without a classification.

How would I modify the SQL in Issue #1894 to get every taxon name in WoRMS (via Arctos) that is lacking higher classification or can you run that list? I've cleaned up everything that DMNS:Inv is using, but I have another list (I think culled from the list of all taxa missing higher classification) and a lot of them still need to be refreshed. I don't understand the problem since I think your system refreshes continuously. Today, I already refreshed Lanistes olivaceus var. ambiguus, Incertipoma virile, Incertipoma subglobosum. I did have to refresh more than once to get the accurate entry. At first, it put in the accepted name rather than the taxon name but after three refreshes it shows the unaccepted name plus the preferred name. Does it matter that they aren't accepted names?

Screen Shot 2019-06-10 at 7 27 51 AM

Screen Shot 2019-06-10 at 7 46 29 AM

Also, some of them don't have a WoRMS (via Arctos) entry even though there is a World Register of Marine Species entry. I added an entry for Pseudomalaxis roddai and Eutudorops. Both were unaccepted. Does that matter? By adding them, I now have the link to the preferred taxa too.

Screen Shot 2019-06-10 at 7 29 07 AM

After refresh

Screen Shot 2019-06-10 at 7 29 50 AM

Screen Shot 2019-06-10 at 7 53 16 AM

As examples, here are five entries that I left with the following problems:

(unaccepted) Bellerophina minuta | aphiaID 747778 | needs refresh | accepted as 584856
(unaccepted) Meridolum ascensum | aphiaID 1298481 | needs refresh | accepted as 1298479
(accepted) Marshallena neozelanica | aphiaID 831440 | needs refresh
(accepted) Satsuma meridionalis | aphiaID 1316504 | needs refresh
(accepted) Parastroma trechmanni | aphiaID 992161 | in WoRMS but no WoRMS (via Arctos) entry

@sharpphyl sharpphyl added Bug Arctos is not performing as it should. Function-Taxonomy/Identification labels Jun 17, 2019
@dustymc
Contributor

dustymc commented Jun 17, 2019

taxon name in WoRMS (via Arctos) that is lacking higher classification

"Taxa in WoRMS (via Arctos) without a classification" isn't possible - the 'WoRMS (via Arctos)' bit IS a classification. I can help with this, but I don't know what you mean.

need to be refreshed. I don't understand the problem since I think your system refreshes continuously.

No, I refresh when WoRMS tells me they've changed something. Maybe we should refresh everything and see what that does.

I did have to refresh more than once to get the accurate entry.

Is there any possibility your browser is caching (or haunted, or ...) and causing these problems? That does not resemble anything I've seen from WoRMS, and I can't imagine how it could be possible from the Arctos side, but it sounds a LOT like Chrome hanging on to data that it just figures is close enough to what you're looking for.

Does it matter that they aren't accepted names?

Absolutely never to me - names are names, 'accepted' is just another bit of metadata.

By adding them, I now have the link to the preferred taxa too.

Yep, and

preferred_name: Ramsdenia

should still get data entry people where they need to be if we add 500 more relationships (so the actual links become muddied).

examples

I clicked the first couple - it just worked, like it always does.

no WoRMS (via Arctos) entry

If it somehow fell through the initial import cracks, it's only going to find its way to Arctos if it changes in WoRMS.

I think we also have access to their periodic dump, which we might use to find some of this.

Some prioritization would be very helpful to me. I think these are all theoretical problems at this point, and solving them could turn into a full-time job. Unless directed otherwise, I'm probably going to wander back towards locality-land as soon as I can....

@sharpphyl
Author

No, I refresh when WoRMS tells me they've changed something. Maybe we should refresh everything and see what that does.

Yes, I think we need to refresh everything and see if that adds the higher classification to these names. I switched from Chrome to Firefox and went to the next one on my list: Conus minimus var. condoriana and found the name without any higher classification in WoRMS (via Arctos).

Screen Shot 2019-06-17 at 6 54 41 PM

After refresh, it's all there.

Screen Shot 2019-06-17 at 6 55 37 PM

Wouldn't this taxon name have been included in the list of names without a kingdom? Also, if we (or any user) were to enter a new specimen and use this taxon and Source, again there would be no family, kingdom, etc. attached to the name.

So, yes, can you refresh all the WoRMS (via Arctos) taxa (without overwhelming the rest of the Arctos users) and let's see if that resolves this issue. It's Medium priority. Right now, I've cleaned up every name we're using, but I have no idea what taxon name we'll need to use tomorrow.

@Jegelewicz
Member

Can we close this?

@sharpphyl
Author

Unfortunately, there still seem to be WoRMS (via Arctos) taxa that need to be refreshed. I opened the next Mollusca taxon on the list this morning to see if the problem had been resolved.

Screen Shot 2019-09-16 at 1 33 45 PM

After refresh

Screen Shot 2019-09-16 at 1 34 25 PM

So somehow we're getting taxon names in WoRMS (via Arctos) and the aphiaID but we're not getting the classification.

Here are the next names on my list if you want to see the problem. I have not refreshed them. They are all invalid but Dusty said that's just a bit of metadata and doesn't impact the refresh process. Also, all of them (and the others that I refreshed today) have an Arctos Relationship entry that shows the valid (preferred) term and no Arctos entry, if that matters.

Paludestrina olssoni
Paludestrina nanna
Paludestrina curva
Paludestrina milium
Paludestrina plana
Paludestrina cingulata
Paludestrina turricula

And it's not just Mollusca. See, for example, Styllaria borealis and Diatoma fasciatum.

As you'll note from the initial entries, all I did was check 1000 taxon names from our "no higher classification" list to see whether any of them had WoRMS entries, and when I found WoRMS aphiaIDs I started to open them and found the unrefreshed entries.

So there seems to be a bug somewhere and, while it hasn't impacted me yet, I'm sure we want to squash it.

@dustymc
Contributor

dustymc commented Sep 16, 2019

I don't think this is a bug; I think it's just a weird legacy of the initial import. If something changes at WoRMS it'll auto-refresh, and anyone can manually refresh at any time.

I'm still up for refreshing everything, either once or on some schedule, but that needs more attention than I feel like I can give it at the moment, especially if it's not breaking anything important.

@sharpphyl
Author

A refresh can certainly wait as it hasn't impacted us so far. But it may be contributing to the count of taxon names without classifications, so we should be aware of it.

@dustymc dustymc added the Priority-Low (Wish list) I don't want to forget this, but it doesn't need to be done immediately label Sep 18, 2019
@dustymc dustymc removed the Bug Arctos is not performing as it should. label Jul 27, 2020
@dustymc
Contributor

dustymc commented Oct 4, 2021

Handled by #3512 and #1894


3 participants