Hirundo javanica Sparrman, 1789 is a bird, it shouldn't be in the Phylum Annelida #3250

Closed
gbif-portal opened this issue Feb 9, 2021 · 25 comments
Labels
backbone important An important, blocking issue

Comments

@gbif-portal
Collaborator

gbif-portal commented Feb 9, 2021

Hirundo javanica Sparrman, 1789 is a bird, it shouldn't be in the Phylum Annelida

This is a feedback message that we received on Helpdesk. This publisher also hypothesised that we could be mismatching the genus to Hirudo https://www.gbif.org/species/6880797, an annelid.
Interestingly, other species such as Hirundo rustica (https://www.gbif.org/species/9515886), are correctly classified.


Github user: @ManonGros
User: See in registry
System: Safari 14.0.3 / Mac OS X 10.15.6
Referer: https://www.gbif.org/species/10332233
Window size: width 1248 - height 793
API log
Site log
System health at time of feedback: OPERATIONAL

@ManonGros

@mdoering is this something that could be addressed in the next backbone?

@mdoering
Member

mdoering commented Feb 9, 2021

interesting one. The problem here is that we only know the species from ZooBank, which does not provide much classification, just a placement into the genus Hirundo Catesby, 1771 : https://www.gbif.org/species/157314380

The species is also part of the IUCN RedList dataset, so I would propose to also add this list to the backbone sources?
@thomasstjerne @MattBlissett is this list used by the portal these days to display the redlist status? In that case it would be good to have it in the backbone anyways if it is well maintained.

mdoering added a commit to gbif/checklistbank that referenced this issue Feb 9, 2021
@mdoering
Member

mdoering commented Feb 9, 2021

Hirundo Catesby, 1771 is a flying fish genus according to IRMNG. ZooBank does not classify it.

IRMNG is the reason Hirundo also gets added as a genus to Annelida. There are 3 Hirundo genera listed in IRMNG. @TonyRees maybe the Annelida can go? I will block it from being added to GBIF

@mdoering
Member

mdoering commented Feb 9, 2021

There is also Hirundo D'Orbigny & Lafresnaye, 1838 from TaxREF which does not cause problems, but is a bad authorship of the Linnaeus bird genus. I will block that too.

@mdoering
Member

mdoering commented Feb 9, 2021

I tried to look into Hirundo Catesby, 1771. ZooBank lists "Catesby, M. 1771. The natural history of Carolina, Florida and the Bahama Islands; containing the figures of birds, beasts, fishes, serpents. with their descriptions in English and French, etc. , London. Third edition." as the source:
https://www.biodiversitylibrary.org/item/219275#page/7/mode/1up

There I don't really find much, just the "mapping" to Linnean names, namely: The flying Fish. Hirundo. Exocœtus evolans. L.

@deepreef Is this all it takes to describe a new genus?
Eschmeyer's Catalogue lists it as unavailable:

Hirundo Catesby [M.] 1771:8 [The natural history of Carolina, Florida and the Bahama Islands; ref. 774] Fem. Not available, published in a rejected work on Official Index (Opinion 89, Opinion 259). Exocoetidae.

@TonyRees

TonyRees commented Feb 9, 2021

Hi all, I did a bit of digging...

  1. In addition to Hirundo Linnaeus, 1758 in Aves (accepted) and Hirundo Catesby, 1771 in fishes (unaccepted), IRMNG had Hirundo (without authorship) assigned to "Hirudinoidea (awaiting allocation)" in Annelida with status = "uncertain". This name comes from Nomenclator Zoologicus, see http://ubio.org/NZ/search.php?search=Hirundo&colname=on&colauthority=on&colcomments=on&colcategory=on&colpublication=on&vol=&page= . The online version (uBio) has no further info except the year (1829) and place of publication, however the original scanned page says [error] "pro Hirudo 1758" (obviously Linnaeus' leech), so I have updated the IRMNG record to have authorship = Anonymous, 1829 and status = unaccepted, accepted name = Hirudo Linnaeus, 1758, parent now Piscicolidae in Annelida as per the accepted name (there are plenty of similar misspellings catalogued in Nomenclator Zoologicus, normally they are retained in IRMNG but with status = unaccepted, not uncertain; this was an oddity in that the Nom. Zoo. comment did not come through to the online (uBio) version).

  2. I do not normally remove misspelled genus names from IRMNG where they are sourced from "trusted providers" such as Nomenclator Zoologicus, since sometimes they may have associated species names that require to be resolved (in this case I do not think there are any); in any case it flags that there is a potential ambiguity when species names come in attached to that genus name, requiring additional scrutiny (in other words there might be a "Hirundo xxx" that is a fish, or an annelid, in addition to the bird). As a first pass alternative, a human or automated script could presume that all incoming binomials should be associated only with the accepted name (although in the case of cross Code homonyms there may be more than one of these), but it may give the incorrect result on occasion.

  3. A reasonable assumption would be that all "accepted" binomials could automatically be associated with the "accepted" genus name in the same Code (i.e. animals with animals, plants with plants, prokaryotes with prokaryotes). "Unaccepted" binomials would still then require manual intervention unless the incoming data supplier makes an unambiguous assertion as to which instance of generic name is the parent, in the case of homonyms existing (or one can make a presumption that a supplier of data on fishes will contain only fish names, etc., and homonyms in other groups can therefore be ignored).

  4. Previously mentioned in an email elsewhere, there is a "Hirudo Müller, 1788" supposedly in Nemertea, listed in Nomenclator Zoologicus and also in the ICZN Index where it is quoted as an invalid junior homonym of Hirudo Linnaeus in Annelida. In fact scrutiny of the original work (plus relevant comments in WoRMS on the included species, Hirudo grossa Müller, 1776) reveals that this is a later usage of Hirudo Linnaeus, not a junior homonym. The name has been removed from WoRMS but is retained in IRMNG at this time, status now changed to unaccepted, accepted name = Hirudo Linnaeus, 1758, parent changed to that of the accepted name.

  5. It seems that Hirundo javanica Sparrman, 1789 is indeed a bird; current name possibly Hirundo tahitica javanica Sparrman, 1789, see Avibase, https://avibase.bsc-eoc.org/species.jsp?avibaseid=466C01C287E53683
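
The heuristic in point 3 could be sketched roughly as follows. This is only an illustration, not how IRMNG or the GBIF backbone actually works: the `GenusRecord` type, the `parent_genus_for` helper and the three sample records are hypothetical, with statuses taken from the description above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class GenusRecord:
    name: str
    authorship: str
    code: str    # nomenclatural Code, e.g. "ICZN" for animals (assumed label)
    status: str  # "accepted" or "unaccepted"
    group: str   # coarse higher group, for illustration only


# Hypothetical records mirroring the three Hirundo instances discussed above.
GENERA = [
    GenusRecord("Hirundo", "Linnaeus, 1758", "ICZN", "accepted", "Aves"),
    GenusRecord("Hirundo", "Catesby, 1771", "ICZN", "unaccepted", "fishes"),
    GenusRecord("Hirundo", "Anonymous, 1829", "ICZN", "unaccepted", "Annelida"),
]


def parent_genus_for(binomial: str, code: str,
                     genera: List[GenusRecord] = GENERA) -> Optional[GenusRecord]:
    """Attach an accepted binomial to the single accepted genus in the same Code.

    Returns None when there is no unique accepted candidate, so the
    name can be routed to manual review instead of being guessed.
    """
    genus = binomial.split()[0]
    accepted = [g for g in genera
                if g.name == genus and g.code == code and g.status == "accepted"]
    return accepted[0] if len(accepted) == 1 else None


parent = parent_genus_for("Hirundo javanica", "ICZN")
print(parent.group if parent else "needs manual review")
```

With these sample records the swallow lands under the accepted bird genus, and a name with no unique accepted parent falls through to manual review rather than being attached to a misspelling.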

I think that answers the original question, if not let me know... The genus level changes mentioned above will be found in the next version of IRMNG, anticipated for next month i.e. March 2021 if all goes according to plan.

Cheers Tony

@TonyRees

TonyRees commented Feb 9, 2021 via email

@TonyRees

TonyRees commented Feb 9, 2021

In IRMNG, I have also made Catesby's Hirundo an unaccepted synonym of Exocoetus (noting it is also an unavailable name). Simply being unavailable does not exclude it from IRMNG scope, since this is the same with many names in Nomenclator Zoologicus, which are "in circulation" from that source, and can also be found in the scientific literature; also, particular published names can move into and out of availability via subsequent ICZN decisions... (available today, unavailable tomorrow, or sometimes vice versa).

Cheers - Tony

@deepreef

deepreef commented Feb 9, 2021

@mdoering : you wrote:

The problem here is that we only know the species from ZooBank, which does not provide much classification, just a placement into the genus Hirundo Catesby, 1771 : https://www.gbif.org/species/157314380

However, in ZooBank the species is actually placed in Hirundo Linnaeus 1758:
http://zoobank.org/NomenclaturalActs/47798114-b7c1-4144-b38a-cdd6f700f001

Hirundo Catesby 1771 is a junior homonym as well as being on the Official Index; and not something that javanica Sparrman, 1789 is associated with.

I'm a little worried about how the link from javanica Sparrman, 1789 was made to Hirundo Catesby 1771 via ZooBank. Is there an error in the ZooBank export?

@TonyRees :

so I have updated the IRMNG record to have authorship = Anonymous, 1829 and status = unaccepted, accepted name = Hirudo Linnaeus, 1758

This seems more like it should be treated as a misspelling, rather than a distinct Genus-group name with authorship. I get that its appearance in NZ gives it some credence as a distinct name, and perhaps justifies appending "Anonymous 1829". But I would think it should be investigated (any idea what "London Ency., 22, 783." refers to?) to see if it's simply a misspelling of Hirudo or subsequent use of one of the Hirundo (either one), before giving it renewed existence as a potentially distinct genus-group name that potentially competes in homonymy.

@TonyRees

@deepreef : "Hirundo Anonymous, 1829" is indeed a simple misspelling (I vaguely recall checking the original on a previous occasion) and therefore unavailable. However there are numerous similar cases in Nomenclator Zoologicus, all indexed as published name instances, which I have imported into IRMNG as is at this time ... purging IRMNG of such unavailable names would be a big task (tens of thousands of names I think), also of questionable value given that such names are "in circulation" as misspellings currently from Neave, and are unlikely to disappear again any time soon; also that at least a subset of them are arguably useful for indexing / name recognition / reconciliation purposes.

A user would have to understand that they do not compete for homonymy under the Code, however the presence of "the same [published] name for different taxa" is not limited to just available / Code-compliant names.

Also as stated above, availability of a name can change through time through ICZN decisions, while the original publication details are a "fixed item"...

One could argue further for a "register" of available names only, but IRMNG (following Neave) is a superset of that, hopefully with known misspellings, suppressed names, and names not validly published flagged as such, and reconciled to their equivalent "accepted" instances where known (the latter being an ongoing process).

Happy to discuss further of course. Regards - Tony

Cheers - Tony

@deepreef

Thanks, @TonyRees.

"Hirundo Anonymous, 1829" is indeed a simple misspelling (I vaguely recall checking the original on a previous occasion) and therefore unavailable.

Yeah, technically. But suppose I write the name "Fakusnamus" in this GitHub post. It's technically an unavailable name because it was published in a non-code-compliant way, I didn't fix a type species, and it's in a nonbinominal work (among many other reasons why Fakusnamus is an unavailable name). That doesn't mean you now need to track the genus Fakusnamus Pyle 2021 in IRMNG.

I FULLY support the need for IRMNG to track unavailable names (ZooBank does the same). But there still has to be some "bar" above which a name needs to kinda-sorta exist as a potentially available name before we go to the trouble of tracking it as an unavailable name. Granted, an appearance of a misspelled name in NZ is closer to that bar than the appearance of an unavailable name in a GitHub blog post... but still, I wonder if that bar has been reached?

I would say that, in order for a name to cross the bar and become something we track forevermore as an unavailable name, it needs to at least have been proposed as a new name by someone. So... if in the context of Anonymous 1829 the name "Hirundo" was asserted to represent a new genus-group name, then yeah -- we should track it (in IRMNG, ZooBank, COL, GBIF, etc.). But if Anonymous was not intending to propose a new genus-group name, and merely misspelled "Hirudo" when intending to reference that previously existing name, then I don't think the bar of "track this unavailable name in our nomenclatural databases" has quite been reached.

So, by all means, DO NOT purge any unavailable names from IRMNG, especially if many/most of them have crossed the bar. I'm just saying that we should probably distinguish "also misspelled as" instances from "proposed new names that fail to fulfill the requirements of the Code".

For what it's worth, if it was clear that Anonymous 1829 misspelled Hirudo as Hirundo, then I would create a TNU record for it in GNUB, spelled "Hirundo", and linked to the Protonym for Hirudo Linnaeus, 1758. That way we track it as a misspelling of a different name, rather than establish a new Protonym, credited to Anonymous 1829, branded as "unavailable". In other words, I would go with:

Hirudo Linnaeus, 1758 sec. Anonymous 1829 [spelled as "Hirundo"]

as opposed to:

Hirundo Anonymous 1829 sec. Anonymous 1829 [unavailable name, misspelling of Hirudo Linnaeus, 1758]

I suspect we're both probably doing exactly the same thing, and that the only actual difference here is what we mean by "unavailable". I don't think every single misspelled name warrants a new "name" record attributed to the author who misspelled it, then rendered "unavailable".

Actually... if you have the full literature citation for "London Ency., 22, 783.", I can go ahead and create it in GNUB.

@TonyRees

TonyRees commented Feb 10, 2021 via email

@deepreef

Brilliant! Thank you Tony! I've spent the last hour trying to chase this down. Consulted my copies of both Neave and Sherborn databases, and went down a wild goose chase on this:
https://www.biodiversitylibrary.org/bibliography/16001#/summary
And was just about to give up when I saw that you posted!

So, thanks to your efforts, I've gone ahead and created this:
http://zoobank.org/fb9e3da5-c3dd-49e4-b320-6b20008a609f

According to Neave, there are 22 names within this publication:
uid | Category | Name | Authority | Year | Publication
17023 | Mamm | Arnee | [Anon.] 1845 | 1845 | London Ency., 22, 752.
29323 | Mamm | Caelogonus | Anonymous 1845 | 1845 | London Ency., 22 (Zoology), 747.
30657 | Mamm | Callitriche | Anonymous 1845 | 1845 | London Ency., 22, 736.
54810 | Mamm | Dasurus | [Anon.] 1829 | 1829 | London Encycl., 22, 743.
55633 | Mamm | Delphimaptera | [Anon.] 1845 | 1845 | London Ency., 22, 853.
62607 | Mamm | Draximenus | Anonymous 1829 | 1829 | London Ency., 22, 744.
64157 | Mamm | Echemys | [Anon.] 1829 | 1829 | London Ency., 22, 745.
89592 | Mamm | Hetamys | [? Author] 1829 | 1829 | London Ency., 22, 746.
91528 | Verm (Hirud.). | Hirundo | [? Author] 1829 | 1829 | London Ency., 22, 783.
107331 | Verm (Oligoch.). | Limbrus | [Anon.] 1829 | 1829 | London Ency., 22, 783.
109774 | Mamm | Loncherites | Anonymous 1845 | 1845 | London Ency., 22 (Zool.), 745.
124474 | Mamm | Moschatus | Anonymous 1845 | 1845 | London Ency., 22, 752.
125261 | Mamm | Myctonome | Anonymous 1845 | 1845 | London Ency., 22 (Zool.), 738.
125702 | Mamm | Myorthius | Lay 1845 | 1845 | in Wilkes, London Ency., 22, 743.
125740 | Mamm | Myotes | [? Author] 1829 | 1829 | London Encyc., 22, 735.
127085 | Mamm (Primates). | Nasica | Anonymous 1845 | 1845 | London Ency., 22 (Art. Zool.), 734.
139388 | Mamm | Otaclinus | Anonymous 1845 | 1845 | London Ency., 22 (Zool.), 736.
139417 | Mamm | Oterites | Anonymous 1845 | 1845 | London Ency., 22 (Zool.), 742.
189740 | Mamm | Sterops | [Anon.] 1845 | 1845 | London Ency., 22, 736.
195387 | Mamm | Tatou | [Anonymous] 1845 | 1845 | London Ency., 22, 748.
208429 | Mamm | [Anonymous] | Yak 1845 | 1845 | London Ency., 22, 752.
209070 | Mamm | Zebu | Anonymous 1845 | 1845 | London Ency., 22, 752.

Some are listed as 1829, and some as 1845, but all are in Volume 22. I checked Sherborn and at least some of the names are listed in both the 1829 and 1845 editions. I spot checked a few and most seem to be misspellings.

Anyway, I think you raise some good points RE whether names in Neave (and presumably Sherborn as well?) automatically count as "above the bar". But at least in Sherborn's case, he seems to acknowledge them as misspellings.

In either case, I think we agree that these text strings should be tracked. In my mind, though, they don't warrant establishing new Protonyms, just subsequent TNUs (like the one linked above).

MANY thanks for sharing this information! I think we pretty much agree on the important bits.

@TonyRees

TonyRees commented Feb 10, 2021 via email

@mdoering
Member

However, in ZooBank the species is actually placed in Hirundo Linnaeus 1758:
http://zoobank.org/NomenclaturalActs/47798114-b7c1-4144-b38a-cdd6f700f001

Hirundo Catesby 1771 is a junior homonym as well as being on the Official Index; and not something that javanica Sparrman, 1789 is associated with.

I'm a little worried about how the link from javanica Sparrman, 1789 was made to Hirundo Catesby 1771 via ZooBank. Is there an error in the ZooBank export?

There does seem to be something wrong with the ZooBank DwCa, or at least with how we interpret it.
H. javanica in GBIF is flagged with ParentNameUsageID invalid. That is quite worrying and probably explains the wrong classification. The verbatim data looks rather correct to me:

taxonID | 47798114-b7c1-4144-b38a-cdd6f700f001
parentNameUsage | Hirundo Linnæus, 1758
parentNameUsageID | 779afca1-058e-4cb5-a7b5-4d7b15980a94
originalNameUsageID | 47798114-b7c1-4144-b38a-cdd6f700f001
originalNameUsage | Hirundo javanica Sparrman, 1789
acceptedNameUsageID | 47798114-b7c1-4144-b38a-cdd6f700f001
acceptedNameUsage | Hirundo javanica Sparrman, 1789 sec. Sparrman
scientificName | Hirundo javanica Sparrman, 1789
scientificNameAuthorship | Sparrman 1789

Nothing to worry about, but surprisingly scientificNameAuthorship has no comma before the year while scientificName does.

In general it is great for readability to have both the ID terms and the literal values (e.g. parentNameUsageID & parentNameUsage). But for the interpretation of data this can lead to problems, as establishing links based on the literal values can be difficult if there are homonyms in the data. It is best to have the ID-based relationships. The GBIF and COL importers therefore prefer these, but fall back to resolving literal values, which must have happened in this case.
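
To make the "prefer IDs, fall back to literal values" behaviour concrete, here is a minimal sketch. It is not the actual GBIF or COL importer code: the `resolve_parent` function, the two lookup dictionaries and the issue labels `PARENT_ID_INVALID` and `PARENT_NAME_AMBIGUOUS` are all assumptions made for illustration.

```python
def resolve_parent(record, by_id, by_name):
    """Return (parent, issues) for one Darwin Core style record.

    by_id maps taxonID -> record; by_name maps a literal name -> list of records.
    The ID link is preferred; the literal name is only a fallback.
    """
    issues = []
    parent_id = record.get("parentNameUsageID")
    if parent_id:
        parent = by_id.get(parent_id)
        if parent is not None:
            return parent, issues
        issues.append("PARENT_ID_INVALID")  # hypothetical issue label

    # Fallback: match on the literal parent name, which breaks down with homonyms.
    candidates = by_name.get(record.get("parentNameUsage"), [])
    if len(candidates) == 1:
        return candidates[0], issues
    if len(candidates) > 1:
        issues.append("PARENT_NAME_AMBIGUOUS")  # hypothetical issue label
    return None, issues


linnaeus = {"taxonID": "1eee2eaf-20ac-49fb-85a7-bd293861402b",
            "scientificName": "Hirundo Linnæus, 1758"}
by_id = {linnaeus["taxonID"]: linnaeus}
by_name = {"Hirundo Linnæus, 1758": [linnaeus]}

javanica = {"taxonID": "47798114-b7c1-4144-b38a-cdd6f700f001",
            "parentNameUsage": "Hirundo Linnæus, 1758",
            "parentNameUsageID": "779afca1-058e-4cb5-a7b5-4d7b15980a94"}

parent, issues = resolve_parent(javanica, by_id, by_name)
print(parent["scientificName"], issues)
```

In this toy setup the ID does not resolve, so the link is rebuilt from the literal name and the record keeps an invalid-ID flag. If the literal matched several homonyms, the sketch would refuse to pick one.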

I would like to see the ZooBank data in COL ChecklistBank which uses newer code. Unfortunately the IPT dwca is not accessible: http://zoobank.org:8080/ipt/archive.do?r=zoobank

Screenshot 2021-02-10 at 10 42 27

@mdoering
Member

@deepreef @ahahn-gbif The last working copy of ZooBank we have in GBIF is from 2019-11-18; it has not worked since 2020!

@mdoering
Member

I have kicked off an import of zoobank 2019 into COL CLB just now

@mdoering
Member

mdoering commented Feb 10, 2021

@deepreef it looks fine in COL CLB: https://data.catalogueoflife.org/dataset/2037/taxon/47798114-b7c1-4144-b38a-cdd6f700f001

Well, the verbatim accordingTo rendering is maybe a bit too much and we should abbreviate the full reference, but the data is interpreted right and Hirundo javanica is linked to Hirundo Linnæus and higher up to birds!

BUT the parent ID still does not resolve - it is simply missing in the 2019 archive and as you can see in the verbatim section
parent id invalid is still flagged:

https://data.catalogueoflife.org/dataset/2037/taxon/47798114-b7c1-4144-b38a-cdd6f700f001

@deepreef

@mdoering : First, yes the ZooBank IPT has been down ever since our server system suffered a ransomware attack in late 2019. It took nearly a year to fully recover from that, and we are also in the midst of a major server/SAN upgrade at the moment. I actually set up an IPT server last year, but for various reasons I've not had time to get the ZooBank IPT resource back online. This past year has been dominated in my world by "squeaky wheel gets the oil", so I will consider this the "squeaky wheel" for ZooBank IPT, and will have it up by next week (I could do it today, but too many other deadlines loom this week).

Second: Ah! OK, I think I now understand the problem! When I first set this up, you had requested that I filter the output on ZooBank IPT to only content that has ZooBank LSIDs. However, that means that MANY (most?) parentNameUsageID values will not be included! Why? Because the parentNameUsageID of the usage (TNU) of the species Hirundo javanica Sparrman, 1789 points to a usage of the genus Hirundo, and that usage is Hirundo Linnæus, 1758 sec. Sparrman 1789:
http://zoobank.org/779afca1-058e-4cb5-a7b5-4d7b15980a94

This is NOT a nomenclatural act (only a subsequent usage of the genus Hirundo Linnæus, 1758 by Sparrman 1789), so it does not get a ZooBank LSID, and it does not show up in the IPT output for the ZooBank IPT. In fact, the only parentNameUsageID values that will resolve within the ZooBank IPT output are cases where both the genus and the species are originally described in the same publication.
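
A quick way to see how many parent links are affected is to check which parentNameUsageID values do not occur as a taxonID anywhere in the same export. The sketch below assumes the archive rows have already been read into dictionaries; the `dangling_parent_ids` helper and the two sample rows are hypothetical.

```python
def dangling_parent_ids(rows):
    """Return parentNameUsageID values that do not match any taxonID in rows."""
    known = {row["taxonID"] for row in rows}
    missing = {row["parentNameUsageID"] for row in rows
               if row.get("parentNameUsageID")
               and row["parentNameUsageID"] not in known}
    return sorted(missing)


# Two sample rows: the species, and the genus as originally described by Linnæus.
rows = [
    {"taxonID": "47798114-b7c1-4144-b38a-cdd6f700f001",
     "scientificName": "Hirundo javanica Sparrman, 1789",
     "parentNameUsageID": "779afca1-058e-4cb5-a7b5-4d7b15980a94"},
    {"taxonID": "1eee2eaf-20ac-49fb-85a7-bd293861402b",
     "scientificName": "Hirundo Linnæus, 1758",
     "parentNameUsageID": None},
]

print(dangling_parent_ids(rows))
```

Here the species row points at the "sec. Sparrman 1789" usage, which is filtered out of the export, so its parent ID is reported as dangling.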

There is a logical flaw in the way that the ZooBank IPT data are formatted, namely, the combined values of:

parentNameUsage | Hirundo Linnæus, 1758
parentNameUsageID | 779afca1-058e-4cb5-a7b5-4d7b15980a94

This is incorrect, because the parentNameUsage value corresponding to that parentNameUsageID value should actually be "Hirundo Linnæus, 1758 sec. Sparrman 1789". I don't remember why I truncated the "sec. Sparrman 1789", but I'm sure there was a reason. But the point is that this is misleading because the parentNameUsage value implies "Hirundo Linnæus, 1758 sec. Linnæus, 1758".

There are two basic ways we can solve this. The first is that I can "dumb down" or "short circuit" the ZooBank IPT output and instead of representing the parentNameUsageID for Hirundo javanica Sparrman, 1789 as 779afca1-058e-4cb5-a7b5-4d7b15980a94 (Hirundo Linnæus, 1758 sec. Sparrman 1789), I could instead represent it as 1eee2eaf-20ac-49fb-85a7-bd293861402b (Hirundo Linnæus, 1758 [sec. Linnæus, 1758]: http://zoobank.org/1eee2eaf-20ac-49fb-85a7-bd293861402b). That way, all parentNameUsageID values would point to other records within the ZooBank (sensu stricto) dataset output. This would be the easiest fix to the problem. But it would also be the wrong one.

I say wrong, because this would break the intended definition of parentNameUsageID. NameUsages are not "Names", they are usages of names.

So the right thing to do, I think, would be to replace the "ZooBank" IPT with the "GNUB" IPT. In other words, when I set the IPT back up again, I should not artificially filter the content down to just the ZooBank subset, but rather should export the entire dataset, including the non-nomenclatural act usage instances.

There are a LOT of implications of this -- too much for a GitHub post or even email. I think the best thing to do is plan a Zoom call to discuss.

@TonyRees

TonyRees commented Feb 10, 2021

Back on the original starter to this thread - it is an interesting trail how a swallow got flagged as an annelid - including first, a typo in "The London Encyclopaedia" of 1829 (that we can see via the good services of Google Books), the fact that the authors of Neave decided that this was worth indexing, the digitization by uBio that missed the comment in the printed version of Neave that this was an error, the latter's incorporation into IRMNG without additional scrutiny and its passing to GBIF (I believe that was the route), and then the mis-association of the species in question with that particular "name" instance. So many decades and centuries traversed, and the "butterfly effect" of otherwise totally trivial errors! (hopefully further adjusted to prevent repetition in the future of course).

Tony

@deepreef

@TonyRees : I TOTALLY agree! I would not have spent >90 mins researching it myself last night if it wasn't such an interesting case! We should keep it as an example of how complex these things can be. I'll be on the lookout for other similar examples. There are a bunch of cases where misspellings of one name are homonyms of another, and these examples help us define the nature of the data objects we hope to track!

@TonyRees

@deepreef I like the term "collisions" for erroneous spellings which then accidentally match a correctly spelled name (they are not "real" homonyms since misspellings are unavailable names). Just in case you care to use it somewhere in the future!

@deepreef

I like "collisions" too! Another term that Dave Remsen has used is "homograph", to distinguish from "homonym".
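
For illustration, spotting such "collisions" in a name list could look roughly like this. The `find_collisions` helper is hypothetical, and the second misspelling in the sample data ("Hirudoo") is made up to show a non-colliding case.

```python
def find_collisions(misspellings, valid_names):
    """Return misspellings whose erroneous spelling equals another valid name.

    misspellings maps an erroneous spelling to the name that was intended.
    """
    return {wrong: intended
            for wrong, intended in misspellings.items()
            if wrong in valid_names and wrong != intended}


valid_names = {"Hirundo", "Hirudo"}          # the swallow genus and the leech genus
misspellings = {
    "Hirundo": "Hirudo",   # the 1829 London Encyclopaedia case from this thread
    "Hirudoo": "Hirudo",   # made-up misspelling that collides with nothing
}

print(find_collisions(misspellings, valid_names))
```

Only the first entry is reported, because "Hirundo" is also a correctly spelled name for a different taxon.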

@mdoering
Member

There are a LOT of implications of this -- too much for a GitHub post or even email. I think the best thing to do is plan a Zoom call to discuss.

Sounds good! Having the entire GNUB should be useful, but I had always hoped we could also have a ZooBank-only export, as one would have to deal with many more concepts and many versions of the same "name" when using GNUB.

@deepreef

Understood! But with the full GNUB dataset, it's very easy to filter on just the Nomenclatural Act records where taxonID=originalNameUsageID (among other filter parameters). We could explore a dataset that includes only these records, plus the records they directly reference, so that it would be internally complete, but limited to only those records with direct relevance to nomenclatural acts.
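
A rough sketch of that filter, under stated assumptions: the rows are already loaded as dictionaries, "directly referenced" is taken to mean the three ID fields listed in `REFERENCE_FIELDS`, and the sample rows (including the originalNameUsageID of the genus usage and the unrelated "u-1" usage) are invented for the example.

```python
# Which ID fields count as "direct references" is an assumption of this sketch.
REFERENCE_FIELDS = ("parentNameUsageID", "acceptedNameUsageID", "originalNameUsageID")


def nomenclatural_subset(rows):
    """Keep nomenclatural acts (taxonID == originalNameUsageID) plus the
    records they reference directly, so every link in the subset resolves."""
    by_id = {row["taxonID"]: row for row in rows}
    acts = [row for row in rows if row["taxonID"] == row.get("originalNameUsageID")]
    keep = {row["taxonID"] for row in acts}
    for row in acts:
        for field in REFERENCE_FIELDS:
            ref = row.get(field)
            if ref in by_id:
                keep.add(ref)
    return [row for row in rows if row["taxonID"] in keep]


SPECIES = "47798114-b7c1-4144-b38a-cdd6f700f001"   # Hirundo javanica Sparrman, 1789
USAGE = "779afca1-058e-4cb5-a7b5-4d7b15980a94"     # Hirundo Linnæus, 1758 sec. Sparrman 1789
GENUS = "1eee2eaf-20ac-49fb-85a7-bd293861402b"     # Hirundo Linnæus, 1758

rows = [
    {"taxonID": SPECIES, "originalNameUsageID": SPECIES, "parentNameUsageID": USAGE},
    {"taxonID": USAGE, "originalNameUsageID": GENUS},      # subsequent usage, not an act
    {"taxonID": GENUS, "originalNameUsageID": GENUS},
    {"taxonID": "u-1", "originalNameUsageID": GENUS},      # hypothetical unrelated usage
]

print([row["taxonID"] for row in nomenclatural_subset(rows)])
```

The species and genus acts are kept, the Sparrman usage is pulled in because the species points to it as its parent, and the unrelated usage is dropped.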

I've been thinking on this, and will explore some ideas to see what the effects are on an output dataset.

@mdoering mdoering added the important An important, blocking issue label Feb 15, 2021
5 participants