LSIDs for taxonomic names live again #117
Comments
This is marvelous, @rdmpage ! Do you have any plans to add ZooBank to the list of supported sources? Perhaps there is already a resolver for it somewhere else, but if so, it isn't apparent from the ZooBank website. |
@baskaufs Glad you like it! By default I'm concentrating on sources that have RDF XML currently (or recently) available. I'm also biased towards integer identifiers (makes storing the data in chunks a bit easier, the whole thing data and all is in GitHub https://github.com/rdmpage/lsid-cache). ZooBank stopped resolving LSIDs a long time ago :( If @deepreef restores that feature (even if just the RDF XML output) I could add ZooBank to the list, alternatively I'd have to make my own mapping between the JSON currently served by ZooBank and the TDWG LSID vocabulary, which is possible but slightly undermines the notion that I'm caching authoritative LSID metadata. Personally I'm still baffled how our community decided that (a) the best identifier for a taxonomic name is an LSID and yet (b) made no attempt to persist either the identifiers or their associated metadata... |
FYI I've managed to find a copy of a ZooBank LSID record in XML:
The current ZooBank JSON API returns this for A1AE7A00-32C6-4510-A1D6-6DDDA9129D8B:
So, the mapping is less than straightforward :( |
Thanks for tagging me on this! Like @rdmpage , I have been bothered by the state of LSIDs -- but from the opposite direction. I am bothered that we still mint them as though the community uses them (in the way they were intended to be used). The reason I haven't bothered to maintain the LSID resolver for ZooBank is that there was basically only one person who ever accessed them using the LSID resolution protocol (hint: it's the same person who started this thread). I'm more than happy to get it working again, if there is some desire to actually use that protocol for resolving content. @rdmpage : you used to have a WONDERFUL LSID resolution testing service -- is that still functional? If you can point me to that service (which I'll need to test the ZooBank LSID resolver service), I'll get the ZooBank LSID resolver working again. The last time we discussed this, we ended up agreeing to abandon the LSID protocol, and instead create an RSS feed: Snarky commentary aside, this is actually PERFECT timing. I had a long chat this morning with the COL ISG and one of the key topics was mobilizing ZooBank and GNUB to be more tightly integrated with COL/GBIF ChecklistBank. We had done a lot of work on that before, which stalled on November 18 2019 when our server system was hit by ransomware. After we solved that issue, we found ourselves in the middle of a global pandemic and re-adjusted priorities. As it happens, the cycle of priorities has looped around again such that ZooBank is back near the top. The key next step is to get these two datasets live again: IPT is already up and running on our server for both, and the last step needed to flip the switch is to find a moment for @mdoering and me to connect and hack the config file and make them live again (perhaps next week). ZooBank was last refreshed on the day of the ransomware attack, and GNUB has been down since May 2015 (the outcry from that was the same as the RSS feed going down). 
Both should be up and live again by next week. So, after getting the RSS feed and the two IPT datasets up and running again, this is my question to @rdmpage and @baskaufs and anyone else who is interested: What would you like next?
Tell me what you want, and I'll get 'er done. I don't even mind doing it for a client base of one (or two or three) -- I just want to make it easy to access and use the content. The reason the JSON API looks so clunky is that it never got past the "proof of concept" phase. If you give me the exact JSON output template you want, I can have that up and running for you. Of course, if I change the existing API, it will break any code that was built around the existing structure. But I have no doubt what the outcry to that will be (more deafening silence), so I'm ready to completely change the output template of all of these data access services (IPT, RSS, JSON API, and even LSID -- if people really want that) so they provide exactly the same content. Let's do this. |
By the way, another reason this is perfect timing is that we're planning the next-generation ZooBank in the context of the 5th Edition of the ICZN Code. One of the items on that list of improvements was to (once and for all) abandon the LSID protocol for identifiers. We're still committed to maintaining the ones we've already minted into perpetuity (at least as identifiers; if not as an LSID resolution service); but after a certain date in the year 202X, the plan is to only issue the UUIDs. This approach was based on the assumption that LSIDs were dead in our community. But based on this thread, I'm wondering if news of the demise of LSIDs has been greatly exaggerated... Do we want to recommit to them? Or should we drive the wooden stake through its heart once and for all and embrace something else (my preference: UUIDs for everything, wrapped within the DOI dereferencing infrastructure)? |
One final thing:
You and me both! We all got together for two different workshops to discuss it. @rdmpage gave a great presentation on why LSIDs suck, DOIs suck and PURLs suck (I think those were the three -- I just remember that they all definitely sucked). At the end of those workshops, we all decided that LSIDs sucked the least, so we decided to go for it (largely because they were developed and backed by "IBM" -- so clearly were going to be around for the long haul -- yeah, right...). Lee Belbin convinced Paul Kirk, Nicky Nicholson and me to embrace LSIDs as a way to kick-start the community interest and understanding in them by showcasing them in IF, IPNI and ZooBank (respectively). COL wasn't far behind in adopting them. It all seemed so promising at the time. Sigh |
@deepreef Hi Rich, from my perspective it would be great to have the LSID XML available, even if just via an API call rather than full blown LSID resolution. That way I could cache it and have essentially instantaneous access to the four main LSID providers. The LSID tester you mention is long dead, but some of its code lives on in http://www.lsid.info/resolver/ which could be used to help debug LSIDs. |
On the other things it seems to me inevitable that any serious attempt to issue identifiers for taxonomic names should use DOIs. I have never liked UUIDs, I think they are anti-user and send exactly the wrong message if you want to encourage adoption (identifiers are ugly, for computers only, and disposable), but I know @deepreef and I will never agree on this ;) |
@deepreef As far as I'm concerned LSIDs are dead and it does not seem like it is worth maintaining an infrastructure that mints any more of them. I'm mostly concerned about them as a sort of archival issue. In other words, is there ANY way to recover the information they were supposed to provide if someone were to read an old paper that used them and wanted to get whatever information they were supposed to provide. That is what @rdmpage's tool does, subject to actually having access to the underlying data. |
OK, thanks @rdmpage -- so it's not so much about the LSID resolution protocol as it is to get the content in XML format similar to the LSID template? That should be a lot easier, I imagine. Above you gave two examples of output, the LSID template and the JSON template. Again, the latter was just a proof of concept that we never finished (mostly Rob Whitton wanting to get his head around how to implement JSON). After we built it, we put out the call for feedback on how to modify the structure to represent it in ways that people would find useful. Again, the response was deafening silence, so we never followed up with it. So... let's assume that nobody is using the LSID resolution protocol, so we don't need to resurrect that. Let's also assume that nobody is using the ZooBank APIs, so I can re-develop those without breaking anyone's existing code (or I can keep a legacy version if people really want and use that crappy JSON template). And finally, let's assume I will commit to doing the necessary work to make it happen (like I said, the timing is good as I'm mucking around with the IPT now anyway). If we assume all of those things, then it makes a lot of sense to me to harmonize at least the output content for IPT, XML and JSON. IPT is the only one following a real, active standard (DwC), so let's use that as the "core" content. DwC lacks a literature standard (something we've always wanted) so maybe I can just use the terms as they are in the LSID template. My thinking is that IPT will continue to do its thing (via http://ipt.zoobank.org:8080/ipt -- not quite functional yet, but it will be after I synch with @mdoering). Then I'll base two APIs off the same content, one that outputs in XML, and one that outputs in JSON. Here's what I need help with:
http://gnub.org/a1ae7a00-32c6-4510-a1d6-6dddA9129d8b.xml Is that the best way to do it? Or would it be better to go with something like: Or maybe: http://gnub.org/tnu/a1ae7a00-32c6-4510-a1d6-6dddA9129d8b.xml Maybe it doesn't matter (in which case I'll go with the first option, because it seems clean to me); or maybe it does matter (in which case someone needs to tell me what it should be). I know it's bad GitHub etiquette to write such long posts, but you all know me well enough to know that I don't care about GitHub etiquette (I'm going to spell this stuff out explicitly no matter what, so get over it). But I'm serious about rebuilding this stuff right -- meaning in a way that is useful enough that the user base may eventually expand beyond two or three clients. |
@baskaufs : Yes! As I said, we're committed to maintaining the "identity" part of LSIDs into perpetuity (if not the resolution protocol part). That was one of the things I wanted to achieve through http://bioguid.org BTW, that is another @rdmpage - inspired service that almost got off the ground, then went into hibernation for a few years, but sometime within the next year or two I plan to bring it back to life again (with gusto!) But that's a topic for another thread... |
@deepreef From the perspective of the LSID archive ideally XML like the example I showed above #117 (comment) (which was actually retrieved from ZooBank when its LSID service was live). The structure and vocabulary of that file closely match IPNI, Index Fungorum, and ION, which makes integrating all sources of data much easier. If nothing else, if we get ZooBank added it means that the millions of LSIDs for names in the wild, including those which presumably have some nomenclatural significance would all be "resolvable". So, would it be possible to serve XML like #117 (comment) for each taxon name? Maybe the original code for this still exists in the ZooBank source code? I have no preference for API interface, presumably something like http://zoobank.org/NomenclaturalActs.xml/6EA8BB2A-A57B-47C1-953E-042D8CD8E0E2 would be consistent with the current API? |
OK, I'll use that as a starting point. You said it "closely matches" IPNI, IF and ION. Can we bump that up to "exactly matches" to make it even easier? If I'm going to need to build it anyway, I might as well add any additional tweaks to improve it in any way you wish. I'll start with the template as you presented it above, but I assume it won't break anything if I add additional properties (as long as I don't change the existing ones) -- is that a safe assumption? Framing it as ZooBank is artificially constrictive, and will lead to broken links from parentUsageID (assuming I add that property to the XML output). Why not apply it to the entire GNUB content? Here are some comparisons of numbers:
All of the ZooBank content is included among the GNUB content. The only difference is that the ZooBank records have both a UUID and an LSID (and also a little bit of metadata, such as when the content was registered), whereas the GNUB records only have the UUIDs. If we could just add one property to indicate that a given record was registered in ZooBank, then it seems to me that the GNUB content would make the most sense to scope the service for. I guess it wouldn't hurt to do both as separate services (one at zoobank.org, and one at gnub.org), but that seems pretty redundant when the gnub version already includes everything in the zoobank version. Yes, the original code does already exist, so it won't be too hard to resurrect it exactly as is. One last thing, though: we're talking about "resolving" the LSIDs, but your example uses the UUID. My assumption is that both will work, but my question is whether the uuid should be presented in the output as a separate identifier, or just leave it to the end-user to harvest it from the LSID. So, just to be clear: my current plan is to implement an interface that returns the exact same XML as you listed above, but directly (rather than through the LSID protocol). I'll make it so you get the same results for any of these: Once I get that working, then we can move on to the next questions:
|
@deepreef From my perspective I'd just like the ZooBank LSIDs (I think of my services as a "wayback machine" for LSIDs). So my preference is not to include additional links to GNUB, but obviously that's up to you. I'd only harvest ZooBank LSIDs as they are the only ones that are likely to be in the wild (e.g., in publications or referred to in external databases such as Wikidata). Regarding XML, the original example I gave above could be tweaked as it has some issues. In particular, it doesn't link the publication to the name (except indirectly via a bnode). At the moment you have something like this:
whereas I think you want something like this:
The difference is that now we are explicitly making the link between the taxon name and publication LSIDs. RDF XML is horrible, the W3C validator is useful for figuring out if you're doing it right (it took me a few goes). The sooner we all move to JSON-LD and Bioschemas the better ;) |
Are the presentations from these workshops or the presentation by @rdmpage still available somewhere, by any chance? |
@cboelling This was 2005-2006 as I recall, and whatever I said then is probably stuck somewhere on a ZIP file or a DVD! My recollection at the time was that we looked at DOIs, Handles, PURLs, and LSIDs. The discussion was heavily driven by costs, so DOIs were seen as problematic as they were expensive. Ironically, DOIs were already in use at the time by NamesforLife (N4L) a company set up by George M. Garrity (who was at the meeting) to manage bacterial names and taxonomy. For example, doi:10.1601/nm.3093 is the name Escherichia coli, and doi:10.1601/tx.3093 is the corresponding taxon. Imagine if we'd gone down this route and had DOIs for every Eukaryote taxonomic name... oh well. Handles are DOIs without the branding and with minimal costs, but you have to manage them using clunky software. PURLs just move managing persistence somewhere else using someone else's brand and worse tools. LSIDs had the advantage of being free, they keep your organisation brand, and by serving RDF they forced nomenclators to standardise on a data format (the TDWG LSID vocabulary). But their dependency on messing with DNS and using SOAP made them beyond the reach of many biodiversity developers. As is typical in these discussions when the participants have no money, the free solution won. If you don't value the solution (i.e., won't spend money on it) then why would anyone else value it? Personally I think we missed the absolutely key challenges, which are to:
DOIs are the shiny example of doing this right, LSIDs, not so much. The challenge is to make sure you have 1-3, once you have that then the actual identifier technology doesn't matter so much (but of course, some have brand recognition, which is why DOIs are taking over the world). |
@rdmpage : EXCELLENT! This is exactly the sort of feedback I was hoping for. OK, I decided sleep wasn't necessary tonight, so I went ahead and built version 1 of the service, incorporating your requested tweaks. I made a couple of other minor changes:
In any case, have a look and let me know if this works for your needs: I have NOT tested this extensively! I tried to trap for ampersands and html tags and whatnot, but I might have missed some, so there may be errors. Please let me know if you find problematic records. Questions:
On a final note: Within the next couple years (coinciding with Code-5), ZooBank will likely stop wrapping the uuids within the cumbersome and unnecessary LSID prefixes. From that point forward, the plain uuids will be in the wild (they already are -- they just happen to be prefixed by the LSID stuff). OK, probably time for some sleep now. |
It's not too late! I can always replace the urn:lsid:zoobank.org:[pub|act|author]: prefix with a 10.xxxxx/ prefix. All I need to do is get a xxxxx for ZooBank. Right? |
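For readers unfamiliar with the `urn:lsid:authority:namespace:object` layout being discussed, here is a minimal Python sketch of splitting a ZooBank-style LSID into its parts and pulling out the embedded UUID. The function and class names are illustrative only, and the example string is assembled from a UUID quoted earlier in the thread; treating it as an `act` LSID is an assumption, not something the thread confirms.

```python
import uuid
from typing import NamedTuple, Optional


class Lsid(NamedTuple):
    """The colon-separated parts that follow the 'urn:lsid:' prefix."""
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str] = None


def parse_lsid(text: str) -> Lsid:
    """Split 'urn:lsid:authority:namespace:object[:revision]' into its parts."""
    parts = text.strip().split(":")
    if len(parts) not in (5, 6) or [p.lower() for p in parts[:2]] != ["urn", "lsid"]:
        raise ValueError(f"not an LSID: {text!r}")
    return Lsid(*parts[2:])


def zoobank_uuid(text: str) -> uuid.UUID:
    """Return the UUID carried in the object part of a ZooBank-style LSID."""
    return uuid.UUID(parse_lsid(text).object_id)


# Hypothetical example: the UUID appears in the thread, the 'act' namespace is assumed.
example = "urn:lsid:zoobank.org:act:A1AE7A00-32C6-4510-A1D6-6DDDA9129D8B"
print(parse_lsid(example))
print(zoobank_uuid(example))
```

A service that accepts either the full LSID or the bare UUID could try `uuid.UUID(text)` first and fall back to `zoobank_uuid(text)`.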
@deepreef Cool, I will take a look. From my perspective, in a ZooBank LSID it's not the LSID prefix that is cumbersome... it's the UUID. I think if you (a) adopt DOIs for names and (b) drop the UUID and have a nice short user-friendly string (can be opaque) you would do wonders for the adoption of persistent identifiers for zoological names. |
I agree. Even though it feels silly I have the same reservation for UUIDs. That's why we decided in COL to use short alphanumerical strings that try not to resemble real words and avoid easily confused char pairs: CatalogueOfLife/backend#491 They can also be converted to ints for a more memory or db friendly incarnation. |
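To make the "short strings that convert to ints" idea concrete, here is a rough Python sketch of a positional encoding over an alphabet with look-alike characters removed. The alphabet below is an assumption made up for illustration; it is not the one Catalogue of Life uses (see the linked backend issue for that), and the sketch does nothing to avoid strings that resemble real words.

```python
# Hypothetical alphabet: digits and upper-case letters with the look-alikes
# 0/O and 1/I/L removed. NOT the actual Catalogue of Life alphabet.
ALPHABET = "23456789ABCDEFGHJKMNPQRSTUVWXYZ"
BASE = len(ALPHABET)
INDEX = {ch: i for i, ch in enumerate(ALPHABET)}


def encode(n: int) -> str:
    """Render a non-negative integer as a short string over ALPHABET."""
    if n < 0:
        raise ValueError("only non-negative integers can be encoded")
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n:
        n, remainder = divmod(n, BASE)
        digits.append(ALPHABET[remainder])
    return "".join(reversed(digits))


def decode(text: str) -> int:
    """Convert a string produced by encode() back to its integer."""
    n = 0
    for ch in text.upper():
        n = n * BASE + INDEX[ch]  # KeyError signals a character outside ALPHABET
    return n


print(encode(123456789), decode(encode(123456789)))
```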
OK @deepreef I'm hoping that you're getting some sleep now ;) Here is my version of what ZooBank XML should look like, with comments to explain why I've made the changes.
I've also made it into a gist https://gist.github.com/rdmpage/ea25baf487a17af4a2184f0ca5bef98b and you can look at the revisions to see the steps I took to change it. The RDF now validates. Biggest change was to tidy up the publication, and use the correct TDWG term tcom:publishedInCitation ("tn:publication" isn't a thing, even though Index Fungorum uses it). There was also some stuff at the end of the document that needed to go. I'd forgotten just how awful RDFXML is to work with. |
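As a rough illustration of the explicit name-to-publication link described above, the Python sketch below builds a tiny RDF/XML record in which the taxon name points at the publication LSID through `tcom:publishedInCitation`. It is not the gist's content: the namespace URIs for `tn` and `tcom` are written from memory of the TDWG LSID vocabularies and should be checked against them, the use of `rdf:resource` is one plausible encoding, and the LSIDs and name are placeholders.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC = "http://purl.org/dc/elements/1.1/"
# Assumed namespace URIs for the TDWG vocabularies; verify before relying on them.
TN = "http://rs.tdwg.org/ontology/voc/TaxonName#"
TCOM = "http://rs.tdwg.org/ontology/voc/Common#"

for prefix, uri in (("rdf", RDF), ("dc", DC), ("tn", TN), ("tcom", TCOM)):
    ET.register_namespace(prefix, uri)


def name_record(name_lsid: str, pub_lsid: str, name_complete: str) -> str:
    """Serialise a taxon name that links directly to its publication LSID."""
    root = ET.Element(f"{{{RDF}}}RDF")
    name = ET.SubElement(root, f"{{{TN}}}TaxonName", {f"{{{RDF}}}about": name_lsid})
    ET.SubElement(name, f"{{{DC}}}identifier").text = name_lsid
    ET.SubElement(name, f"{{{TN}}}nameComplete").text = name_complete
    # The explicit link: name -> publication, with no blank node in between.
    ET.SubElement(name, f"{{{TCOM}}}publishedInCitation", {f"{{{RDF}}}resource": pub_lsid})
    return ET.tostring(root, encoding="unicode")


# Placeholder identifiers and name, for illustration only.
xml_text = name_record(
    "urn:lsid:zoobank.org:act:00000000-0000-0000-0000-000000000001",
    "urn:lsid:zoobank.org:pub:00000000-0000-0000-0000-000000000002",
    "Aus bus",
)
print(xml_text)
```

Output like this can be pasted into the W3C RDF validator mentioned earlier in the thread to confirm that the triple from name to publication is actually present.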
@deepreef Oops, forgot your other questions. If there's no info then I would simply not include the corresponding tag, so no volume, no tag. Other identifiers, yes please, especially DOIs (elsewhere I'm harvesting ZooBank's DWCA to add DOIs and other identifiers, but it would be nice to have the ones ZooBank already knows about). |
Thanks for this context @rdmpage. I agree and it's vital to keep this in mind in current decision-making on identifier systems. Of course, the governance structures to actually achieve persistence of data and services are an essential part of any relevant solution. |
@cboelling Yes, governance matters, but I would argue providing value to users should be the primary driver. If something isn't useful and doesn't help people do what they want to do, then all the governance in the world won't help. |
@rdmpage @mdoering : On the uuid thing; well... we're just going to have to agree to disagree. Especially in taxonomy, we already have the "identifier" that is human-friendly (it's the scientific name itself). From the perspective of humans, these identifiers have worked spectacularly well (otherwise they wouldn't still be in use a quarter-millennium after they were launched). Humans have no problem accommodating things like misspellings, alternate genus combinations, homonyms and the like. Computers, of course, have different needs in identifiers. They need to be globally unique and explicitly attached to the associated metadata, and above all, they should never change. Sure, integers work great for things like foreign keys and such -- which is why every database I create (including GNUB/ZooBank) uses integer fields for primary and foreign keys. I even have a system that unambiguously links each integer primary key to its corresponding UUID. But there's a reason it's a very (VERY) bad idea to use a value of a primary key field as your globally unique identifier. We could debate this indefinitely (as we have for years already before, and as we no doubt will for years to come); but I'm much more interested in focusing this discussion on this:
YES! YES! YES! Let's make stuff that people actually find useful! That's exactly why I was up until 2am this morning tweaking the XML service -- because someone might find it useful. It's also why I want to get the IPT up and running again, and why I'm eager to create a JSON-LD service and leverage Bioschemas. I'm going to need a bit of hand-holding to get those up and running, asking lots of rookie-level questions like "should I include the tags if the content is empty" and such. @rdmpage : THANK YOU -- that's EXACTLY what I needed: an explicit template to implement. I'll stop typing this post and start coding now. Back in a bit. |
Of course, the moment after I posted that last note, I realized I was late for my first (of many) Zoom meetings for the day, so coding got delayed. However, I just now had my first break, and went straight to the coding. I followed your template: I also added additional identifiers, when I have them. I can display the identifiers either with the dereferencing metadata, or without. In some cases, it's obvious that I should include the dereferencing metadata, for example: In the case of LSIDs, the dereferencing metadata is built into the identifier itself (i.e., the urn:lsid:zoobank.org:act: part) But what about DOIs? Should I include the dereferencing metadata, or not: For now, I'm including it: One Rookie question: Among the declared references in the opening RDF tag, some of the URLs have a hash at the end, and some don't. Is that a thing? Should I strip the ending hash characters? Add them to the ones that lack them? Leave them as is? Probably not important, but I'm just letting my OCD run wild on this. Awaiting further instructions to do even more stuff that people will find useful.... |
One other note: there are some data quality issues due to how the users enter data in a messy way. For example, the DOI is properly stored in the database as 10.3897/zookeys.641.11500; but people will sometimes enter it as "https://doi.org/10.3897/zookeys.641.11500" or "doi: 10.3897/zookeys.641.11500". It's on my to-do list to clean all these up in the master database, but for now there is a lot of noise in there, so you'll get things that look like these: If this is a problem, I'll bump the clean-up task up higher in the priority list. |
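The clean-up described here can be sketched in a few lines of Python. This is only an illustration covering the three spellings quoted above (plus the older `dx.doi.org` host, which is an added assumption); the function names are made up, and a real clean-up of the master database would need to handle whatever other variants turn up.

```python
import re

# Prefixes people tend to type in front of a DOI; dx.doi.org is an assumed extra.
_DOI_PREFIX = re.compile(r"^(?:https?://(?:dx\.)?doi\.org/|doi:\s*)", re.IGNORECASE)


def bare_doi(raw: str) -> str:
    """Strip resolver URLs and 'doi:' labels, leaving just '10.xxxx/suffix'."""
    doi = _DOI_PREFIX.sub("", raw.strip())
    if not doi.startswith("10."):
        raise ValueError(f"does not look like a DOI: {raw!r}")
    return doi


def doi_url(raw: str) -> str:
    """Return the DOI as an https://doi.org/ URL, e.g. for dc:identifier."""
    return "https://doi.org/" + bare_doi(raw)


for messy in (
    "10.3897/zookeys.641.11500",
    "https://doi.org/10.3897/zookeys.641.11500",
    "doi: 10.3897/zookeys.641.11500",
):
    print(bare_doi(messy), doi_url(messy))
```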
@deepreef Regarding the namespaces in the rdf:RDF tag, they can end in either a forward slash / or a hash #, depending on the choice made by whoever created that vocabulary. Given that this is the delimiter between the namespace name and the property you need to keep them, for example, http://purl.org/dc/elements/1.1/identifier (= dc:identifier) and http://www.w3.org/1999/02/22-rdf-syntax-ns#Description (= rdf:Description). See HashVsSlash for background. |
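A small Python sketch of why the trailing character matters: a prefixed name is expanded by plain string concatenation of the namespace and the local name, so the final `/` or `#` is part of the resulting identifier. The `expand` helper is illustrative, not part of any RDF library.

```python
NAMESPACES = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
}


def expand(qname: str, namespaces: dict = NAMESPACES) -> str:
    """Expand 'prefix:local' by concatenating the namespace and the local name."""
    prefix, local = qname.split(":", 1)
    return namespaces[prefix] + local


print(expand("dc:identifier"))
print(expand("rdf:Description"))

# Dropping the trailing '#' silently produces a different, wrong identifier.
broken = {"rdf": NAMESPACES["rdf"].rstrip("#")}
print(expand("rdf:Description", broken))
```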
@deepreef Regarding identifiers there are a bunch of ways to include and represent DOIs (that there are so many ways to do things is yet another reason RDF is hard work). If you are going to use dc:identifier then my suggestion is to store it as a URL with the prefix https://doi.org/, so |
Thanks, @rdmpage
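I get that part (when used as a delimiter). I was talking about the terminal character in the URL; e.g.: "http://rs.tdwg.org/ontology/voc/TaxonConcept#" vs. "http://rs.tdwg.org/ontology/voc/TaxonConcept". I'll assume they're there for a reason. RE: DOIs: OK, I'll leave them with the http://doi.org/ prefix (dereferencing metadata). |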
If easy. You might change the doi url to https
Given that this is the delimiter between the namespace name and the property you need to keep them, for example, http://purl.org/dc/elements/1.1/identifier (= dc:identifier) and http://www.w3.org/1999/02/22-rdf-syntax-ns#Description (= rdf:Description).
I get that part (when used as a delimiter). I was talking about the terminal character in the URL; e.g.:
"http://rs.tdwg.org/ontology/voc/TaxonConcept#"
vs.
"http://rs.tdwg.org/ontology/voc/TaxonConcept"
I'll assume they're there for a reason.
RE: DOIs: OK, I'll leave them with the http://doi.org/ prefix (dereferencing metadata)
|
Yes! A term such as
Rich, the convention is now to use https as @jgerbracht points out (as you know the Achilles heel of HTTP identifiers is that they keep changing). Most RDF now being minted that's relevant to biodiversity (e.g., JSON-LD in Zenodo) uses https DOIs. If and when we move to http://schema.org there are nice ways to encode identifiers independently of resolution. But for now, it would be great to use |
Actually, no change needed. The service already represents DOIs with the https:// prefix -- I had just mistyped my post above. Sorry about that! @rdmpage :
You have no idea how hard it was for me to resist the urge to rant on this (actually, maybe you do know how hard it was for me...). With all due respect to TBL, conflating dereferencing metadata with identification is a really bad idea. 'nuff said.
Excellent! Thanks! Your explanation makes perfect sense.
Indeed! That was my experience setting up the LSID resolution protocol for ZooBank way-back-when. It quickly became clear that only one client ever noticed when it stopped working or was otherwise broken. It's hard to justify committing time and resources to developing services that only have one client. On the other hand, if that one client who directly accesses that service turns around and makes it useful to hundreds of other clients through awesome services like this, it makes it all worth it (hence my enthusiasm to fix this ZooBank XML service). Speaking of which, let me know if you'd like an svg of the ZB logo. But the most intriguing thing to me is this:
This is exactly the idea behind http://bioguid.org (as you know): decouple the role of identification from the resolution and dereferencing mechanisms. Maybe my next "2am project" will be to build a service on that website that produces JSON-LD following schema.org. Nobody is using it right now anyway, so with over a billion identifiers to play with, it might make a nice sandbox for fleshing out an identifier cross-referencing system using these next-gen approaches. I'd definitely need some hand-holding; but I'm game if you are. |
Can't help myself: arguably "conflating" the two is the genius move that makes it possible to build networks of easily discoverable, inter-connected data. I think it's one of those classic tradeoffs, and TBL picked the one that gave us the web. But we can argue this point endlessly. In practice I think we can program defensively and include both types of identifiers in our metadata, for example this is how ORCID does it:
That would be great! I have one I made, but having an original from source would be better.
My sense is that this is the role Wikidata is playing, and that it's our best bet as an "identity broker" to map between different identifiers for the "same" things. |
Just thought I'd post the https://bioschemas.org JSON-LD for Ectenopsis mackerrasi that is embedded in the GBIF page for this species if anyone wanted to compare it to the RDF above from ZooBank.
|
Just a quick question about something I haven't kept up on. When Crossref first announced that it would support content negotiation with DOIs, they said to use |
@baskaufs https://doi.org is the current way to do things (see https://www.crossref.org/education/metadata/persistent-identifiers/doi-display-guidelines/ ). It supports content-negotiation as before, but the RDF sucks. I don't think it's been worked on recently, for example it uses the old style |
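For anyone who has not tried it, content negotiation against `https://doi.org` amounts to sending an `Accept` header with the DOI URL. Below is a minimal standard-library Python sketch that only builds the request; the network call is left commented out. The `application/rdf+xml` media type is assumed to be offered for Crossref-registered DOIs, and the DOI is the ZooKeys example quoted earlier in the thread.

```python
import urllib.request


def negotiation_request(doi: str, media_type: str = "application/rdf+xml") -> urllib.request.Request:
    """Build a GET request that asks the DOI resolver for a metadata format."""
    return urllib.request.Request("https://doi.org/" + doi, headers={"Accept": media_type})


req = negotiation_request("10.3897/zookeys.641.11500")
print(req.full_url, req.get_header("Accept"))

# Uncomment to actually resolve the DOI over the network:
# with urllib.request.urlopen(req) as response:
#     print(response.headers.get("Content-Type"))
#     print(response.read().decode("utf-8")[:500])
```

As noted in the reply above, the Crossref REST API is often the more informative route; this sketch only shows the mechanics of the header.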
Thanks, good to know. I can see I'm way behind the times. I haven't paid that much attention to the content-negotiation because as you say, they don't really give you much useful info, particularly compared with what you can just get from the Crossref API. |
@rdmpage :
Fair points, all.
Indeed; but we should save that for another context. I'll cease and desist on the opportunities for snarky commentary on that subject. And BTW, when I said "all due respect to TBL", I meant it genuinely. His role in history is, and forever will be, monumental. It's just that one thing I have a quibble with.
Excellent! But I think they need to add one more property; something like:
I'd actually nest an array of properties (things like
Cool! I'll get on that later today. I'll make two; one for this:
That was the original intent of bioguid.org (thanks again, btw, for letting me hijack that name): to map "sameAs" relationships among identifiers in the biodiversity space. I eventually expanded it to accommodate other predicates -- mostly so I could capture links between TNUs and BHL pages. But the main thing it does, which I haven't seen anyone else do (at least not well), is to parse out dereferencing service metadata from the actual identifiers (see above). Any given identifier might have more than one dereferencing service. For example: Does Wikidata have a mechanism for doing that? Or would it treat all of these as discrete "identifiers", without parsing out the part that represents dereferencing metadata from the part that represents identity?
Nice! You just gave me my 2am project for the day (technically tomorrow). Bonus: it's a weekend, so no early-morning Zoom calls tomorrow morning to show up for groggy and disheveled! Question: Should I simply update the existing service at http://zoobank.org/NomenclaturalActs.json/ (which will no-doubt prompt howls from the thousands of people who currently use that service every day...not...); or would it be better to start anew with something like: Ask, and ye shall receive. |
My how time flies :O Ok @deepreef I've added support for ZooBank "act" LSIDs. I've had to create a second resolver https://lsid-two.herokuapp.com, which currently has WoRMS and ZooBank LSIDs. The reason for this is that I store all the LSID metadata as disk files (no databases) and I'm limited by Heroku to GitHub repositories that are < 500 Mb in size. The "no database" thing is partly to avoid dependencies on other servers, and partly because the whole idea is to have a backup of the data. This achieves both goals of a backup and a service. |
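To illustrate why integer identifiers make a "disk files, no database" cache easy to chunk, here is a hedged Python sketch that maps an integer id to a file inside a numbered sub-directory. The directory layout, chunk size, and names are all hypothetical; this is not the actual structure of the lsid-cache repository.

```python
from pathlib import Path

CHUNK_SIZE = 1000  # hypothetical number of records per directory


def record_path(root: Path, namespace: str, record_id: int) -> Path:
    """Place each record in a directory named after its block of CHUNK_SIZE ids."""
    chunk = record_id // CHUNK_SIZE
    return root / namespace / str(chunk) / f"{record_id}.xml"


def read_record(root: Path, namespace: str, record_id: int) -> str:
    """Look a record up by path alone; no index or database is needed."""
    return record_path(root, namespace, record_id).read_text(encoding="utf-8")


print(record_path(Path("cache"), "names", 123456))
```

With UUIDs there is no natural ordering to chunk on, which is presumably part of the appeal of integers here; a prefix of the hex string could play a similar role.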
@deepreef I grabbed all the LSIDs I could find using a recent DwCA file from GBIF as the source for the list. Oh how I miss simple integer ids that I can count up when I'm fetching data ;) Anyway, one thing it would be nice to fix about ZooBank LSIDs is the
But which in ZooBank has the authorship information as well. This breaks my other reason for doing all this LSID stuff, which is to build another "no database" search engine for taxonomic names which relies on |
Thanks, @rdmpage ! Alas, I'm deep down other rabbit holes at the moment, so it may be a while before I can get back to this.
OK, that should be easy enough. I think I just used the same Code/Logic as for dwc:scientificName, and forgot that If you really like integer ids, you can always render the 128-bit identifiers in decimal, instead of in the canonical UUID form (e.g., instead of representing those 128 bits as something like 'e593838a-f7a9-5ef2-a04a-2bfc7c90771f', they could be represented as '305159146678742414161168577211252373279'; see here). But I suspect that's not really what you meant... ;-) |
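The decimal rendering mentioned above is a one-liner with Python's standard `uuid` module, since a UUID is just a 128-bit integer with a conventional hex layout. The helper names are illustrative; the example UUID is the one quoted in the comment.

```python
import uuid


def uuid_to_decimal(text: str) -> str:
    """Render the 128 bits of a UUID as a decimal string."""
    return str(uuid.UUID(text).int)


def decimal_to_uuid(text: str) -> str:
    """Convert the decimal rendering back to the canonical hyphenated form."""
    return str(uuid.UUID(int=int(text)))


original = "e593838a-f7a9-5ef2-a04a-2bfc7c90771f"
as_decimal = uuid_to_decimal(original)
print(as_decimal)
print(decimal_to_uuid(as_decimal))
```

The result can be up to 39 digits long, so it does not fit a 64-bit integer column and cannot be counted through the way sequential ids can.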
TL;DR You can resolve LSIDs for taxonomic names here: https://lsid.herokuapp.com
Sorry for gatecrashing, but this might be of interest. Given that there are millions of taxonomic names with LSIDs, most of which no longer resolve using the LSID protocol, it's always bothered me that we've let LSIDs die. So, I've made a website Life Science Identifier (LSID) Resolver that serves up the metadata for each LSID for names from three datasets (IPNI, Index Fungorum, and ION). These are all sources that used to support LSIDs, still display LSIDs, and in some cases still make the metadata available using the TDWG LSID vocabulary (if not via the LSID protocol).
The metadata is cached so the LSIDs resolve regardless of whether the source database supports the LSID protocol. Might be fun to compare the metadata from these LSIDs with what any new TNC comes up with. Note that there are some issues with the metadata, including mistakes and/or inconsistencies in the namespaces, and how the XML was constructed. I suspect these occurred because nobody ever actually used it.
I hope to add other LSIDs as time permits, and also depending on whether the database still provides metadata for LSIDs in TDWG LSID RDF.