DWC mapping #7348
Also, is the order of these fields intentional? Does it only go to aggregators, so it doesn't matter?
It's just a dumb mapping; we don't really have anything equivalent (but I think maybe can't leave it NULL??).
https://github.com/ArctosDB/code-table-work/issues/77 (we're aiming for fewer, not more! - but comment there...)
Order is arbitrary/irrelevant (not just here...) - although I think IPT has some very not-great ideas about order which might eventually influence something.
😭
GGBN meeting discussed this (https://docs.google.com/document/d/1Qs-RQKkIqpJr5xx2VbfDnxPbw4vFDcypGWcVDjyBYo4/edit?tab=t.0#heading=h.mn8up28dmjnq); consensus is to move ahead with simple remapping, ideally before talking to GGBN in ~January.
@happiah-madson says:
This sounds most excellent to me, I propose merging
somewhere, but I think perhaps https://dwc.tdwg.org/terms/#dwc:otherCatalogNumbers is better than https://dwc.tdwg.org/terms/#dwciri:recordNumber and associatedSequences is a bit redundant - should we drop it too?
@mkoo in case it's handy for your new column (I changed the name to make valid CSV), here's a sample of MVZ:Herp data built from the spreadsheet as of NOW.
@campmlc here's a sample of some MSB Organism-having records:
I added a new tab; I'm bringing Media into this discussion. Media and Occurrences are closely linked, and the unanimous consensus seems to be to act on this sooner rather than later, so let's just fix everything at once. Here's some sample data generated by the Media tab of https://docs.google.com/spreadsheets/d/1aCBYX9ErjicL8VdNdHbJUI0JTwWu6L4D_37gJ7IneRY/edit?gid=1920472984#gid=1920472984 See also https://github.com/ArctosDB/internal/issues/365
I think there could be an argument for actual other-collection catalog numbers being in the former and all the rest in the latter (only because, in an issues meeting, there seemed to be a strong preference from the community to distinguish between catalog numbers and all the other chaos, i.e., yes, we'll use "identifier" for nearly everything except for actual catalog numbers).
Also, @dustymc, I was going to file an issue to put life stage into cache, but filtered_flat.age_class doesn't seem to be in the DwC mapping anymore? Am I just confused?
I haven't seen the slightest glimmer of a hint that there might be an actual distinction in there (and if there is then it's probably completely fatal to the idea of collector number) so this seems very premature. If such a distinction somehow arises then remapping to it should be trivial.
I get that, but I can't write code to it and until that's possible (and obvious enough that everyone else who's sharing data to DWC makes the same choices) it can only serve as a means to make data less accessible.
Go for it (and please frame it as a functional need!).
... exist, because the source of it doesn't exist.
Crazy-maybe-dumb cake+eat idea: identifiers --> https://dwc.tdwg.org/terms/#dwc:otherCatalogNumbers Example:
So my understanding (and, I believe, that of everyone else who has been in this discussion over the years) is that the only "othercatalognumber" identifier type the community considers to be an actual catalog number is "institutional catalog number", not collector number or anything else (except for Arctos GUIDs, which are catalog numbers but would only apply here if the relationship is "same individual as" to this occurrence). Apologies if I am not familiar enough with the mapping issue and am misunderstanding.
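To make the position above concrete, here is a minimal sketch that routes only "institutional catalog number" identifiers into otherCatalogNumbers and collects everything else separately. The `Identifier` shape, field names, and sample values are hypothetical (not the real Arctos schema), and where the "everything else" bucket should land (recordNumber or some other term) is still being debated in this thread. The " | " separator is Darwin Core's recommended list delimiter.

```python
from typing import NamedTuple


class Identifier(NamedTuple):
    """Hypothetical shape of one Arctos identifier (not the real schema)."""
    id_type: str
    value: str


# Per the comment above: only this identifier type counts as a true catalog number.
CATALOG_NUMBER_TYPES = {"institutional catalog number"}

# Darwin Core's recommended list separator: space, vertical bar, space.
DWC_LIST_SEPARATOR = " | "


def split_identifiers(identifiers: list[Identifier]) -> tuple[str, str]:
    """Return (otherCatalogNumbers, remaining identifiers) as DwC list strings."""
    catalog = [i.value for i in identifiers if i.id_type in CATALOG_NUMBER_TYPES]
    rest = [
        f"{i.id_type}: {i.value}"
        for i in identifiers
        if i.id_type not in CATALOG_NUMBER_TYPES
    ]
    return DWC_LIST_SEPARATOR.join(catalog), DWC_LIST_SEPARATOR.join(rest)


# Made-up placeholder values, for illustration only.
ids = [
    Identifier("institutional catalog number", "MSB:Mamm:1"),
    Identifier("collector number", "ABC 17"),
]
print(split_identifiers(ids))
# ('MSB:Mamm:1', 'collector number: ABC 17')
```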
That is certainly what we used to do: a concatenated list of all GenBank/BOLD accession numbers.
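A concatenated accession list like the one described above can be sketched as follows, assuming the accessions arrive as plain strings (the function name and inputs are hypothetical). It drops blanks and duplicates while keeping the original order, and joins with Darwin Core's recommended " | " separator.

```python
DWC_LIST_SEPARATOR = " | "


def associated_sequences(accessions: list[str]) -> str:
    """Join accession numbers into one DwC list value, dropping blanks and duplicates."""
    cleaned = (a.strip() for a in accessions)
    unique = dict.fromkeys(a for a in cleaned if a)  # dict keys keep insertion order
    return DWC_LIST_SEPARATOR.join(unique)


# Placeholder accessions, for illustration only.
print(associated_sequences(["MN000001", " MN000001", "", "XYZ-001"]))
# MN000001 | XYZ-001
```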
I feel like this should be the other way round, with identifiers as record number and othercatalognumbers as otherCatalogNumbers. Am I missing something, @dustymc? Also: we use the identifier type "othercatalognumbers" rather than an actual collection bit like DZTM: Denver Zoology Tissue Mammal. Is that okay? If we had every other collection that we have materials co-located in, this code table would become, dare I say, even more unwieldy.
Sure, no problem (but it's the same data...). So @mkoo @happiah-madson @dbloom, can I just replace our current ipt_cache.occurrence with this (and the media tab) and figure it out from there? There's a timeline (#8301 (comment)) and IDK how else to proceed...
Your data made perfect sense to me.
Pulling some @happiah-madson comments here, they need more discussion than I can tolerate in a comment chain:
assigned by agent is correct because they're the person who's linked the record and a place; locality_attribute_determiner is correct because it is (sometimes). This whole conversation is a little weird in that DWC has a baked-in assumption that coordinates come from georeferences fed by primary locality strings, which of course isn't how reality or Arctos sees the world. Possibly ignoring all of the 'georef' fields would be the most appropriate response?
sorry 😬😬😬😬😬😬😬
That's kind of what I'm wondering! We give all the appropriate attributions in Arctos and the work is tracked, but... incorrectly linking it in DwC seems unnecessary.
I'm making the mappings live per conversation with @mkoo.
Tables have been rebuilt (and performance is, comparatively anyway, fabulous); I believe the next step is to publish.
@dusty Testing the theory, so to speak... Currently building a resource on the VN IPT for UCM FossilVert (data-migration#1968 above). Noting that re: DwC Occurrence Core:
I am mapping the Audiovisual Media Ext (formerly Audubon Media). I cannot test its effectiveness since there are no media associated with this collection, but I can report the following in the IPT.
Before I spend a lot of time detailing how I might map each of these, do you want all of these fields mapped? If so, I will respond with recommended mappings. Please advise. Additional question: with the new tables and fields, do any of the following fields contain URIs? I ask because those fields map automatically to both dc:format and dcterms:format. The latter of each of these expects a URI as the standard input. It has been VN policy, dating back to Laura Russell, to map only to dc:format and to un-map dcterms:format because these fields have not contained URIs. If this has changed, please let me know (and I'll update all 200000 Arctos resources on the IPT).
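One way to answer the URI question above is to scan the exported values and only target dcterms:format when every non-empty value is a URI. The sketch below is a heuristic under stated assumptions: it treats a value as a URI only if `urllib.parse.urlparse` finds both a scheme and a network location, so `urn:`-style URIs would be missed, and the function names are hypothetical.

```python
from urllib.parse import urlparse


def looks_like_uri(value: str) -> bool:
    """Heuristic: an absolute URI with both a scheme and a network location."""
    parsed = urlparse(value.strip())
    return bool(parsed.scheme) and bool(parsed.netloc)


def format_target(values: list[str]) -> str:
    """Choose dcterms:format only if every non-empty value looks like a URI."""
    non_empty = [v for v in values if v and v.strip()]
    if non_empty and all(looks_like_uri(v) for v in non_empty):
        return "dcterms:format"
    return "dc:format"  # literal values such as "image/jpeg"


print(format_target(["image/jpeg", "audio/mpeg"]))  # dc:format
```

Falling back to dc:format when nothing is mapped matches the current VN practice described above.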
Yes, I think that's correct, we have no such concept. (Err, we do, but it involves analyzing agents and dates and methods and such, not a text string.)
Unless there's some compelling use case for retaining it, I propose to remove this mapping. (Arctos data are complicated - there could be like 40, or none but still dates, etc., etc. - attempting to pull this out seems potentially misleading.)
PLEASE!! History is that we had an antique and inappropriate mapping, I flailed around trying to figure out what to do that properly includes license, then did some random things. See also https://github.com/ArctosDB/internal/issues/365. We do want to publish media, do not know how, any help greatly appreciated.
type - no, tentatively mapped to terms in https://arctos.database.museum/info/ctDocumentation.cfm?table=ctmedia_type
In all of the above, I think we are all very receptive to any sort of suggestion, nudge, hand-holding, whatever. I don't look at DWC more than I have to, things change WAY faster than I can keep up (and I didn't necessarily know what I was doing when I thought I did...), none of this is very "core" to me, I think we'd all like to share as much as our resources allow, any and all help in getting there is greatly appreciated. THANKS!!
The Map
https://docs.google.com/spreadsheets/d/1aCBYX9ErjicL8VdNdHbJUI0JTwWu6L4D_37gJ7IneRY/edit?gid=0#gid=0 will be the primary Arctos-->DWC mapping document; please make suggestions/corrections/etc in this issue.
Mapping Test
Here's a sample of DWC generated from the spreadsheet: temp_dwc_sample.csv.zip
Let me know if you need to see this with some particular data, or what I can do to make things clear.
Goals
A clear and functional DWC mapping document.
Scope
This Issue is for mapping to "flat DWC" (DwC-A). Media/AudubonCore (existing mapping) can be addressed elsewhere. Extensions (new mapping) would also need dedicated Issues and justification. (Because some - perhaps most - don't do much.)
Major Change
@mkoo and I believe mapping should be simplified so that only each "best occurrence" (e.g., what's in FLAT) is shared via DWC. That's in line with current cataloging practices, will mostly exclude things like lower-quality georeferences, will be a huge simplification in mapping and understanding the data, and will not require us to mint fake identifiers (which make GBIF nervous and might well end up in publications).
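The "best occurrence" idea can be illustrated with a minimal sketch: collapse candidate rows to exactly one per catalog record, so the record's own GUID can serve as occurrenceID and no extra identifiers need to be minted. The `CandidateRow` fields and the `georef_rank` ordering are hypothetical stand-ins for whatever actually determines what lands in FLAT.

```python
from typing import NamedTuple


class CandidateRow(NamedTuple):
    """Hypothetical candidate row; field names are illustrative only."""
    guid: str          # catalog record GUID, reused as occurrenceID
    georef_rank: int   # lower is better, e.g. 0 = the accepted georeference
    locality: str


def best_occurrences(rows: list[CandidateRow]) -> list[CandidateRow]:
    """Keep exactly one row per GUID: the first row with the lowest georef_rank."""
    best: dict[str, CandidateRow] = {}
    for row in rows:
        current = best.get(row.guid)
        if current is None or row.georef_rank < current.georef_rank:
            best[row.guid] = row
    return list(best.values())


rows = [
    CandidateRow("g1", 2, "older, lower-quality georeference"),
    CandidateRow("g1", 0, "accepted georeference"),
    CandidateRow("g2", 1, "only candidate"),
]
print([r.locality for r in best_occurrences(rows)])
# ['accepted georeference', 'only candidate']
```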
working comments
In progress: "translate" SQL (https://github.com/ArctosDB/PG_DDL/blob/master/shared_data/dwc_occurrence.sql) to spreadsheet (in a way that can be used to write dynamic SQL).
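A sketch of how spreadsheet rows could drive dynamic SQL, assuming each row has been exported as a (DwC term, SQL expression) pair. The function name, the sample rows, and the column names `guid` and `age_class` are hypothetical, and rows with an empty expression are simply skipped here (whether an unmapped term may be omitted or must be emitted as NULL is an open question in this thread). Expressions are inserted verbatim, so this only makes sense for a curated mapping, never for user input.

```python
import re

# Plain SQL identifiers only; anything else is rejected rather than quoted.
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def build_select(mapping: list[tuple[str, str]], source_table: str) -> str:
    """Turn (DwC term, SQL expression) rows into one SELECT statement."""
    if not _IDENT.match(source_table):
        raise ValueError(f"unexpected table name: {source_table!r}")
    columns = []
    for term, expr in mapping:
        if not expr.strip():
            continue  # unmapped term: leave it out of the export
        if not _IDENT.match(term):
            raise ValueError(f"unexpected DwC term: {term!r}")
        # Double quotes stop PostgreSQL from lower-casing camelCase term names.
        columns.append(f'  {expr.strip()} AS "{term}"')
    if not columns:
        raise ValueError("mapping contains no usable rows")
    return "SELECT\n" + ",\n".join(columns) + f"\nFROM {source_table};"


mapping = [("occurrenceID", "guid"), ("lifeStage", "age_class"), ("recordNumber", "")]
print(build_select(mapping, "filtered_flat"))
# SELECT
#   guid AS "occurrenceID",
#   age_class AS "lifeStage"
# FROM filtered_flat;
```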
I'll merge related issues here so they can be addressed in context. It'll take a while.
Some possibly-related issues: https://github.com/ArctosDB/arctos/issues?q=is%3Aissue+is%3Aopen+label%3A%22Aggregator+issues%22