Revise Instrument resourceTypeGeneral and related changes to address RFC comments #24

Merged
KellyStathis merged 3 commits into 4.5_draft from 4.5_instrument_revision
Jan 11, 2023

Conversation

KellyStathis
Collaborator

@KellyStathis KellyStathis commented Nov 22, 2022

Purpose

Address the feedback on Instruments from the RFC.

Changes from this branch can be previewed here: https://datacite-metadata-schema.readthedocs.io/en/4.5_instrument_revision/

Approach

  • Update the definition of Instrument.
  • Update the usage notes for Instrument to clarify that this type is to be used for an instrument instance, not the instrument description/design.
  • Replace the proposed relationType pair "IsUsedBy"/"Uses" with "Measures"/"IsMeasuredBy".
  • Remove guidance on line breaks from descriptionType "TechnicalInfo".
  • Add guidance on the Publisher property to the PIDINST crosswalk.

Open Questions and Pre-Merge TODOs

  • Confirm whether to add guidance for compound instruments
  • Adjust definitions of Measures/IsMeasuredBy as needed

Reviewer, please remember our guidelines:

  • Be humble in the language and feedback you give, ask don't tell.
  • Consider using positive language as opposed to neutral when offering feedback. This is to avoid the negative bias that can occur with neutral language appearing negative.
  • Offer suggestions on how to improve code e.g. simplification or expanding clarity.
  • Ensure you give reasons for the changes you are proposing.

@KellyStathis KellyStathis marked this pull request as ready for review November 22, 2022 05:10
@KellyStathis
Collaborator Author

Raising @Jashton123's comments from the commit here for ease of tracking:

28cf6c1#commitcomment-91400178

Jan:

'IsMeasuredBy' and 'Measures' seem to constrain the usability of the property. Not all instruments 'measure' things. Would it be possible to also include 'IsUsedBy' and 'Uses' to make the property more inclusive of all scientific instruments, not just those focused upon by the PIDINST schema?

@KellyStathis
Collaborator Author

28cf6c1#commitcomment-91401130

Jan:

For compound instruments, should we consider creating an additional relationType pair, 'hasComponent' and 'IsComponentOf', as suggested by Markus Stocker?

@KellyStathis
Collaborator Author

> 'IsMeasuredBy' and 'Measures' seem to constrain the usability of the property. Not all instruments 'measure' things. Would it be possible to also include 'IsUsedBy' and 'Uses' to make the property more inclusive of all scientific instruments, not just those focused upon by the PIDINST schema?

What other types of instrument activity are we considering? Would any of the existing relationTypes work?

@KellyStathis
Collaborator Author

KellyStathis commented Nov 29, 2022

> For compound instruments, should we consider creating an additional relationType pair, 'hasComponent' and 'IsComponentOf', as suggested by Markus Stocker?

This is an interesting idea and could work! How would we differentiate this from HasPart in the definition?

This might also solve the problem of distinguishing file (component) DOIs from dataset DOIs, mentioned here from Dataverse: IQSS/dataverse#5086

However, I think we might need more consultation on this solution than we will have time for in 4.5.

@Jashton123
Collaborator

> 'IsMeasuredBy' and 'Measures' seem to constrain the usability of the property. Not all instruments 'measure' things. Would it be possible to also include 'IsUsedBy' and 'Uses' to make the property more inclusive of all scientific instruments, not just those focused upon by the PIDINST schema?
>
> What other types of instrument activity are we considering? Would any of the existing relationTypes work?

Maybe microscopes?

@tedhabermann
Collaborator

tedhabermann commented Dec 14, 2022

It seems that no single relation type fits all possible relations between research objects and instruments. Do we want to stretch meanings or add more relations? We could add both "measures" and "uses" pairs to allow users to pick what they like...

@KellyStathis
Collaborator Author

@tedhabermann I'm not sure the "Uses" pair is well defined outside of the instrument context (and even then, there was a lot of ambiguity)... so I am concerned it is going to introduce more confusion.

Any thoughts on "ObtainedBy/Obtains" as suggested by Mohamed over email?

@tedhabermann
Collaborator

I have never heard someone say my data were obtained by an instrument. In my experience people obtain things - not instruments. The conclusion seems to be Measured and Used both apply in different ways and cases. Seems unlikely that we can decide between them. Is there a problem adding both?

@KellyStathis
Collaborator Author

KellyStathis commented Jan 4, 2023

@tedhabermann I think the problem with "Used" is that it is a vague term, whereas the original PIDINST proposal intended a fairly specific meaning for instrument usage. If we define it self-referentially as "A is used by B", this does not help clarify how the relationType is intended to be applied. I also don't know of other resource types for which we would recommend the selection of "IsUsedBy"/"Uses".

Here are Rolf's comments from the RFC about this relationType:

> I believe this new relationType originates from a proposal [1] that I submitted on behalf of the PIDINST WG. But this original proposal was meant for a different use case: an instrument might have been used in a research activity. For instance, if an instrument had been deployed during the cruise of a research vessel and we have identifiers for both the instrument and the expedition, the instrument could link the expedition with IsUsedBy and the expedition could link the instrument with Uses.
>
> I agree with Markus that the notion of a dataset that might have "used" the instrument having collected the data sounds somewhat weird. Probably because intuition would say that in this relation, the instrument would be the active part and the dataset is passive.
>
> Having said that, we definitely need to discuss and agree on "the proper" relation type to link a dataset with the instrument having collected the data. At the moment, in the current version of the schema, I use IsCompiledBy, e.g. the instrument is used to compile or create the dataset. It seems to me the relation type in version 4.4 that matches best.
>
> I would be happy to switch to any other relation type if we agree on it. But we need to standardize this and document it, because I believe this is a very important relation.

@tedhabermann
Collaborator

Kelly - My point is that it is very difficult, maybe impossible, to eliminate all ambiguity from these relationTypes. We could go around in circles on this for a long time, and actually we already have. Many smart people in the DataCite community will have an opportunity to weigh in on this - let them decide!

@KellyStathis
Collaborator Author

@tedhabermann Definitely we can't eliminate all ambiguity, but I also don't think that's a good reason to include a very vague term if we don't have a good sense of how to apply it.

How do you see "Used" being applied here? Could you give me some examples to help understand (including some that are not for Instruments)?

@tedhabermann
Collaborator

We have precedent for resource-dependent relation types (I think). What comes to mind is the relationTypes that were added for software citations (see Appendix 4 and IsRequiredBy). So, is generality across resourceTypes a requirement?

@KellyStathis
Collaborator Author

I don't know that generality is a requirement, but I would say it's desirable.

To me, "Uses" is broad enough that it seems like it could be misapplied to other scenarios. For example, one might say colloquially that a paper "uses" a dataset that it analyzes, but we would much rather encourage the relationType "Cites" or "References" here.

I think we got here because:

  • "Uses/IsUsedBy" was misinterpreted, as became clear in the RFC discussion. It was meant for the relationship between an instrument and a research activity (as explained by Rolf), not an instrument and the data that are measured by, or obtained through, that instrument.
  • To add clarity, we proposed "IsMeasuredBy/Measures". However, this may exclude some applications as not all instruments "measure" things. For example, a camera is an instrument that can (obtain? produce?) a photograph, but the photograph isn't measured by the camera. I don't think "Uses/IsUsedBy" solves this problem (a photograph definitely wasn't used by a camera, or a camera used by a photograph...).

Rolf had said we need a relation type to "link a dataset with the instrument having collected the data". What about IsCollectedBy/Collects? Is that an improvement?

@kitchenprinzessin3880

kitchenprinzessin3880 commented Jan 5, 2023

@KellyStathis if we want to keep the relationship between an instrument/agent and its data neutral, we can use

  1. collects/collectedBy or
  2. observes/observedBy (this can be either measured automatically or through human mediation e.g., microscopes).

@mdesmaele
Collaborator

@KellyStathis I think IsCollectedBy/Collects is a good fit for the relationship between an instrument and its data. In the guidance we can make clear that this concerns data that have been measured, obtained, produced or observed by an instrument.

@Jashton123
Collaborator

Jashton123 commented Jan 5, 2023 via email

@tedhabermann
Collaborator

tedhabermann commented Jan 5, 2023 via email

@KellyStathis
Collaborator Author

@tedhabermann I'm not sure about the interaction with "Collection". I know we have encountered challenges with defining the scope of the Collection resourceTypeGeneral. I suppose the act of "collecting" doesn't always produce a "collection" (does that add to the confusion)? That said, this didn't immediately strike me as problematic.

I think "IsObservedBy"/"Observes" is also a reasonable option. I would consider observation as inclusive of measuring/obtaining data. However, I wouldn't intuitively say that an instrument "observes" data (I would probably say "detects", but I don't think that is an improvement over our other options). For this reason, I would lean towards the "collects" pair if that also seems intuitive to others.

@KellyStathis
Collaborator Author

I am merging the changes with IsCollectedBy/Collects so that we can review the complete draft together. Email to follow!
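
For reference while reviewing, here is a minimal sketch of how the proposed pair might appear as reciprocal relatedIdentifier entries on a dataset record and an instrument record. The property names (relatedIdentifier, relatedIdentifierType, relationType) follow the existing RelatedIdentifier property; the helper names and both DOIs are hypothetical placeholders, and the final relationType definitions are still subject to the draft review.

```python
# Illustrative only: helper names and DOIs are hypothetical placeholders.
INVERSE = {"IsCollectedBy": "Collects", "Collects": "IsCollectedBy"}


def related_identifier(target_doi, relation_type):
    """Build one relatedIdentifier entry pointing at target_doi."""
    return {
        "relatedIdentifier": target_doi,
        "relatedIdentifierType": "DOI",
        "relationType": relation_type,
    }


def reciprocal_links(dataset_doi, instrument_doi):
    """Return (dataset_side, instrument_side) entries for the proposed pair."""
    # The dataset record points at the instrument that collected it.
    dataset_side = related_identifier(instrument_doi, "IsCollectedBy")
    # The instrument record points back at the dataset with the inverse type.
    instrument_side = related_identifier(dataset_doi, INVERSE["IsCollectedBy"])
    return dataset_side, instrument_side


dataset_side, instrument_side = reciprocal_links(
    "10.1234/example-dataset",     # placeholder DOI
    "10.1234/example-instrument",  # placeholder DOI
)
```

Under this reading the instrument is the active party ("Collects") and the dataset the passive one ("IsCollectedBy"), which matches the intuition Rolf described in the RFC.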

@KellyStathis KellyStathis merged commit c4956b0 into 4.5_draft Jan 11, 2023
@KellyStathis KellyStathis deleted the 4.5_instrument_revision branch January 11, 2023 00:56
5 participants