Should we say anything about consistency or otherwise of version metadata between version inventories? #421
Comments
SHOULD works for me. But definitely generate warning. |
Consensus from editors' call was that we should go with SHOULD |
In a comment on the PR #425, @ahankinson wrote:
My take on this (and hence motivation for only SHOULD be consistent) is that we should start from the idea that we want to support version immutability -- once a version, including its inventory, has been written it might not be possible or desirable to change it. With this starting point I can imagine wanting to correct |
I'm really sorry -- I must not have understood the discussion on the editors call on this, so I apologize if I'm re-hashing this... I think of the versions section of the inventory as being as immutable as the version directories themselves. Since the inventory file ( |
I think the scenario is the following, using the "minimal OCFL Object" example:
The At a minimum, I agree with the sentiment of this issue that we SHOULD not allow such changes... potentially, MUST not allow them. |
Note that the Also, note that this does not create any issues around |
I think you mean "does not create any issues"! My previous comment stands - we have examples where it is useful even if it is to be discouraged in general. |
Oh yes, I have added the "does not" into my #421 (comment) |
So does that mean that for the v2 version we have to assume that the creator, date, and message can change? If so, how do we tell which one is 'correct' in terms of data provenance? If v1 claims that "Person A" created it, but then someone later decided that they wanted to expunge "Person A" from the record, and so wrote v2 to put their own name in as the creator of v1 -- which person has ultimate responsibility for v1? The original person, or the person that later versions claim? (Sorry about the mixup with sha512 files -- thank you for correcting me) |
Throwing my hat into the ring on too little sleep ... but my take is that the entire contents of the version block are immutable, and subsequent higher-version inventory files should not have metadata in those blocks that differs from the metadata in the older inventory files. My reasoning is provenance: we have no way of knowing, or tracking, why the address value for version 1 in v1/inventory.json differs from that in version 1 in v2/inventory.json. If the address was wrongly entered in the original v1, note it in a message block of a subsequent version; don't mess with the history. |
I think it's a SHOULD; I see valid reasons for changing it in later inventories. However, if folks do this, can we say they SHOULD (or must if we're being adamant) indicate in the message that the change has occurred and why? Maybe an unrealistic ask, but someone who knew what they were doing would want to do that. |
Assuming that we do allow this (note: I'm still a 'no'), what are allowable changes? |
The base assumption we have to work with is that the files belong to the implementer, and that if they want to rewrite history, there's not a whole lot the spec can do about it. I think we all recognize that what happens between observation points (aka "object in motion") is out of our control. So if an institution wanted to change the message, datetime, or user, they are always technically capable of doing this, and nothing in the spec can prevent it. They can even rewrite it going back to previous 'immutable' history states, and a validator would be none the wiser. Given that "your files are yours to do what you like with them" is a fundamental freedom of the implementer and one which we should not presume to infringe, I don't really see a need to be liberal in what we accept within the confines of the spec. The implementer can always rewrite history; I see it as our duty to point out places and reasons why they shouldn't, and to use spec language to reinforce that point. Changing the metadata for the same version from one version to another seems like an area where we can be quite strict about our expectations about the nature of versions from one point to another. I think we have to assume that technological problems (a client produces bad JSON, or wrong datetimes) should be caught by a validator at point of ingest, and that such problems should be fundamentally corrected in all versions throughout the object's history. If a message was missed off, that's actually a useful historical artefact. Otherwise, the temptation would be to supply a message, post hoc, that may or may not match the message that the original creator intended. |
@ahankinson : agreed. |
People change name and address so expecting them to be immutable AND be useful information doesn't really work. |
...personally I think name and address should be replaced by an OCFL object reference to a person object which can be versioned properly in its own right. |
yes i did thumbs down the above. |
I agree that going full OCFL-flavor linked-data per #421 (comment) would be a step too far |
@rosy1280 @zimeon Sorry, should have smiley'ed that comment! But allowing the address to be an object reference or something like an ORCID might be sensible. Then name/address/message can be immutable IMHO. I do think @zimeon's example of replacing the digests is a valid thing we need to support in some reasonably elegant way though. |
Would this point instead to a need to capture digestAlgorithm on state, rather than on the top-level? |
I don't think so. The key idea is that the |
I can definitely see the use case for wanting to change to newer digest algorithms in subsequent inventories. I don't really think we should allow changes to the 'provenance' mechanisms, though (user, message, created). I think the difference is that the former can always be re-computed -- an old manifest can always get back the sha-512 of the file if we've switched to blake2b for example, and we would already track that change so it's just a matter of saying |
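To make the digest point concrete, here is a rough Python sketch (not from the spec or any existing validator) of how one might check that the state for a version still describes the same files after a later inventory switches digest algorithm. It resolves each logical path to content paths through the manifest, so the digest values drop out of the comparison. The function names and the placeholder digests are made up for illustration, and it assumes the digests in each state match that inventory's manifest keys exactly.

```python
def logical_to_content(inventory: dict, version: str) -> dict[str, set[str]]:
    """Map each logical path in a version's state to its content paths.

    The state maps digest -> logical paths and the manifest maps
    digest -> content paths, so the digest itself drops out of the result.
    """
    manifest = inventory["manifest"]
    resolved: dict[str, set[str]] = {}
    for digest, logical_paths in inventory["versions"][version]["state"].items():
        for logical_path in logical_paths:
            resolved.setdefault(logical_path, set()).update(manifest[digest])
    return resolved


def state_is_consistent(older: dict, newer: dict, version: str) -> bool:
    """True if both inventories describe the same files for `version`.

    Assumption for this sketch: two states agree when they list the same
    logical paths and each path shares at least one content path.
    """
    before = logical_to_content(older, version)
    after = logical_to_content(newer, version)
    if before.keys() != after.keys():
        return False
    return all(before[path] & after[path] for path in before)


# Hypothetical trimmed inventories; the digests are placeholders, not real hashes.
v1_inventory = {
    "digestAlgorithm": "sha512",
    "manifest": {"sha512-placeholder": ["v1/content/file.txt"]},
    "versions": {"v1": {"state": {"sha512-placeholder": ["file.txt"]}}},
}
v2_inventory = {
    "digestAlgorithm": "blake2b-512",
    "manifest": {"blake2b-placeholder": ["v1/content/file.txt"]},
    "versions": {
        "v1": {"state": {"blake2b-placeholder": ["file.txt"]}},
        "v2": {"state": {"blake2b-placeholder": ["file.txt"]}},
    },
}

print(state_is_consistent(v1_inventory, v2_inventory, "v1"))
```

The "shares at least one content path" rule is a deliberate loosening, since a later manifest could list extra content paths for the same bytes; a stricter check would recompute digests from the files themselves.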
Imagine an object with v1 and v2. In each of v1/inventory.json and v2/inventory.json there is a block "versions": { "v1": { "created": ..., "message": ..., "state": ..., "user": ... } }. I feel that it is implied that the state must give a consistent set of files for v1 in each case (though the values may be different if different digest algorithms are used for different versions). However, do the values of created, message and user need to (MUST) be consistent? Or is changing them an allowed but discouraged (SHOULD) way to fix metadata in a system that might have immutable versions? There is currently no comment in https://ocfl.io/draft/spec/#version-inventory
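For illustration, a SHOULD-level check along these lines could be sketched in Python as below. This is only a sketch under assumptions: the function name, the warning text, and the sample inventories are hypothetical, and the inventories are trimmed to the versions block, so they are not valid OCFL on their own. It compares created, message and user for every version present in both inventories and reports differences as warnings rather than errors.

```python
import json

# Fields discussed in this issue; "state" is left out because its digests
# may legitimately differ when digestAlgorithm changes.
METADATA_FIELDS = ("created", "message", "user")


def metadata_warnings(older: dict, newer: dict) -> list[str]:
    """Compare version metadata shared by two parsed inventories.

    Returns one warning string per field that differs (a SHOULD-level
    check, so nothing is raised).
    """
    warnings = []
    older_versions = older.get("versions", {})
    newer_versions = newer.get("versions", {})
    for version in sorted(set(older_versions) & set(newer_versions)):
        for field in METADATA_FIELDS:
            before = older_versions[version].get(field)
            after = newer_versions[version].get(field)
            if before != after:
                warnings.append(
                    f"{version}: '{field}' differs between inventories "
                    f"({before!r} -> {after!r})"
                )
    return warnings


# Hypothetical, heavily trimmed inventories (not valid OCFL on their own).
v1_inventory = json.loads("""{"versions": {"v1": {
    "created": "2018-10-02T12:00:00Z", "message": "One file",
    "user": {"name": "Alice", "address": "alice@example.org"}}}}""")
v2_inventory = json.loads("""{"versions": {
    "v1": {"created": "2018-10-02T12:00:00Z", "message": "One file",
           "user": {"name": "Alice", "address": "alice@example.com"}},
    "v2": {"created": "2019-01-01T12:00:00Z", "message": "Second version",
           "user": {"name": "Bob", "address": "bob@example.org"}}}}""")

for warning in metadata_warnings(v1_inventory, v2_inventory):
    print("WARNING:", warning)
```

In this sample only the address inside user changes for v1, so a single warning is reported for that version. Turning the check into a MUST would just mean treating a non-empty list as a validation error.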