Add to implementation notes a discussion of the idea of temporary space while building OCFL Objects #320

zimeon · 2019-03-14T17:42:26Z

From discussion in 2019.03.13 Community Meeting there might be a need for a "draft" or "tmp" directory for active OCFL Objects.

zimeon · 2019-03-14T17:48:08Z

I wonder whether there might be a case for a normative (specification) addition that says some particular directory (maybe tmp) MAY be present but MUST be ignored. This would support validation of an object that is actively being worked on for all but the final creation of the new version (something along the lines of mv tmp vN; cp -p vN/inventory.json vN/inventory.json.sha512 .).
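As a rough illustration of that final step, here is a minimal Python sketch of the `mv tmp vN; cp -p ...` sequence. The function name `commit_version` and the `work_dir` parameter are hypothetical, and the sketch assumes the work directory already holds a finished `inventory.json` and `inventory.json.sha512`.

```python
import shutil
from pathlib import Path


def commit_version(object_root: Path, version: str, work_dir: str = "tmp") -> Path:
    """Promote the work directory to a version directory, then refresh the root inventory.

    Hypothetical helper: `work_dir` and this function name are illustrative only.
    """
    src = object_root / work_dir
    dst = object_root / version
    if dst.exists():
        raise FileExistsError(f"{dst} already exists")
    # Equivalent of `mv tmp vN` (a rename within one filesystem).
    src.rename(dst)
    # Equivalent of `cp -p vN/inventory.json vN/inventory.json.sha512 .`
    for name in ("inventory.json", "inventory.json.sha512"):
        shutil.copy2(dst / name, object_root / name)
    return dst
```

Until the rename happens, a validator that ignores the work directory would still see the object in its previous, valid state.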

awoods · 2019-03-14T18:12:47Z

Sounds like a similar description for the current logs directory.

birkland · 2019-03-14T18:27:40Z

I toyed with the idea of placing the in-progress directory somewhere under the logs directory (e.g. logs/.v3), vs simply building up a content/v3 that is unreferenced in any inventory files until the very end when the work is "committed". I ended up doing the latter, justifying it by the notion that there is always a period of time when the object is in an inconsistent state when copying it into place, or adding a new version - so why not embrace it. The purview of the spec is objects at rest anyway, and building up a version is motion.

A temp directory (or some sort of convention, like directories under content that begin with a dot . MUST be ignored) does seem cleaner, though.

ahankinson · 2019-03-16T21:54:26Z

What about .vX (in-progress) vs vX (committed)? So .v3 while the content is being assembled, and then moved to v3 when it's finished.

zimeon · 2019-03-18T12:50:53Z

Something along the lines of _vX or vX_tmp would be fine, but I don't see any advantage to hiding with a leading dot. Of course, you are making the validation less tight by admitting directories with any X rather than one specific name.

julianmorley · 2019-03-20T16:10:00Z

A temporary space on the storage root, not named /tmp, for the assembly of future OCFL object versions.

rosy1280 · 2019-03-20T16:13:35Z

we agreed to use the language in @zimeon second comment, except the directory should be named deposit and be placed at the storage root.

zimeon · 2019-03-21T12:34:03Z

Hmmm... as @rosy1280 just pointed out on my abortive PR #324, the #320 (comment) above suggests one deposit directory on the storage root. I'm not opposed to such a directory which might be useful for the assembly of new objects, but I do not think it is a good solution for the issue debated on this ticket, which is primarily updating objects with new versions.

Having one directory per storage root suddenly couples updates to different objects under the root in a potentially awkward way. It also doesn't provide a standard solution for within-object manipulations where they perhaps aren't following the storage root approach. (I do understand that in a filesystem implementation a whole root would likely be on one filesystem, and thus a move from a single deposit directory would still be efficient, since it is just a relink.)

birkland · 2019-03-21T13:04:36Z

A global deposit directory seems workable, but as Simeon implies each application would need to establish its own conventions to correlate activities in ${STORAGE_ROOT}/deposit with the relevant object(s) the updates will eventually go into. If multiple tools are involved in building up that new version, they would need to agree on the same convention. It would be more straightforward, in my opinion, to define an object-scope work directory to serve use cases related to updates. Keeping the proposed root-level directory for staging new objects is fine.

awoods · 2019-03-21T14:34:14Z

Although not detailed in the 2019.03.20-Editors-Meeting notes, the concern in that meeting with having a deposit directory within the Object Root came from the likely result of having draft content intermingled with the preservation resources.

zimeon · 2019-03-21T14:50:33Z

I think my answer to that concern is that it is optional to use a workspace within the object. We have at least one example (see #320 (comment)) of a choice to do this.

rosy1280 · 2019-03-21T17:33:04Z

@zimeon the benefit of the deposit directory at the storage root is that it would keep the object itself (and its versions) clean until a version can be moved out of the deposit directory. in the example you cite, we have to futz with the object to remove a version if processes stall mid way through creation. with the deposit directory at the storage root you don't.

as for how the deposit directory would work at a global level, you would need to create the hierarchy that you create for the regular storage root in the deposit directory. what i mean by that is if you're creating a pair tree hierarchy, then create the pair tree hierarchy for the object that you put in the deposit directory. if you have no hierarchy, then don't create a hierarchy.
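A minimal sketch of that mirroring, in Python: the staging location for an object is its path relative to the storage root, re-rooted under `deposit`. The function name `deposit_path` and the example paths are hypothetical, not part of any spec text.

```python
from pathlib import Path


def deposit_path(storage_root: Path, object_root: Path, deposit_dir: str = "deposit") -> Path:
    """Return the staging location that mirrors `object_root` under the deposit directory.

    Hypothetical helper: whatever hierarchy the storage root uses (pair tree, flat, ...)
    is reproduced unchanged beneath `deposit_dir`.
    """
    relative = object_root.relative_to(storage_root)
    return storage_root / deposit_dir / relative
```

For example, with a pair tree layout, `deposit_path(Path("/data/ocfl"), Path("/data/ocfl/ab/cd/abcd1234"))` gives `/data/ocfl/deposit/ab/cd/abcd1234`; with no hierarchy, the object directory sits directly under `deposit`.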

as an fyi this is how Moab does deposits as well.

(note i edited this comment because now i see other comments that weren't appearing before)

ahankinson · 2019-03-21T18:01:53Z

Personally, my feeling is that the location of temporary space allocated for assembling new versions should be left up to the client to implement. There are too many use cases and edge cases for us to properly understand this.

Some implementers may be satisfied to use the /tmp directory; others may need larger assembly spaces. Some may need to do it on cloud storage, where they have no client-local object representation.

awoods · 2019-03-21T19:05:21Z

I agree, @ahankinson. From a validation perspective, however, we should include in the specification locations that should be ignored.

ahankinson · 2019-03-21T19:16:40Z

My first inclination would be to say that nothing is allowed in the Object Root, save for what we have specified. Any application-specific logic (and I would consider an 'in-flight' temporary directory application-specific) should not be stored with the content to be preserved.

ahankinson · 2019-03-21T19:19:06Z

I am less stuffy about the storage root, since we're looser on the validation part, but would agree with @zimeon about the relative perils of doing this.

zimeon · 2019-03-22T13:31:46Z

I don't think we should be prescriptive about how implementations do their manipulation of OCFL Objects alone or within an OCFL Storage Root. However, we should enable implementations to do it in the way they choose. I see three options and I advocate that all should be possible within spec:

1. Implementations use /tmp or equivalent, entirely outside of an OCFL Storage Root. No spec support required.
2. Implementations use some named directory in an OCFL Storage Root, say deposit, to assemble new objects or updates. Spec would require explicit exclusion of this directory from validation processes (per #320 (comment)).
3. Implementations use some named directory in an OCFL Object, say deposit, to assemble updates to the object. Spec would require explicit exclusion of this directory from validation processes (perhaps as outlined in my erroneous Add deposit directory #324). We already have one example of @birkland's implementation adopting an ad hoc solution to this in-object manipulation approach.
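To make the "explicit exclusion from validation" idea concrete for the storage-root `deposit` option, here is a minimal Python sketch of a walker that finds object roots while skipping an ignored top-level directory. It assumes object roots are marked by a declaration file whose name starts with `0=ocfl_object_`; the function name `find_object_roots` and the `ignored` parameter are hypothetical.

```python
import os
from pathlib import Path
from typing import Iterator, Sequence


def find_object_roots(storage_root: Path, ignored: Sequence[str] = ("deposit",)) -> Iterator[Path]:
    """Yield directories that look like OCFL Object roots, skipping ignored directories.

    Assumption: an object root contains a file named like `0=ocfl_object_<version>`.
    """
    for dirpath, dirnames, filenames in os.walk(storage_root):
        if Path(dirpath) == storage_root:
            # Prune ignored directories only at the top level of the storage root.
            dirnames[:] = [d for d in dirnames if d not in ignored]
        if any(f.startswith("0=ocfl_object_") for f in filenames):
            # Do not descend into an object; its contents are validated separately.
            dirnames.clear()
            yield Path(dirpath)
```

A validator built on this would never see half-assembled objects under `deposit`, while everything else under the root is still checked.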

ahankinson · 2019-03-22T13:37:16Z

@julianmorley raised the problem though, and I agree with him, that allowing incomplete or failed 'commits' within the "preservation" storage would seriously gum up the works in the long term. It's not just validation, it's also clarity of purpose -- OCFL Objects are "object at rest", not "object in motion." We've made the distinction quite clear by having Spec and Implementation Notes; I think we would be making a big step backwards if we were to start muddying it up this close to the finish line.

So I would be big thumbs-down to 3, and little thumbs-down to 2.

birkland · 2019-03-22T14:14:09Z

To be clear, my implementation currently just writes content directly into the next vN directory (with the intent to commit when it's done - it's not treated as a scratch space), and writes the inventory files as a last step. Cloud storage (like S3) doesn't have a rename operation anyway, so copying files directly into the vN directory (and coping somehow if individual operations fail) is really the only strategy available there. In that light, I'm fine if the spec doesn't define an object-level deposit directory.

In any case, I think best practices will emerge from experience.

I think the Fedora project might need to tweak or re-think their anticipated use cases for working with un-versioned content (where it is expected to change or otherwise be volatile before committed to a version), but that's neither here nor there.

In general, the possibility of failed or incomplete "commits" is unavoidable, because there is always some degree of motion in an object's lifetime as files fall into place (which can be mitigated to some extent by leveraging atomic renames). But it's proper for the spec, as "object at rest", to be silent about that and just describe the expected states.

zimeon · 2019-03-26T16:56:33Z

It seems that the consensus is that we should not allow anything in the Object (no change to spec required) and allow a deposit directory in the Storage Root (change to spec, per original #320 (comment)) ... I'll make a new PR for that

rosy1280 · 2019-03-27T12:38:40Z

I also wonder if we should add something to the implementation notes discussing this topic.

zimeon added OCFL Object Implementation Needs Discussion labels Mar 14, 2019

rosy1280 added this to the Beta milestone Mar 20, 2019

zimeon self-assigned this Mar 20, 2019

zimeon removed the Needs Discussion label Mar 20, 2019

This was referenced Mar 21, 2019

Add deposit directory #324

Closed

Make it clear that the logs directory is called logs #325

Closed

zimeon added the Needs Discussion label Mar 21, 2019

zimeon mentioned this issue Mar 26, 2019

Add deposit directory in storage root #329

Merged

rosy1280 closed this as completed in #329 Apr 1, 2019

zimeon mentioned this issue Jun 5, 2019

Guidance on versioning implementation 1 #362

Merged

zimeon mentioned this issue Sep 18, 2019

Clarify whether and OCFL Object Root may contain other files/directories not specified #373

Closed
