sidecar file naming and format #520

Closed
pwinckles opened this issue Oct 26, 2020 · 6 comments

Comments

@pwinckles

pwinckles commented Oct 26, 2020

The specification for the inventory sidecar is awkward to use because you often want to verify the integrity of an inventory file before deserializing it, but this is complicated by the fact that the name of the sidecar file is dependent on the digest algorithm that's defined within the inventory file.

On filesystem implementations, this is annoying but not a big deal. You can just list the files and examine their names to identify the sidecar. However, the problem is more annoying for object store implementations to resolve.
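To make the filesystem case concrete, here is a minimal sketch of locating the sidecar by listing the object root and verifying the inventory before parsing it. It assumes the 1.0 layout, where the sidecar is named `inventory.json.<algorithm>` and its first whitespace-separated token is the hex digest; the function names are hypothetical and not taken from any OCFL library.

```python
import hashlib
from pathlib import Path

SIDECAR_PREFIX = "inventory.json."


def find_sidecar(object_root: Path) -> Path:
    """Locate the sidecar by listing the directory, without parsing the inventory."""
    candidates = [
        p for p in object_root.iterdir()
        if p.is_file() and p.name.startswith(SIDECAR_PREFIX)
    ]
    if len(candidates) != 1:
        raise ValueError(f"expected exactly one sidecar, found {len(candidates)}")
    return candidates[0]


def verify_inventory(object_root: Path) -> str:
    """Check inventory.json against its sidecar; return the algorithm name."""
    sidecar = find_sidecar(object_root)
    # The algorithm is only discoverable from the file name suffix.
    algorithm = sidecar.name[len(SIDECAR_PREFIX):]
    expected = sidecar.read_text().split()[0].lower()
    inventory_bytes = (object_root / "inventory.json").read_bytes()
    actual = hashlib.new(algorithm, inventory_bytes).hexdigest()
    if actual != expected:
        raise ValueError("inventory digest mismatch")
    return algorithm
```

On an object store, the `iterdir()` call becomes a prefix listing request, which is the extra round trip described above.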

It seems to me that the sidecar file specification was based on the BagIt manifest specification, but it does not seem like a good fit.

  1. With BagIt you can have multiple manifest files, using different algorithms, but with OCFL there may only be one and it must use the algorithm defined in the inventory.
  2. With BagIt the manifest lists the digests for many files, but with OCFL the sidecar only ever contains the digest of the inventory file.

The ship may have sailed on this one, but, to me, it makes more sense if the sidecar MUST be named inventory.json.sidecar (or whatever better name you can come up with) and have contents like ALGORITHM\tDIGEST, where ALGORITHM MUST be the same as the algorithm that's defined in the inventory.

This would allow the sidecar to be easily located without needing to deserialize the inventory or root around looking for it.
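A rough sketch of what reading the proposed format could look like. This is only an illustration of the proposal, not anything in the 1.0 specification: the fixed name `inventory.json.sidecar`, the `ALGORITHM\tDIGEST` layout, and both function names are assumptions for the example.

```python
import hashlib


def parse_sidecar(sidecar_text: str) -> tuple[str, str]:
    """Split the proposed 'ALGORITHM<TAB>DIGEST' line into its two fields."""
    algorithm, digest = sidecar_text.strip().split("\t", 1)
    return algorithm, digest.lower()


def verify_inventory_bytes(inventory_bytes: bytes, sidecar_text: str) -> str:
    """Verify raw inventory bytes before deserializing them; return the algorithm."""
    algorithm, expected = parse_sidecar(sidecar_text)
    actual = hashlib.new(algorithm, inventory_bytes).hexdigest()
    if actual != expected:
        raise ValueError("inventory digest mismatch")
    return algorithm
```

Because both file names are fixed, a client could fetch `inventory.json` and `inventory.json.sidecar` by key directly, with no listing step and no need to parse the inventory first.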

[Edit] Reflecting on it more, I see that the format does align with how checksums are usually stored on Unix systems. It's easy for a person to use manually. It's just more complicated to use programmatically.

@awoods
Member

awoods commented Nov 10, 2020

With an interest in retaining compliance with the 1.0 specification, one approach could be to create an object or storage root extension that defines the digestAlgorithm used. This would facilitate direct access to any given inventory digest file.

@zimeon
Contributor

zimeon commented Nov 11, 2020

It would certainly seem reasonable to have a storage root level statement (i.e., an extension) that says "every object will use sha512 digests/sidecars" or at least "the latest version of every object will use sha512 digests/sidecars". This would essentially turn any occurrence of something else into a local error.

@bcail
Contributor

bcail commented Nov 11, 2020

Having a storage root or object extension that defines the digestAlgorithm sounds fine. (Actually, I might lean toward a storage root extension; I'm not sure it makes much sense to have an object extension that you load to find out the algorithm that you can find in other places in the object.)

@awoods awoods added this to the 2.0 milestone Nov 11, 2020
@awoods
Member

awoods commented Nov 11, 2020

I am marking this as a 2.0 issue, with the 1.0 recommendation of defining an extension that details which algorithm is used in order to directly know the name of the inventory sidecar file.
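As a sketch of how such an extension might be consumed: the snippet below assumes a hypothetical storage root extension whose JSON config carries a `digestAlgorithm` key. No such extension is defined in this thread, so the key name, the config shape, and the function name are all assumptions.

```python
import json


def sidecar_key(object_root_key: str, extension_config_json: str) -> str:
    """Build the sidecar's key from a storage-root-wide algorithm setting."""
    config = json.loads(extension_config_json)
    # Hypothetical config key; a real extension could name it differently.
    algorithm = config["digestAlgorithm"]
    return f"{object_root_key.rstrip('/')}/inventory.json.{algorithm}"
```

With the config read once per storage root, every sidecar key can be computed directly, without listing the object root or deserializing the inventory.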

@rosy1280
Contributor

@pwinckles, is this something that you still need addressed, or are you happy with the way things are?

@rosy1280 rosy1280 removed this from the 2.0 milestone Sep 22, 2023
@pwinckles
Author

@rosy1280 you can close
