Defining a repository from peer storage roots #43

marcolarosa · 2022-03-09T23:34:36Z

[Moved from the spec issues repository as this describes a new use case of handling multiple storage roots making up one repository. It includes both the aggregation of content in multiple storage roots and possibly replication of content.]

This may be a part of issue OCFL/spec#22 and it certainly follows on from the comment.

My institution can't provide a single 200TB volume (!). But they can give me 2 x 70TB and a 60TB volume. So for my use case I now need to have 3 OCFL filesystems that I interact with as a single unit from my service.

Given this, it would be nice to be able to define metadata at the repository level that says this filesystem is part of a larger set of peers. Nice-to-haves would include defining a priority for each peer and perhaps the storage tier. That way, clients can make smart decisions about ranking peers by tier and then priority (I imagine these are properties defined by the administrators provisioning the storage).

The justification for this is that any connecting service or user inspecting the filesystem can identify that it is part of a larger set.

For example - a storage.json or some such with content like:

{
  peers: [
    {
      type: 'filesystem',
      mountpoint: '/mnt/ocfl-repo1',
      priority: 1,
      tier: 'hot'
    },
    {
      type: 'filesystem',
      mountpoint: '/mnt/ocfl-repo2',
      priority: 2,
      tier: 'cold'
    },
    {
      type: 's3',
      endpointUrl: undefined,    // undefined means AWS S3; a URL means something like a local minio instance
      forcePathStyle: undefined, // true, false or undefined (= false); required for minio
      priority: 2,
      tier: 'warm'
    },
    {
      type: 'filesystem',
      mountpoint: '/mnt/ocfl-repo3',
      priority: 1,
      tier: 'hot'
    }
  ]
}

In this model priority can be any sequential number and tier could be 'hot', 'warm' or 'cold' to dovetail with typical nomenclature used in the industry.
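As a rough illustration of how a client might rank peers by tier and then priority, here is a minimal sketch. The `TIER_RANK` ordering (hot before warm before cold), the rule that a lower priority number is preferred, and the `rankPeers` name are all assumptions made for the example; none of them is defined by the proposal or by OCFL.

```javascript
// Hypothetical tier ordering: a lower rank is preferred. Not defined by the proposal.
const TIER_RANK = { hot: 0, warm: 1, cold: 2 };

// Unknown tiers sort last.
const tierRank = (tier) => TIER_RANK[tier] ?? Number.MAX_SAFE_INTEGER;

// Return a new array of peers sorted by tier first, then by ascending priority.
// Assumes a lower priority number means "more preferred".
function rankPeers(peers) {
  return [...peers].sort(
    (a, b) => tierRank(a.tier) - tierRank(b.tier) || a.priority - b.priority
  );
}

const peers = [
  { type: "filesystem", mountpoint: "/mnt/ocfl-repo1", priority: 1, tier: "hot" },
  { type: "filesystem", mountpoint: "/mnt/ocfl-repo2", priority: 2, tier: "cold" },
  { type: "s3", endpointUrl: undefined, forcePathStyle: undefined, priority: 2, tier: "warm" },
  { type: "filesystem", mountpoint: "/mnt/ocfl-repo3", priority: 1, tier: "hot" },
];

// Print the peers in the order a client would try them.
for (const peer of rankPeers(peers)) {
  console.log(peer.tier, peer.priority, peer.mountpoint ?? peer.type);
}
```

Peers that tie on both tier and priority keep their original relative order, so the order of the `peers` array acts as a final tie-breaker.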


zimeon · 2022-03-10T19:25:38Z

IMO this is distinct from OCFL/spec#22.

I think the idea of having a way to describe that a storage root contains partial content for a "repository" that is spread across multiple storage roots, or that one or more replica copies of a storage root exist, is interesting. I think there are perhaps different requirements for these two use cases. For example, the notions of priority and tier seem relevant only for the replica use case (where one might select which to access based on the values). The other thing I wonder about is whether this is best expressed inside a storage root (perhaps as an extension) or would be defined at some as yet undefined higher level of configuration/assembly.

rosy1280 · 2023-10-27T19:03:51Z

Feedback on Use Cases

In advance of version 2 of the OCFL, we are soliciting feedback on use cases. Please feel free to add your thoughts on this use case via the comments.

Polling on Use Cases

In addition to reviewing comments, we are doing an informal poll for each use case that has been tagged as Proposed: In Scope for version 2. You can contribute to the poll for this use case by reacting to this comment. The following reactions are supported:

In favor of the use case	Against the use case	Neutral on the use case
👍🏼	👎🏼	👀

The poll will remain open through the end of February 2024.

bbpennel · 2023-10-30T13:58:27Z

I understand needing data to be distributed across many storage locations/options, but I'm not sure I totally understand why OCFL needs to be aware of this. It would be helpful to hear more about what is gained by having all the storage roots in one OCFL repository, versus having an application layer above OCFL be aware of multiple repositories. Would the OCFL specification be moving towards handling additional functions like replication, tiering and load balancing, or is it primarily for ease of discovery by a client without needing to keep track of multiple repositories?
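For comparison, the application-layer alternative could be as small as the sketch below: the application holds the list of storage roots and probes each one for the object. The names `isOcflObjectRoot` and `locateObject` are hypothetical, and the sketch assumes the caller already knows the object's path relative to a storage root, since mapping an identifier to that path depends on the storage layout. The only OCFL detail relied on is that an object root contains a conformance declaration file whose name starts with `0=ocfl_object_`.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// True if `dir` looks like an OCFL object root, i.e. it contains an object
// conformance declaration file such as "0=ocfl_object_1.0".
function isOcflObjectRoot(dir) {
  try {
    return fs.readdirSync(dir).some((name) => name.startsWith("0=ocfl_object_"));
  } catch {
    return false; // directory is missing or unreadable on this storage root
  }
}

// Return the first storage root that holds the object, or null if none does.
// `objectPath` is the object's path relative to the storage root.
function locateObject(storageRoots, objectPath) {
  for (const root of storageRoots) {
    if (isOcflObjectRoot(path.join(root, objectPath))) return root;
  }
  return null;
}
```

A call such as `locateObject(["/mnt/ocfl-repo1", "/mnt/ocfl-repo2"], "some/object/path")` would return whichever mountpoint holds the object. Each storage root stays an independently valid OCFL storage root, and nothing in it needs to know about its peers.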

srerickson · 2023-10-30T17:32:50Z

I agree with @bbpennel -- this feels to me like functionality that doesn't need to be part of the core spec. Perhaps there is a reason this can't be implemented as an extension, but I don't see it.

marcolarosa · 2023-11-01T03:33:42Z

Wow - this is a blast from the past!

I've long since moved on from that project but we decided quite a while ago to stop using OCFL altogether. The complexity of the spec and the compromises we were required to accept just didn't stack up. I don't know if the project will reconsider OCFL in the future but I do know it won't be using the architecture described in this ticket (which we weren't happy about in any case) so I think this can be canned.

zimeon · 2023-11-08T06:45:13Z

I agree with other comments that this should not be part of the core OCFL specification. I think we would need to see experiments combining individually valid OCFL Storage Roots to explore what would be needed at the core level and could not be implemented through a separate higher-level specification.

zimeon · 2024-02-29T17:31:33Z

2024-02-29 Editors agree that we will close as out of scope. Comments do not support inclusion in the spec and the original institutional use case no longer applies. Voting at time of closing is -2.

marcolarosa changed the title ~~Definining peer repositories~~ Defining peer repositories Mar 10, 2022

rosy1280 transferred this issue from OCFL/spec Sep 22, 2023

rosy1280 mentioned this issue Sep 22, 2023

Distributed Storage of objects and components #39

Closed

zimeon changed the title ~~Defining peer repositories~~ Defining a repository from peer storage roots Sep 22, 2023

zimeon added the Proposed: In-Scope Use case is up for discussion and may change the spec, implementation notes, or become an extension. label Sep 22, 2023

rosy1280 added the Component: Specification label Sep 22, 2023

rosy1280 mentioned this issue Sep 22, 2023

Decouple storage from OCFL Object OCFL/spec#22

Closed

zimeon added Confirmed: Out-of-scope Use case will not be included in the upcoming version of the spec or implementation notes. and removed Proposed: In-Scope Use case is up for discussion and may change the spec, implementation notes, or become an extension. labels Feb 29, 2024

zimeon closed this as completed Feb 29, 2024

zimeon closed this as not planned Feb 29, 2024
