Defining a repository from peer storage roots #43

Closed
marcolarosa opened this issue Mar 9, 2022 · 7 comments
Labels
Component: Specification Confirmed: Out-of-scope Use case will not be included in the upcoming version of the spec or implementation notes.

Comments

@marcolarosa

marcolarosa commented Mar 9, 2022

[Moved from the spec issues repository as this describes a new use case of handling multiple storage roots making up one repository. It includes both the aggregation of content in multiple storage roots and possibly replication of content.]

This may be a part of issue OCFL/spec#22 and it certainly follows on from the comment.

My institution can't provide a single 200TB volume (!). But they can give me 2 x 70TB and a 60TB volume. So for my use case I now need to have 3 OCFL filesystems that I interact with as a single unit from my service.

Given this, it would be nice to be able to define metadata at the repository level that says this filesystem is part of a larger set of peers. Nice-to-haves would include defining a priority for each peer and perhaps the storage tier. That way, clients can make smart decisions by ranking peers by tier and then priority (I imagine these are properties defined by the administrators provisioning the storage).

The justification for this is that any connecting service or user inspecting the filesystem can identify that it is part of a larger set.

For example - a storage.json or some such with content like:

{
  peers: [
    {
      type: 'filesystem',
      mountpoint: '/mnt/ocfl-repo1',
      priority: 1,
      tier: 'hot'
    },
    {
      type: 'filesystem',
      mountpoint: '/mnt/ocfl-repo2',
      priority: 2,
      tier: 'cold'
    },
    {
      type: 's3',
      endpointUrl: undefined, // undefined means AWS S3; a URL means something like a local MinIO instance
      forcePathStyle: false, // true, false, or undefined (= false); true is required for MinIO
      priority: 2,
      tier: 'warm'
    },
    {
      type: 'filesystem',
      mountpoint: '/mnt/ocfl-repo3',
      priority: 1,
      tier: 'hot'
    },
  ]
}

In this model, priority can be any sequential number and tier could be 'hot', 'warm', or 'cold', to dovetail with the nomenclature typically used in the industry.
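To make the "rank peers by tier and then priority" idea concrete, here is a minimal Python sketch of how a client might order the peers from the storage.json example. It is only an illustration: the `TIER_ORDER` mapping, the `rank_peers` helper, and the assumption that a lower priority number means "preferred" are not defined anywhere in this proposal.

```python
# Hypothetical ordering of tiers; assumes 'hot' is preferred over 'warm' over 'cold'.
TIER_ORDER = {"hot": 0, "warm": 1, "cold": 2}

# Peer records mirroring the storage.json example (None stands in for undefined).
peers = [
    {"type": "filesystem", "mountpoint": "/mnt/ocfl-repo1", "priority": 1, "tier": "hot"},
    {"type": "filesystem", "mountpoint": "/mnt/ocfl-repo2", "priority": 2, "tier": "cold"},
    {"type": "s3", "endpointUrl": None, "forcePathStyle": False, "priority": 2, "tier": "warm"},
    {"type": "filesystem", "mountpoint": "/mnt/ocfl-repo3", "priority": 1, "tier": "hot"},
]


def rank_peers(peers):
    """Sort peers by tier first, then by ascending priority (assumed: lower is better)."""
    return sorted(peers, key=lambda p: (TIER_ORDER[p["tier"]], p["priority"]))


ranked = rank_peers(peers)
for peer in ranked:
    print(peer["tier"], peer["priority"], peer.get("mountpoint", peer["type"]))
```

Because Python's sort is stable, peers that share the same tier and priority keep the order in which they were listed, so an administrator could still express a preference between them through ordering alone.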

@zimeon
Contributor

zimeon commented Mar 10, 2022

IMO this is distinct from OCFL/spec#22.

I think the idea of having a way to describe that a storage root contains partial content for a "repository" that is spread across multiple storage roots, or that one or more replica copies of a storage root exist, is interesting. I think there are perhaps different requirements for these two use cases. For example, the notions of priority and tier seem relevant only for the replica use case (where one might select which to access based on the values). The other thing I wonder about is whether this is best expressed inside a storage root (perhaps as an extension) or would be defined at some as-yet-undefined higher level of configuration/assembly.

@marcolarosa marcolarosa changed the title Definining peer repositories Defining peer repositories Mar 10, 2022
@rosy1280 rosy1280 transferred this issue from OCFL/spec Sep 22, 2023
@zimeon zimeon changed the title Defining peer repositories Defining a repository from peer storage roots Sep 22, 2023
@zimeon zimeon added the Proposed: In-Scope Use case is up for discussion and may change the spec, implementation notes, or become an extension. label Sep 22, 2023
@rosy1280
Contributor

Feedback on Use Cases

In advance of version 2 of the OCFL, we are soliciting feedback on use cases. Please feel free to add your thoughts on this use case via the comments.

Polling on Use Cases

In addition to reviewing comments, we are doing an informal poll for each use case that has been tagged as Proposed: In Scope for version 2. You can contribute to the poll for this use case by reacting to this comment. The following reactions are supported:

In favor of the use case: 👍🏼
Against the use case: 👎🏼
Neutral on the use case: 👀

The poll will remain open through the end of February 2024.

@bbpennel

I understand needing data to be distributed across many storage locations/options, but I'm not sure I totally understand why OCFL needs to be aware of this. It would be helpful to hear more about what is gained by having all the storage roots in one OCFL repository, versus having an application layer above OCFL be aware of multiple repositories. Would the OCFL specification be moving towards handling additional functions like replication, tiering and load balancing, or is it primarily for ease of discovery by a client without needing to keep track of multiple repositories?

@srerickson

I agree with @bbpennel -- this feels to me like functionality that doesn't need to be part of the core spec. Perhaps there is a reason this can't be implemented as an extension, but I don't see it.

@marcolarosa
Author

Wow - this is a blast from the past!

I've long since moved on from that project but we decided quite a while ago to stop using OCFL altogether. The complexity of the spec and the compromises we were required to accept just didn't stack up. I don't know if the project will reconsider OCFL in the future but I do know it won't be using the architecture described in this ticket (which we weren't happy about in any case) so I think this can be canned.

@zimeon
Contributor

zimeon commented Nov 8, 2023

I agree with other comments that this should not be part of the core OCFL specification. I think we would need to see experiments combining individually valid OCFL Storage Roots to explore what would be needed at the core level and could not be implemented through a separate higher-level specification.
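As a rough illustration of what such a higher-level experiment could look like, the Python sketch below treats several individually valid storage roots as one logical repository by probing each root for an object. It is a sketch under assumptions, not a proposed design: `find_object` is a hypothetical helper, and it assumes the caller has already mapped the object ID to a path relative to the storage root (which depends on the storage layout in use). The only OCFL detail it relies on is that an object root contains a Namaste file whose name starts with `0=ocfl_object_`.

```python
from pathlib import Path


def find_object(storage_roots, object_path):
    """Return the first storage root that holds an OCFL object at object_path.

    storage_roots: iterable of filesystem paths, already in preferred order.
    object_path: path of the object relative to a storage root (assumed to be
    computed by the caller from the object ID and the storage layout).
    """
    for root in storage_roots:
        candidate = Path(root) / object_path
        # An OCFL object root is marked by a Namaste file such as 0=ocfl_object_1.1.
        if candidate.is_dir() and any(candidate.glob("0=ocfl_object_*")):
            return Path(root)
    return None  # no root holds the object
```

Each storage root stays a plain, self-contained OCFL storage root here; all knowledge of the peer set lives in the calling application.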

@zimeon zimeon added Confirmed: Out-of-scope Use case will not be included in the upcoming version of the spec or implementation notes. and removed Proposed: In-Scope Use case is up for discussion and may change the spec, implementation notes, or become an extension. labels Feb 29, 2024
@zimeon
Contributor

zimeon commented Feb 29, 2024

2024-02-29 Editors agree that we will close as out of scope. Comments do not support inclusion in the spec and the original institutional use case no longer applies. Voting at time of closing is -2.

@zimeon zimeon closed this as completed Feb 29, 2024
@zimeon zimeon closed this as not planned Feb 29, 2024