2023.09.13 Community Meeting
- Wednesday September 13th 4pm GMT / 11am EDT / 8am PDT
- Thursday September 14th 1am AEST
- Convert to your time zone
Zoom Link: https://emory.zoom.us/j/7074635164?pwd=SExsZ1NwYjVlNy9ZWHJHZ09BYXVxQT09
- John Weise (U Michigan)
- Jürgen Enge (Universität Basel)
- Jared Whilko (University of Manitoba)
- Seth Erickson (UCSB)
- Simeon Warner (Cornell)
- Doron Shalvi (US NLM)
- Laurie Arp (Lyrasis)
- Tommy Keswick (Caltech)
- Robert Doiel (Caltech Library)
- Rosalyn Metz (Emory)
- Nicole Scalessa (Vassar College)
- David Novak
John - Have not yet implemented OCFL but very interested
Jared - Moving from Fedora 3 to Fedora 6. Have also started discussions about whether Archivematica will implement OCFL
Seth - Have implementation
Laurie - Looking from Lyrasis perspective
Juergen - Built a library in Go. Now working to move to an OCFL archive in 2-3 years
Doron - Doing a Fedora 3 to Fedora 6 migration
Tommy - Have implemented some Invenio RDM repos. Working on a new preservation and interested in aligning with the community.
Simeon - OCFL editor. Looking to move Cornell preservation from home grown to all-cloud OCFL implementation.
Robert - Working with Doron. Have connected and had discussions with Neil Jeffries about work on a JSON data store with attachments.
Nicole - Using an Islandora repository over Fedora 6 and using that as the preservation system
Rosalyn - OCFL editor.
2.1 How are things going with implementing and conforming to version 1? Are there specific use cases that you feel need to be addressed or clarified?
Juergen - Have problems using OCFL Objects that contain very many files. Would like to use one zip file per version (https://github.com/OCFL/Use-Cases/issues/33). Currently doing one zip per object, up to 300 GB
Juergen - Extension issues: how do we get rid of extension warnings? Need either a better process for getting extensions registered or a spec that does not require warnings to be generated.
2.2 How do you imagine your storage needs evolving over the next decade; what are you concerned about? How might OCFL help address these issues?
NLM has been testing with a small repository. They are also looking at design issues for a large repository and are interested in pointers on how to deal with large systems. Some video master files are very large (maybe hundreds of GB, perhaps even a few TB). In Fedora 3 these large files are managed externally to Fedora in their own human-readable file structure -- seems like a good map to OCFL. How do we navigate the creation of multiple files and directories in OCFL?
NLM's decision to manage files externally for Fedora 3 was made to avoid sending files through the Fedora API. This feels like a good decision, and they have been happy to have Fedora aware of these files but not directly as Fedora objects. Felt that this was a better preservation posture. Like the idea of managing the master copies separate from the master copies.
NLM goal is better preservation with transparency and good management. Organization has moved to AWS as well as on-prem. Plan to use multiple copies strategy. Look to OCFL as future organization for masters. Secondary goal might be connection to the Fedora 3 to Fedora 6 migration. Wondering about whether having 2 OCFL Storage roots makes sense.
2.3 What issues are you concerned about when it comes to versioning your data? How might OCFL help address these issues?
Current NLM structure with small repo is one XML blob per citation. Found out that there were many files and directories in OCFL and had to work with that. Not using versioning in the small system. With larger systems imagine more issues, want to use versioning, and may have to think about optimizations.
Rosalyn notes that there have been past requests for the ability to fork versions.
Doron notes that mutable head covers many use cases. They have situations where they have lots of updates in a very short space of time, would not want to create separate versions but think of mutable head.
Juergen wonders about file deltas for new versions, instead of having to copy whole file again. This isn't an issue for him directly (modest text files) but imagines uses that might benefit significantly from this (e.g. large research data).
Should this be beyond the scope of OCFL?
Indiana also thinking about this?
Does it even make sense to have OCFL "manage" external files in the way that Fedora 3 does?
Past work: https://github.com/OCFL/Use-Cases/issues/35
Doron/NLM - Current size is about 150TB but imagine might have more. Interested in the possibility of not having to move content, or what about reference to something that exists on a tape system? Note sure whet
Jared - Curious about expectations and preservation needs for an OCFL object that references external content.
Juergen - OCFL could do something like put a virtual filesystem inside an OCFL object. Then there could be any type of link.
John - I can imagine the vague edge cases too, but I’m struggling to think of a situation where at U Michigan Library we would want to have external references.
Seth - It might make sense to have a storage root-level “content” directory that is shared by objects in the root, but it seems counterproductive to link to files outside the storage root.
Jared - From Peter Winckles's work on OCFL-Java, there are things that have to be dealt with to work with both filesystems and S3 stores. Is there a way to use the preservation tools of cloud systems and improve how files are uploaded and retrieved?
Tommy - When storing in AWS S3, there is an ETag for each thing you store. Checksums in past work with bags of bags got really messy if you don't want to transfer the files again. Would one also want to store AWS ETags somewhere in OCFL?
Seth - Would like to see better support for the graceful creation of new versions. The process typically means staging content somewhere, writing the new structure, and moving/copying files in. Would be nice to have better spec support for graceful creation of a new version, with a non-error condition for a partially complete one. A solution would be to allow content to be assembled in a new version directory but not be considered part of the object until the new inventory is written. Part of this is about which operations are more or less atomic on object stores vs filesystems.
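Tommy's question about recording ETags can be made concrete with a sketch of how such values are commonly derived. It assumes the widely observed S3 behavior for objects that are not encrypted with SSE-KMS or SSE-C: a single-part upload gets the hex MD5 of the content, and a multipart upload gets the MD5 of the concatenated binary part digests plus a `-N` part-count suffix. The function names and the caller-supplied part size are illustrative assumptions, not part of OCFL or of any AWS SDK.

```python
import hashlib


def single_part_etag(data: bytes) -> str:
    """Hex MD5 of the content (assumed ETag form for a single-part upload)."""
    return hashlib.md5(data).hexdigest()


def multipart_etag(data: bytes, part_size: int) -> str:
    """MD5 over the binary MD5 digests of each part, suffixed with the part count.

    Assumption: `part_size` must match the size the uploader actually used,
    otherwise the result will not match what the store reports.
    `data` is expected to be non-empty.
    """
    part_digests = [
        hashlib.md5(data[i:i + part_size]).digest()
        for i in range(0, len(data), part_size)
    ]
    combined = hashlib.md5(b"".join(part_digests)).hexdigest()
    return f"{combined}-{len(part_digests)}"
```

Because the multipart form depends on the part size, an ETag is not a fixity value for the content alone, which may be one reason to keep it separate from the digests in an OCFL inventory.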
The mutable head extension is somewhat different, described at: https://ocfl.github.io/extensions/0005-mutable-head.html
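The staged-assembly idea Seth raises can be sketched on a POSIX filesystem as follows. This is a minimal illustration, not a conforming OCFL implementation: it assumes the new version's content has already been assembled in a staging directory on the same filesystem as the object, and it omits the inventory sidecar digest file and the version-level inventory copy that the spec requires. The function name `commit_version` and its parameters are hypothetical.

```python
import json
import os
from pathlib import Path


def commit_version(object_root: Path, version: str, staged: Path, inventory: dict) -> None:
    """Move a fully staged version into an object, then publish the new inventory."""
    version_dir = object_root / version
    if version_dir.exists():
        raise FileExistsError(f"{version_dir} already exists")

    # Step 1: a single rename moves the assembled version directory into place.
    # Assumption: `staged` is on the same filesystem as `object_root`.
    os.rename(staged, version_dir)

    # Step 2: write the inventory to a temporary file, then swap it in.
    # Until this replace happens, readers still see the previous inventory.
    tmp_inventory = object_root / "inventory.json.tmp"
    tmp_inventory.write_text(json.dumps(inventory, indent=2), encoding="utf-8")
    os.replace(tmp_inventory, object_root / "inventory.json")
```

Between the two steps the object contains a version directory that the root inventory does not yet describe, which is the partially complete state the discussion suggests treating as a non-error condition. Object stores such as S3 have no directory rename, so step 1 would not carry over unchanged.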
Spec is currently unclear in its description of the specification version of the object versus the specification version of a version within the object. There could, for example, be a NAMASTE file in the version directory.
Seth - Is it necessary to allow both upper and lowercase in a digest? Could each have a normalized form?
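One possible normalized form, sketched under the assumption that digests are hex-encoded and that case carries no meaning, is to lowercase every digest key in a manifest and merge entries that differ only in case. The function name `normalize_manifest` is hypothetical, and the manifest shape (digest mapped to a list of content paths) follows the inventory layout discussed in these notes.

```python
def normalize_manifest(manifest: dict[str, list[str]]) -> dict[str, list[str]]:
    """Return a copy of `manifest` with lowercase hex digest keys.

    Entries whose digests differ only in case are merged, with their
    content paths kept in the order they were encountered.
    """
    normalized: dict[str, list[str]] = {}
    for digest, paths in manifest.items():
        normalized.setdefault(digest.lower(), []).extend(paths)
    return normalized
```

For example, a manifest with keys `ABCD` and `abcd` collapses to a single `abcd` entry listing the paths from both.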
- Wednesday October 11th 7pm ET / 4pm PT / 11pm GMT | Thursday October 12th 10am AEST Convert to your time zone
- action items here