WIP: Filesystem Interoperability Notes #227
First pass Refs #212
Follows BagIt's model
Any thoughts?
integrating files from filesystems that use other encodings. | ||
</li> | ||
<li> | ||
Some filesystems are not <strong>case sensitive</strong>, meaning two file names that differ only |
Can we add a normative MUST, SHOULD, or MAY in this last item?
This seems like a significant question. We either allow a degree of incompatibility or we are rather restrictive. Neither is very appealing. BagIt takes the first approach but gives a description of the issue (https://tools.ietf.org/html/draft-kunze-bagit-17#section-6.1.1.1), so I think I would tend toward that approach. A link to BagIt might be nice here, but essentially this is the tack @ahankinson has taken in the PR.
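To make the case-sensitivity concern concrete, here is a minimal Python sketch of how a client might flag paths that would collide on a case-insensitive filesystem. The helper name `case_collisions` and the example paths are hypothetical, and `str.casefold()` is only an approximation of how any particular filesystem folds case.

```python
from collections import defaultdict


def case_collisions(paths):
    """Group paths that differ only by case (hypothetical helper).

    casefold() approximates case-insensitive matching; real filesystems
    may apply their own, slightly different folding rules.
    """
    groups = defaultdict(list)
    for path in paths:
        groups[path.casefold()].append(path)
    # Keep only the groups where more than one distinct path folds together.
    return [sorted(group) for group in groups.values() if len(group) > 1]


# Illustrative paths only; not taken from any real OCFL Object.
paths = [
    "v1/content/README.txt",
    "v1/content/readme.txt",
    "v1/content/data.csv",
]
collisions = case_collisions(paths)
```

A check like this only reports the problem. Whether such an object is rejected or merely warned about is the normative question under discussion.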
Access Control Lists or Hidden files. | ||
</li> | ||
<li> | ||
The <strong>character encoding</strong> of the filesystem and Inventory SHOULD be |
I don't know what it means to talk about character encodings of a filesystem. I think our expectation here is bytestream fidelity; without that all is lost! I would thus rephrase this paragraph to talk only about inventory files.
Use 'Unicode-compatible' as the language.
If you have filenames that are encoded in a non-Unicode-compatible way, some transformations will be needed.
</li> | ||
<li> | ||
The <strong>character encoding</strong> of the filesystem and Inventory SHOULD be | ||
Unicode-compatible, either UTF-8, UTF-16, or UCS-2. Implementers may experience problems |
JSON itself is defined over UTF-8, UTF-16 or UTF-32 (with either byte order for UTF-16 or UTF-32); see https://tools.ietf.org/html/rfc4627#section-3. I do not know whether all JSON parsers have good support for all of these encodings. There is no note about UCS-2 encoding, so I'm not sure how we imagine that to be handled. I feel a little out of my comfort zone with this question, but I think we need to be more explicit and tie to the JSON spec. The current text seems to raise more questions than it answers in my mind.
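As one data point on parser support, here is a small sketch using CPython's standard `json` module, which accepts `bytes` input in UTF-8, UTF-16 or UTF-32 (Python 3.6 and later). The inventory dictionary is a made-up stand-in, not a valid OCFL inventory, and other parsers may behave differently.

```python
import json

# Hypothetical, minimal stand-in for an inventory; the values are illustrative.
inventory = {"id": "example-object", "head": "v1"}
text = json.dumps(inventory)

decoded = {}
for encoding in ("utf-8", "utf-16", "utf-32"):
    data = text.encode(encoding)          # bytes as they might sit on disk
    decoded[encoding] = json.loads(data)  # json.loads detects the encoding
```

This says nothing about UCS-2 or about parsers in other languages, which is part of why tying the text to the JSON specification seems worthwhile.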
Hi, I'm new here but wanted to add that it may be helpful to be more specific about Unicode Normalization, particularly given the difficulties encountered by folks working on BagIt. (See here).
The example @srerickson raises seems to be one in which the user was using BagIt on a local system (or perhaps the now defunct Apple Server), which generated the issue. Since OCFL, at least in my mind, would be used by systems, not people packaging things up, I can't imagine a situation where this would happen. Are there other cases or systems where this might happen? It's been a while since someone let me on a server so I admit to being ignorant...
@rosy1280 I think the edge case to consider is when a repository is rebuilt on a file system that handles filenames differently than the filesystem the OCFL Object was created on. In that situation (as in the example with BagIt) it might be possible for the filenames in the inventory to differ from the actual filenames (even though they both look the same, visually).
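The "look the same, visually" case can be sketched in Python with the standard `unicodedata` module. The filenames below are invented for illustration, and the helper `same_name_after_normalization` is hypothetical; the point is only that an NFC name recorded in an inventory and an NFD name returned by a filesystem are different strings and different byte sequences.

```python
import unicodedata


def same_name_after_normalization(a, b, form="NFC"):
    """Compare two names after applying the same Unicode normalization form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


# Illustrative names: both render as "café.txt".
nfc_name = "caf\u00e9.txt"   # precomposed é (U+00E9)
nfd_name = "cafe\u0301.txt"  # e followed by combining acute accent (U+0301)

naive_match = nfc_name == nfd_name
normalized_match = same_name_after_normalization(nfc_name, nfd_name)
```

Here `naive_match` is `False` while `normalized_match` is `True`, so a validator that compares inventory paths to on-disk names without normalizing would report a missing file.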
or "colon" (':') as a path delimiter. | ||
</li> | ||
<li> | ||
<strong>File permissions</strong> MAY be applied to files in an OCFL Object; however, implementers |
File permissions are unavoidably applied to any file in all current filesystems I know about so the MAY here seems odd. I also don't think we should introduce fuzzy terms like ACLs and hidden files. I think we should make a simpler statement along the lines of:
File permissions are not portable across filesystems and are not expected to be preserved by OCFL clients.
I believe that @neilsjefferies has language similar to this, and I remember a discussion along these lines in the September F2F meeting. So 👍 to @zimeon's suggestion.
I wonder whether some of these questions should be elevated to issues for discussion?
Windows is a real pain here. Under the hood, NTFS allows / and \ to be interchangeable directory separators, and it is also case sensitive and Unicode supporting. However, many of its user-space tools differ, since they also support FAT variants with their variable handling of these aspects.
Actually, in the light of this (per-folder case sensitivity) https://www.windowscentral.com/how-enable-ntfs-treat-folders-case-sensitive-windows-10 ...Can we just tell NTFS to go forth and multiply....?
I agree with @zimeon: if we didn't talk about this in the last Editor's Meeting, we should talk about it in the next one.
Editors call: 05/12/2018 After discussion we'll close this and break specific discussions out to other tickets / PRs.
A first pass at filesystem interoperability considerations.
Open for comment and review, not for merging (yet)