Specify container listing mechanism #116
Concrete example: http://alice.pod/share/photoAnnotations/
<http://alice.pod/share/photoAnnotations/> a ldp:BasicContainer ;
ldp:contains <annot123>, <annot456> .
# data cached from contained resources
<annot123> a photo:Annotation ;
dc:author "Alice" ;
photo:photo <http://amy.pod/photos/photo123.jpg> ;
photo:caption "Alice and Amy at work" .
<annot456> a photo:Annotation ;
dc:author "Alice" ;
photo:photo <http://bob.pod/photos/photo456.jpg> ;
photo:caption "Bob sleeping at work" . Timeline:
<annot456> a photo:Annotation ;
dc:author "Alice" ;
photo:photo <http://bob.pod/photos/photo456.jpg> ;
photo:caption "Bob sleeping at work" . |
In the context of access controls, there are generally two cases to consider pertaining to the binding relationship between a LDP resource and the container it is in:
Aside: AFAIK, when we discussed WAC/ACL, we've generally used To that end, and FWIW, I've created solid/authorization-panel#55 and solid/data-interoperability-panel#31 to properly explore the ODRL Information Model (but it could also be something else) and to see how it can work with WAC/ACL. What this may entail is having default (pod) policies to express a layer of rules and conditions when interacting with resources. For container listing, I propose that we adopt a privacy-first approach as a default. That is, an agent with Read access to a container can view all containment triples except those for contained resources that they do not have Read access to. Going forward, as soon as we can express that declaratively, e.g. how ACL+ODRL works in close proximity to LDP, we can specify the behaviour (as MUST?) but still allow other possible pod configurations (as MAY?), e.g. one where an agent's Read access or lack thereof is not part of the policy in constructing containment triples. |
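To make the privacy-first default concrete, here is a minimal Python sketch of the listing rule described above. It is not taken from any Solid server: `list_container`, `toy_can_read` and the `ACL` table are hypothetical names, and the table stands in for real WAC evaluation.

```python
from typing import Callable, Iterable, List

# Hypothetical access check: (agent, resource URL) -> True if the agent has Read.
CanRead = Callable[[str, str], bool]


def list_container(agent: str, container: str,
                   members: Iterable[str], can_read: CanRead) -> List[str]:
    """Return the member URLs to emit as ldp:contains triples for this agent."""
    if not can_read(agent, container):
        raise PermissionError(f"{agent} may not read {container}")
    # Privacy-first default: drop members the agent cannot read.
    return [m for m in members if can_read(agent, m)]


# Toy ACL table standing in for WAC evaluation (illustrative data only).
CONTAINER = "http://alice.pod/share/photoAnnotations/"
MEMBERS = [CONTAINER + "annot123", CONTAINER + "annot456"]
ACL = {
    CONTAINER: {"alice", "bob"},
    CONTAINER + "annot123": {"alice"},
    CONTAINER + "annot456": {"alice", "bob"},
}


def toy_can_read(agent: str, resource: str) -> bool:
    return agent in ACL.get(resource, set())


# bob can read the container but only one of its members.
bob_view = list_container("bob", CONTAINER, MEMBERS, toy_can_read)
```

Note that this performs one access check per member in addition to the check on the container itself, which is the cost that matters for large containers.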
something along the lines of -> <http://alice.pod/share/policy>
a odrl:Policy ;
odrl:conflict odrl:perm ;
odrl:prohibition [
odrl:target <http://alice.pod/share/photoAnnotations/> ;
odrl:action odrl:use
] ;
odrl:permission [
odrl:target <http://bob.pod/photos/photo456.jpg> ;
odrl:action acl:Read;
odrl:assignee :Alice, :Bob
] . ? |
@timbl explained to me in a F2F discussion that he considered putting details like in the above examples to be a bad practice, you shouldn't do that. Instead, he had suggested using a different resource for that purpose. The databrowser uses The way that I now understand it is that should not be interpreted as a default resource for the container (#69), it is not a representation of the container, merely an easily accessible aggregate of certain data from the contained resources. He suggested that even though he had some regrets about calling it Merely moving these data to a different resource doesn't resolve all concerns in this resource, as access controls are more granular for resources in a container than for the container itself, which I understand motivates the desire to list only resources that a client is authorized to read. This is a general departure from the current Solid, though, as the ACL system currently has a resource as the smallest unit. It is also a departure from the UNIX filesystem analogy. This suggests to me that a best practice recommendation is to not put things in the identifier someone with read access to the container shouldn't see. Other than that, I think we should stay close to the UNIX filesystem analogy, i.e., the container has containment triples, required metadata, optional timestamps, etc, but not actual user data. |
Proposal: Computed Containers
I haven't implemented this yet but I think it would work to not store containment triples in There could be a system-managed Semantics GET -- Motivation
|
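The proposal text above is truncated in this copy of the thread, so the following Python sketch only illustrates the general idea of computing containment on request instead of storing it. It assumes that container URLs end in `/` and that the server can enumerate stored resource URLs; `computed_members` and `STORED` are hypothetical names.

```python
from typing import Iterable, List


def computed_members(container: str, stored: Iterable[str]) -> List[str]:
    """Derive the direct children of `container` from the stored resource URLs."""
    if not container.endswith("/"):
        raise ValueError("container URLs are assumed to end with '/'")
    children = set()
    for url in stored:
        if url == container or not url.startswith(container):
            continue
        rest = url[len(container):]
        head, sep, _ = rest.partition("/")
        # A nested path contributes only its first segment, as a child container.
        children.add(container + head + sep)
    return sorted(children)


# Illustrative data only.
STORED = [
    "http://alice.pod/share/",
    "http://alice.pod/share/photoAnnotations/",
    "http://alice.pod/share/photoAnnotations/annot123",
    "http://alice.pod/share/photoAnnotations/annot456",
    "http://alice.pod/share/notes.ttl",
]

share_members = computed_members("http://alice.pod/share/", STORED)
```

Each returned URL would become the object of one `ldp:contains` triple in the response, so no containment triples need to be persisted.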
Yeah, I think that the hierarchy assumption makes it necessary to compute container membership on request. However, as per the discussion I had with Tim as referenced above,
By "augmentation", I mean examples such as <annot456> a photo:Annotation ;
dc:author "Alice" ;
photo:photo <http://bob.pod/photos/photo456.jpg> ;
photo:caption "Bob sleeping at work" . From the conversation we had, it seems Tim's answer to the first two questions is "no", but that the answer to 5 is yes. As for whether the containment triples should be listed, that's a trickier question. I think it comes down to whether the URI is sensitive in itself. On one hand, it can be sensitive in some sense; clearly, a WebID is something that identifies a person. OTOH, we should be careful not to imply that URIs can always be protected, because if we communicate that URIs are protected, what does that have to say for a lot of other security questions we are treating? My hunch is that we could end up in a situation where we rely on security by obscurity if that's the model we go for, but I'm not too well versed in this. Instead, it might be better to communicate clearly that you shouldn't put sensitive information in URIs, ever. A pragmatic issue is also that if Like Tim, I believe that augmented descriptions should not be a part of the container representation; instead they should go in different resources. I'm thinking that we should |
I agree that index.ttl should be discoverable. I also think that the use cases for index.ttl can be covered by |
Noooo, I don't think so... I think that the client needs to recognize the semantics, and so it should be specced, at least as a best practice. Also, I think the semantics is more like "you should see this resource too, it has more info", which is quite different from "this resource has metadata about it here". My opinions aren't as strong as they often are about stuff, though :-) |
Perhaps start by stating the actual use cases before proposing a solution. |
Yeah, the mechanisms around data augmentation are a case that would better be use case driven. But that also means that we should be careful not to put too much into the container representation. We do have to think carefully about the status of URIs, whether they are considered to be sensitive in themselves, and I think that's a fairly urgent issue. Perhaps we should open a separate issue on that topic, though? |
I opened solid/solid#142 to discuss whether we should make a different assumption than RFC7231 on the sensitivity of URIs; let's discuss that topic over there. |
Just to retain focus: using index.ttl to address "data augmentation" needs its own issue. It doesn't mean that the name "index.ttl" (as fixed/reserved/well-known naming) will be used and/or a particular property to discover it. That's part of understanding what the actual requirement is anyway. Would you mind creating it? 142 can broadly help with this issue (116) but I don't see the specific relevance of index.ttl ("data augmentation") here. |
Yes, good idea, I opened solid/solid#144 to address that. So, what's open to discuss here is the implications of consensus in solid/solid#142. Then, there's the discussion of linking ACL and other metadata resources. I suggest that is best dealt with in the general metadata resource discussion. The use of Anything else? |
Given: resource-based authorization (like WAC/ACL) holds that agents with Read access privilege can read a container's description in full; including containment statements. Noting the performance consideration as brought up in #142 (comment) , the above behaviour does not require additional machinery and so can generate container representations with minimal effort. Noting also the alignment with common *nix directory listing behaviour as mentioned above. And, noting that resource naming - whether a URI path discloses any sensitive information - to be orthogonal to this issue. I propose that the spec remains consistent when resource-centric access control is used. When supplemental access control policies eg. attribute-based, as mentioned in #116 (comment) are put in place - possibly even extending or combined with WAC/ACL - they can allow agents to set fine-grained policies. The same mechanism can potentially allow users to hide resources from container listing by setting required policy parameters. Currently the spec does not have a requirement for container representations to include anything beyond containment triples. I agree to revisit Prefer-based listing separately - possibly as optional behaviour but indeed would be use case driven. Is there a significant use case that would need to have a way for a container representation to include information from its or possibly its containments' auxiliary resources? |
I don't think it's quite an orthogonal issue, it's the main issue here. To put it simply, showing links to resources to users who don't have read access to those resources has really disturbing privacy and usability implications. And while I don't have results of formal usability studies (though hopefully they are out there), I would argue that this behavior would go directly against existing user mental models and intuitions. In the vast majority of current web-based files & folders storage systems (such as Google Drive, Dropbox, etc), if you don't have read access to a document or file, you don't see it if you view the contents of the folder. I think it would be incredibly dangerous to do otherwise, as you're proposing.
Aha! Now this is thinking along the right lines! Except we should invert that default. By default, if somebody doesn't have Read access to a resource, they don't see it in container listings. But that can be overridden, on a policy level, and you can set some sort of ACL directive that says "show this resource to users who don't have access" (presumably, so they can request access to it, or just to taunt them, etc.) |
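As a rough illustration of the inverted default suggested above (hidden unless the agent can read, with a per-resource override), here is a small Python sketch. The `list_without_read` flag is purely hypothetical: no such directive exists in WAC/ACL today, and the `Resource` class is a stand-in for whatever a server actually stores.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Resource:
    url: str
    readers: Set[str] = field(default_factory=set)
    # Hypothetical policy flag: list this resource even to agents without Read.
    list_without_read: bool = False


def listing_for(agent: str, children: List[Resource]) -> List[str]:
    """Hide unreadable children by default; the flag opts a child back in."""
    return [c.url for c in children
            if agent in c.readers or c.list_without_read]


# Illustrative data only.
children = [
    Resource("/docs/public.ttl", readers={"alice", "bob"}),
    Resource("/docs/private.ttl", readers={"alice"}),
    Resource("/docs/request-access.ttl", readers={"alice"}, list_without_read=True),
]

bob_listing = listing_for("bob", children)
```

Here bob sees `/docs/request-access.ttl` in the listing even though he cannot read it, which is the "so they can request access to it" case.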
Here "sensitivity" is in context of information in URI as opposed to access control on contained resources, and so it is not applicable to the listing mechanism practically speaking: 1) there is no requirement in Solid that exposes sensitive information in URIs ( see summary in #142 (comment) ) and 2) we wouldn't be able to test anything above and beyond the considerations mentioned in RFCs. Let me throw in an example coming from a different direction on sensitivity: an agent knowing the existence of What's proposed - more like clarified - earlier is intended to remain consistent with the foundations and good practices that we have some rough consensus on. I view this as the default, i.e. containment listing is not affected by resource-centric access control, at least within the scope of current WAC/ACL. I think we should first acknowledge that in order to pave the way to extensions and fine-grained policies. Needless to say, ACL Read's definition is confined to operations related to accessing a resource and reading its contents. Visibility of resource names is currently not influenced through WAC/ACL. So, we should take care to not ascribe behaviour that's not originally there. We have to determine which access control mechanism or paradigm can take this on. We could of course introduce this through WAC/ACL with a simple statement or extend/combine with other models - mentioned ODRL above as example, and the default can be overridden there. It can also be inherited from root container if say storage owner/controller sets it. In addition to the default that I've mentioned, we can also recommend that servers SHOULD or MAY want to control containment listing based on authenticated agent's access control on contained resources, and take note of performance considerations (among other things). As you know, Apache's directory index by default behaves like the *nix system in that files under a directory are visible even while agents don't have read access on each file.
Granted actual applications (more like centralised services coupling UI and data) work on some layers above that, so what you highlight about some mental models definitely holds true and is still possible. |
I have implementation experience with both modes described here: (A) in which a container resource will list all child resources, regardless of the client's access to those child resources and (To be clear, I'm assuming that the client has at least acl:Read access to the container resource itself) The TL;DR version is that, at scale, (B) degrades very quickly and very badly. The WebACL enforcement algorithm is already not especially fast, and if a server needs to perform N+1 ACL checks for every container read request (where N is the number of child resources), then container listings become very slow. On a piece of software that I previously worked on that implements approach (B), I heard stories about container GET requests timing out after ten minutes (!) -- this is for containers with ~10,000 child resources. This also means that it is not possible to cache container resource responses, since every response will depend on exactly who is making the request. Option (A) tends to have very good performance characteristics and a much simpler implementation. It is also cacheable. The major downside, as @dmitrizagidulin points out, is that one cannot hide URLs. If that is your concern, however, there is an easy way to deal with that: add a layer of indirection. If you have a publicly readable container at
The A very different way of thinking about @dmitrizagidulin's use case is in the context of a query interface, especially one with an ability to page through responses. So long as the mechanism for interacting with containers is via a RESTful interface (i.e. resource-oriented), I have a hard time supporting option (B). But if we view this through the lens of a query interface, one would typically constrain the result set through a paged collection of responses. There, the constrained set of results makes it possible to do all sorts of response filtering, based on access controls. And the expectations about RESTful, resource-based orientation are entirely different. In other words, I find it problematic to require filtering containment triples of containers, based on access controls, but I think it is perfectly reasonable to do that very thing in a query context (provided that one is able to page through the result set) |
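The query-interface idea above can be sketched as a paged, access-filtered scan. This is a hypothetical Python illustration, not an existing Solid or LDP API: `query_page`, the integer cursor, and the `even_only` access check are all assumptions made for the example.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def query_page(agent: str, children: Sequence[str],
               can_read: Callable[[str, str], bool],
               page_size: int, cursor: int = 0) -> Tuple[List[str], Optional[int]]:
    """Return up to `page_size` readable children starting at `cursor`,
    plus the cursor for the next page (None when the scan is finished)."""
    if page_size < 1:
        raise ValueError("page_size must be at least 1")
    page: List[str] = []
    i = cursor
    # Only the scanned window is access-checked, not the whole container.
    while i < len(children) and len(page) < page_size:
        if can_read(agent, children[i]):
            page.append(children[i])
        i += 1
    next_cursor = i if i < len(children) else None
    return page, next_cursor


# Illustrative data: ten children, of which only even-numbered ones are readable.
CHILDREN = [f"/c/r{n}" for n in range(10)]


def even_only(agent: str, url: str) -> bool:
    return int(url.rsplit("r", 1)[1]) % 2 == 0


first_page, cursor = query_page("bob", CHILDREN, even_only, page_size=2)
```

A client would keep calling `query_page` with the returned cursor until it comes back as `None`. A page with few readable children can still scan many entries, so this limits the per-request cost rather than removing it.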
I suggest to include the following text to tighten up the spec in the spirit of what's discussed. Preventing information leakage in context of HTTP responses to successful resource creation: " When using Web Access Control, an Above is not applicable to Edit: Updated guideline as normative. |
Nothing to do here (for now). Move along. |
acl:Read
) the resource? ( See also Current Container Listing mechanism does not support hidden resource use case #626 )Prefer
include=...all
analogous to ls -a
)?