Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JCL-417: Validate LDP containment data in SolidContainer::getResources #571

Merged
merged 3 commits into from
Jul 6, 2023

Conversation

acoburn
Copy link
Collaborator

@acoburn acoburn commented Jul 3, 2023

At present, the SolidContainer::getResources method includes references to every resource that fits the pattern:

?container ldp:contains ?child

Where ?container is the target container resource. This means that a non-compliant server could return a set of (apparent) child resources that are not really part of the Solid containment hierarchy. For applications that are built with the Solid specification in mind, it could lead to unexpected behavior.

@acoburn acoburn requested a review from a team as a code owner July 3, 2023 15:55
}).collect(Collectors.collectingAndThen(Collectors.toSet(), Collections::unmodifiableSet));
final String container = getIdentifier().toString();
// As defined by the Solid Protocol, containers always end with a slash.
if (container.endsWith("/")) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we could have a dedicated error thrown here and upon creation too, if the URI does not end in a slash.
And while this is a precondition, I find it in general strange to be checked here.

Copy link
Collaborator Author

@acoburn acoburn Jul 3, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would not be inclined to throw an exception here. Generally, I'd prefer the client libraries to be generous in the data they accept. In this particular case, if the resource is a SolidContainer, then the getResources() method must follow the rules of Solid, but if the resource isn't strictly a SolidContainer, it shouldn't cause the consuming application to throw an exception (and thus the getResources method would just return an empty collection).

This condition is specifically to prevent against such cases as this:

A resource is https://example.com/resource and it includes the triple:

<> ldp:contains <https://example.com/resourcechild> .

That triple should not cause the getResources method to include the sibling resource from being included in that list (since it doesn't conform to the Solid definition of containment)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing to note is that these checks are put in place to prevent a malicious or misbehaving server from causing an application to perform unexpected operations on non-contained resources. For well-behaved servers, this will never be an issue and there is no need to add these constraints, but we can't expect all servers to be well-behaved

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interestingly, your last comment was the reason for which we decided to throw an error in the JS implementation of this feature: it may be desirable to let the consuming app know that there is something wrong with the target resource so that it takes action and removes the offending triple, so that other clients who don't implement this check don't perform the unintended operations we are preventing against here. I would prefer the DX to be convergent between JS and Java, so I'm happy to filter out the resources that do not follow slash semantics from the children list, but isn't there value in communicating this inconsistency to the app/user, especially when they can act on it?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My rationale is basically the "robustness principle" https://en.wikipedia.org/wiki/Robustness_principle

Be conservative in what you do and liberal in what you accept from others.

There may be cases where an RDFSource is used to model an example Container's RDF or where an app is intending to write certain triples or where a server is simply doing things incorrectly. There will be a lot of "slightly off" data in the wild, and I think it is important that client libraries don't just blow up in those cases. As long as the data is syntactically correct, the semantics may vary from domain to domain, and it would be unfortunate if we imposed constraints that prevented certain interesting use cases.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is a compromise (actually, I think it's a better solution overall)

  • Leave the current get resources() method as-is

  • Also implement the validate() method and have it return an invalid result if there are illegal containment triples or if it doesn't end in a slash, etc

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see both sides of the coin but I like the validate() option a lot! Validate should by default already have this check in then.
And we need to highlight it in the docs then I would be fine with it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having a distinction between a read method that just ignores triples we deem invalid and a validation method where the intent is to be aware of any issues sounds like a good pattern. I'll think about other cases when it would be applicable.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@timea-solid and @NSeydoux I'll wait on merging this until you have a chance to see my latest additions to this in d3fd371

@acoburn acoburn force-pushed the JCL-417-ldp-contains branch from fde54c7 to d3fd371 Compare July 5, 2023 19:58
child.getTypes().forEach(builder::type);
return new SolidResourceReference(URI.create(child.getIRIString()), builder.build());
}).collect(Collectors.collectingAndThen(Collectors.toSet(), Collections::unmodifiableSet));
final String container = normalize(getIdentifier());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe now you can call validate() here.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not validate() directly, as that performs a number of checks that are irrelevant here. You will see that in https://github.com/inrupt/solid-client-java/pull/571/files#diff-2c9b192110a924fa828ba14eaa2e7359afa8b3b811c8ad12fd6563e9161d7ba0R90, we re-use some of the same code as validate in this pipeline

@acoburn acoburn merged commit ad1bac6 into main Jul 6, 2023
@acoburn acoburn deleted the JCL-417-ldp-contains branch July 6, 2023 16:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants