URLInputSource can be abused to retrieve arbitrary documents if used naïvely #1543
Replies: 9 comments
-
c.f. https://python-jsonschema.readthedocs.io/en/stable/references/
-
Thanks for calling this issue out @alexdutton. If you plan on addressing it in the way you describe, I'll review it as soon as there's a PR and we can try and merge shortly into a 6.0.x release.
-
TBH, auto-resolution of contexts should maybe be off by default. Would a global environment variable like RDFLIB_RESOLVE_EXTERNAL_URIS=1, defaulting to 0 if unset, be safer?
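A minimal sketch of how such an opt-in flag might be read, assuming the variable name suggested above. The helper name external_resolution_enabled is hypothetical and this is not an existing rdflib API:

```python
import os


def external_resolution_enabled(environ=None):
    """Return True only when the (proposed) opt-in flag is set to "1".

    RDFLIB_RESOLVE_EXTERNAL_URIS is the name suggested in this thread,
    not a variable that rdflib is known to read. An unset variable is
    treated as "0", so external resolution stays off by default.
    """
    environ = os.environ if environ is None else environ
    return environ.get("RDFLIB_RESOLVE_EXTERNAL_URIS", "0") == "1"
```

Passing a plain dict instead of os.environ makes the default-off behaviour easy to check without touching the real environment.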
re: the note about Conneg in the source:
-
Hi @nicholascar :-). Apologies for the delay, but I've just/finally pushed a draft PR. I've laid it out as a series of logical changes, one per commit, so hopefully it shouldn't be too difficult to follow the rationale for each change in the commit messages. I've not added any further tests or documentation yet, but am happy to do that once the maintainers have validated the approach. @westurner I've allow-listed the JSON-LD Recommended Context, but have left caching alone as out of scope (for now). It pays attention to three environment variables ( All of this doesn't change the current
-
@nicholascar Hi, I'm trying to contact you via email regarding this issue (might be going to your spam). If you could take a look there that would be appreciated!
-
@hadasbloom apologies, I did see your email but was waiting for work here and, I must admit, I'd forgotten about this issue for a while. I'll try to follow up on this, get a PR from @alexdutton merged shortly, and then we'll make a new release.
-
Hey @nicholascar :)
-
Hey, any updates on this issue? |
-
I have locked this discussion; please take further discussion to #1844. This should be an issue, not a discussion topic.
-
This is mostly related to rdflib-jsonld, but the dereferencing implementation is in rdflib, hence raising it here.
Scenario
If a web service takes POSTed JSON-LD data, e.g. as part of a Linked Data Notifications implementation, rdflib will attempt to resolve any URL in the @context. This can lead to:
@context
file:// URLs
Problem
rdflib provides no way to control how external references are resolved, nor a way to implement caching of external resources.
An implementor should be able to:
These things should either be possible directly, or there should be an obvious way to hook them in.
Resolution
A new Resolver base class should be added that takes responsibility for resolving external references and returning InputSource instances, probably encapsulating the create_input_source() behaviour in a resolve() method. There should be a default implementation that resolves everything, called e.g. DefaultResolver. Maybe this resolver has an instantiation parameter like resolve_schemes=('file', 'http', 'https') so it's easy to turn off dereferencing.
An optional resolver argument should be added to Graph.parse(), so that implementors can override the default behaviour. This is then passed down to the Parser.parse() plugin implementation, defaulting to an instance of DefaultResolver if not specified.
Finally, rdflib-jsonld can be updated to use the resolver instead of create_input_source directly.
Maybe there should also be a way to install a global default resolver to easily implement these protections without having to track down every Graph.parse() call.
Happy to put together a PR if/when an approach is agreed.
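To make the proposal concrete, here is a rough sketch of what such a resolver could look like. It is only an illustration of the idea, not existing rdflib code: Resolver, DefaultResolver, resolve() and resolve_schemes are the names proposed above, while ResolutionNotAllowed and the opener argument are made up for this example. The opener stands in for create_input_source() so the sketch runs without rdflib or network access.

```python
from urllib.parse import urlsplit


class ResolutionNotAllowed(Exception):
    """Raised when a resolver refuses to dereference a URL (hypothetical)."""


class Resolver:
    """Proposed base class: turns an external reference into an input source."""

    def resolve(self, url):
        raise NotImplementedError


class DefaultResolver(Resolver):
    """Resolves any URL whose scheme is listed in resolve_schemes."""

    def __init__(self, resolve_schemes=("file", "http", "https"), opener=None):
        self.resolve_schemes = frozenset(s.lower() for s in resolve_schemes)
        # Assumption: `opener` is a stand-in for create_input_source().
        # By default it just echoes the URL so the sketch stays offline.
        self.opener = opener if opener is not None else (lambda url: url)

    def resolve(self, url):
        # URLs without a scheme (e.g. relative paths) yield "" here and are
        # refused unless "" is explicitly listed in resolve_schemes.
        scheme = urlsplit(url).scheme.lower()
        if scheme not in self.resolve_schemes:
            raise ResolutionNotAllowed(f"refusing to resolve {url!r}")
        return self.opener(url)


# A web service accepting untrusted JSON-LD could drop file:// and http://
web_resolver = DefaultResolver(resolve_schemes=("https",))
```

With this shape, a caller would pass something like web_resolver as the proposed resolver argument to Graph.parse(), and DefaultResolver(resolve_schemes=()) would turn dereferencing off entirely.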