Formats that don't support named graphs serialize Datasets and ConjunctiveGraphs with non-default graphs without raising any errors
#2393
Comments
Please let me know if anyone disagrees with the proposed behaviour:
CC: @RDFLib/core-reviewers |
Before wrong output is generated by What would be the correct output? Only the data from the default graph, or no data "overwritten" from the default graph? Or is there no (simple) correct turtle output for this conjunctive graph? I ask because I would expect exactly the given output, but I haven't worked much with conjunctive graphs. I lack the correct word for what I mean by "overwritten". I mean something like this:
|
When trying to serialize named graphs using a format that doesn't support named graphs (e.g. turtle) I think the only options are:
RDF does not work like this: there is no way for new triples to affect previous triples. If the same subject and predicate appear twice in an input document with different objects, then you just get another triple. If you want to prevent this, you would need to use SHACL or something similar in your ingestion pipeline, but RDF itself just treats it as another triple. https://www.w3.org/TR/rdf11-concepts/#section-rdf-graph
So, as long as the combination of subject, predicate and object is unique in the set, it is a unique triple, and it does not invalidate another triple with the same subject and predicate. |
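The set semantics described above can be illustrated without any RDF library. This is only a minimal sketch that models a graph as a Python set of (subject, predicate, object) tuples; the `ex:` terms are made up for illustration and this is not how RDFLib stores triples.

```python
# A graph modeled as a set of (subject, predicate, object) tuples.
# The ex: terms are hypothetical and exist only for this illustration.
graph = set()

graph.add(("ex:alice", "ex:name", "Alice"))
graph.add(("ex:alice", "ex:name", "Alicia"))  # same subject and predicate, new object
graph.add(("ex:alice", "ex:name", "Alice"))   # exact duplicate, the set ignores it

# Both objects are kept: the second triple did not replace the first.
names = sorted(o for s, p, o in graph if s == "ex:alice" and p == "ex:name")
print(len(graph), names)
```

Nothing is "overwritten" here: the only triple that disappears is the exact duplicate.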
Throwing an error is certainly the most explicit option. And while I presume it is backwards incompatible, if the behaviour is wrong right now, it seems better to error out than to change its behaviour. It might be good to note how RDF 1.1 Concepts defines content negotiation of RDF datasets:
It is understandable why e.g. Jena has made that decision, and it warrants consideration. But invoking serialization programmatically is not the same as content negotiation; it is rather how you'd implement it. So if the user of the API expects an RDF graph, the user is to use the (The error should probably be helpful and suggest that. It might also be helpful to add an alias property named
To reason further, the "role" of named graphs in a dataset may vary (some use the dataset as a union of graphs, some perhaps as versions of descriptions where only an explicitly chosen subset is considered "valid" or "active"). So it makes more sense to force an active choice. It could be useful to make it easy to serialize the union as a stream of triples, but that would be an additional feature. |
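To make the "active choice" concrete, here is a library-independent sketch of the two readings of a dataset: only the default graph, or the union of all graphs as a stream of triples. The dataset model (a dict from graph name to a set of triples, with `None` for the default graph) and the name `union_triples` are assumptions for illustration, not RDFLib API.

```python
from typing import Dict, Iterator, Optional, Set, Tuple

Triple = Tuple[str, str, str]
# Hypothetical dataset model: graph name -> triples; None is the default graph.
DatasetModel = Dict[Optional[str], Set[Triple]]


def union_triples(dataset: DatasetModel) -> Iterator[Triple]:
    """Yield each distinct triple once, across the default and all named graphs."""
    seen: Set[Triple] = set()
    for triples in dataset.values():
        for triple in sorted(triples):
            if triple not in seen:
                seen.add(triple)
                yield triple


dataset: DatasetModel = {
    None: {("ex:a", "ex:p", "1")},
    "ex:g1": {("ex:a", "ex:p", "1"), ("ex:b", "ex:p", "2")},
}

default_only = sorted(dataset[None])     # choice 1: default graph only
union = sorted(union_triples(dataset))   # choice 2: union of all graphs
```

The two choices give different results for the same dataset, which is why leaving the choice implicit is risky.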
I posted to the public-rdf-dev mailing list also, https://lists.w3.org/Archives/Public/public-rdf-dev/2023May/0000.html |
The RDF spec asserts:
If the user has specified a context-unaware serialization format, it’s not unreasonable to treat this as intentional and to return the serialized triples of the default graph because the Dataset default graph has no name¹ and can therefore be considered as selected by the specification of a context-unaware serialization format. Is it perhaps more usefully viewed as an implementation-independent means of specifying serialization of a Dataset’s unnamed default graph? |
On the other hand, someone could have made a mistake in their code and not realized that the format at a specific point is or can be context-unaware. In this case, trying to guess what the user meant would mask a bug in the user's code, whereas if they really did mean to serialize only the default context, there would have been an easy way for them to do it explicitly.
There are already ways to do this much more explicitly and without ambiguity, which users should use instead:
```python
from rdflib import Dataset

dataset = Dataset()
# add named graphs and triples
dataset.default_context.serialize(format="turtle")
```
|
Neither a guess nor an assumption, simply following the direction of the spec.
That's going to break in v7, |
If you are referring to this "RDF datasets may be used to express RDF content. When used in this way, a dataset should be understood to have at least the same content as its default graph." [ref] I'm not sure this applies to the snippet I shared. The "may" in the first sentence is normative [ref], meaning a dataset may also not be used to express RDF content. So even if you take the "should" in the second sentence as normative, you are still assuming the first "may" to be operative; at the very least it should be documented as such. Is that using an RDF dataset to express RDF content? If it is, what would it look like when it is not being used to express RDF content?
Version 7 is not released, so what will break is undefined. But even if the V7 API does introduce breaking changes (as it should), that does not make the current API brittle; it just means we are using semantic versioning as intended. The way I see it is: I don't like fragile software. If I ask software to do something and it can't do it, it should error out, not do it halfway, because I did not ask it to do it halfway. That is why software interfaces should be as explicit as possible: I should be able to explicitly ask RDFLib to serialize the whole Dataset. To me, this is what I do when I call |
Never mind, I deleted it, it is a bit weird to have it separate from this, it will just split the conversation more, best that people just respond here with their preference. |
I think the right solution here is:
To me, this is the right solution because I think there should be a way to request to serialize the whole Dataset explicitly, and I don't see why it should not be |
- fixes <RDFLib#2393>
If RDFLib is used to serialize a Dataset or ConjunctiveGraph that contains non-default graphs [ref] as a format that does not support named graphs (e.g. N-Triples or Turtle), no error is raised, and the output is wrong.
Given this data in test/data/variants/diverse_quads.nq (equivalent to this trig document):
rdflib/test/data/variants/diverse_quads.nq
Lines 1 to 10 in ddcc4eb
Using rdfpipe to convert it to turtle gives:
And using rdfpipe to convert it to ntriples gives:
In both cases, the output is wrong, not just incomplete.
I would say the right behaviour is that an error is raised when a ConjunctiveGraph or Dataset with non-default graphs is serialized using a format that does not support named graphs.
I'm making this issue to get some feedback, but I will make a PR to fix it shortly.
There is a similar issue with riot from Jena [ref], but riot's behaviour is not quite as bad.
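The proposed behaviour (raise an error instead of writing partial output) can be sketched independently of RDFLib's internals. Everything named here is an assumption for illustration: the dict-based dataset model, the `CONTEXT_UNAWARE_FORMATS` set (deliberately non-exhaustive), the `NamedGraphsNotSupported` error and the `check_serializable` helper. None of this is the actual fix.

```python
from typing import Dict, Optional, Set, Tuple

Triple = Tuple[str, str, str]
# Hypothetical dataset model: graph name -> triples; None is the default graph.
DatasetModel = Dict[Optional[str], Set[Triple]]

# Assumed, non-exhaustive set of format names that cannot express named graphs.
CONTEXT_UNAWARE_FORMATS = {"turtle", "nt"}


class NamedGraphsNotSupported(ValueError):
    """Hypothetical error for serializing named graphs to a triples-only format."""


def check_serializable(dataset: DatasetModel, fmt: str) -> None:
    """Raise if fmt cannot represent the non-empty named graphs in dataset."""
    if fmt not in CONTEXT_UNAWARE_FORMATS:
        return
    populated = sorted(
        name for name, triples in dataset.items() if name is not None and triples
    )
    if populated:
        raise NamedGraphsNotSupported(
            f"format {fmt!r} cannot express named graphs {populated}; "
            "serialize the default graph explicitly or use a quad format"
        )


dataset: DatasetModel = {
    None: {("ex:a", "ex:p", "1")},
    "ex:g1": {("ex:b", "ex:p", "2")},
}

check_serializable(dataset, "nquads")  # quad format: no error

message = None
try:
    check_serializable(dataset, "turtle")
except NamedGraphsNotSupported as error:
    message = str(error)
```

A dataset whose named graphs are all empty still passes the check for Turtle, so code that only ever uses the default graph would not be affected under this sketch.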