Write canonical NTriples 1.1 by default #35
Comments
This is a holdover from back in the day when NTriples was ASCII. serd now supports RDF 1.1 NTriples, which is UTF-8, but the command-line tool behaviour is still the same. The upcoming major version is more precise about this and lets you mix and match all kinds of options to get what you want. I'm not sure if the default could be changed without breaking things for people in the current version. Maybe? I agree that the option existing (it's meant for Turtle) makes this confusing, but I'm hesitant to change it and potentially break people's existing scripts/workflows/whatever...
For reference, this is how the new command-line tool interfaces look: https://drobilla.net/files/serd_man_pages/ where
I understand and can empathize with the need for backwards compatibility, but the (current) specs seemed to be clear on this question, or so I thought on first read. Quote: "The content encoding of N-Triples is always UTF-8." That said, they seem to walk back that clear directive in section 6.1 (if the document is served as text/plain it would be ASCII and escaped, etc.). I guess this gets into the nuances of "web document types" as opposed to files, so when working outside that framework it is left up to individual interpretation. sigh.
ASCII is a subset of UTF-8. In other words, the output of It's not canonical RDF 1.1 N-Triples though, because escaping like this is not allowed there (see link in OP).
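To make the distinction above concrete, here is a minimal Python sketch (not part of serd) that checks the two properties separately: whether bytes are valid UTF-8, and whether a line still contains `\uXXXX` / `\UXXXXXXXX` escapes. The helper names `is_valid_utf8` and `has_uchar_escape` are hypothetical, and the escape check is deliberately naive.

```python
import re

# Matches N-Triples UCHAR escapes (\uXXXX or \UXXXXXXXX). Naive: it does not
# account for an escaped backslash directly before the "u" inside a literal.
UCHAR = re.compile(r"\\u[0-9A-Fa-f]{4}|\\U[0-9A-Fa-f]{8}")


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def has_uchar_escape(text: str) -> bool:
    """Return True if the text contains at least one UCHAR escape."""
    return UCHAR.search(text) is not None


# The escaped subject IRI from the serdi output and its literal form.
escaped = r"<http://dbpedia.org/resource/Andr\u00E9_\u00C9ric_L\u00E9tourneau>"
literal = "<http://dbpedia.org/resource/André_Éric_Létourneau>"

# Pure-ASCII bytes are always valid UTF-8, escapes or not.
print(is_valid_utf8(escaped.encode("ascii")), has_uchar_escape(escaped))
print(is_valid_utf8(literal.encode("utf-8")), has_uchar_escape(literal))
```

Both forms pass the UTF-8 check; only the escaped one is flagged by the escape check, which is the difference this thread is about.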
OK, I'll rephrase the ticket request: I would like a command-line tool that outputs canonical N-Triples (no escaped characters).
Sure, I was just responding to the above comment. If you want this right now, I suggest building the I'll make a note to double-check the other canonical rules and make sure that the default output adheres to them, but I think it does.
(Edited) The output does not appear to be UTF-8. Is this a bug? I thought UTF-8 would be the default, given that there is an option to "Write ASCII output if possible".
Example:
source triple from dbpedia/article-templates_lang=en_nested.ttl
<http://dbpedia.org/resource/André_Éric_Létourneau> <http://dbpedia.org/property/wikiPageUsesTemplate> <http://dbpedia.org/resource/Template:Birth_date_and_age> .
$ file article-templates_lang=en_nested.ttl
article-templates_lang=en_nested.ttl: UTF-8 Unicode text
serdi output:
<http://dbpedia.org/resource/Andr\u00E9_\u00C9ric_L\u00E9tourneau> <http://dbpedia.org/property/wikiPageUsesTemplate> <http://dbpedia.org/resource/Template:Birth_date_and_age> .
$ file article-templates_lang=en_nested-serdi.nt
article-templates_lang=en_nested-serdi.nt: ASCII text, with very long lines
apache jena riot output:
<http://dbpedia.org/resource/André_Éric_Létourneau> <http://dbpedia.org/property/wikiPageUsesTemplate> <http://dbpedia.org/resource/Template:Birth_date_and_age> .
$ file article-templates_lang=en_nested.ttl.bz2-riot.nt
article-templates_lang=en_nested.ttl.bz2-riot.nt: UTF-8 Unicode text
Spec Reference:
https://www.w3.org/TR/n-triples/#canonical-ntriples
Note: At first I thought maybe this was a BOM related rendering/display issue, but file would reveal if there is a BOM, and the same tools were used to find and display the examples above...
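As a possible stopgap for the escaped output shown in the example, the following Python sketch rewrites `\uXXXX` and `\UXXXXXXXX` escapes as literal characters. It is an illustration only, not serd's behaviour and not a full canonicalizer: the function name `unescape_non_ascii` is hypothetical, and it conservatively leaves any escape that decodes to an ASCII character, a surrogate, or an out-of-range code point untouched, since some of those cannot appear unescaped.

```python
import re

# Match an escaped backslash first so that "\\u00E9" inside a literal
# (a backslash followed by the text "u00E9") is not mistaken for an escape.
_ESCAPE = re.compile(r"\\\\|\\u([0-9A-Fa-f]{4})|\\U([0-9A-Fa-f]{8})")


def unescape_non_ascii(line: str) -> str:
    """Replace UCHAR escapes for non-ASCII characters with the characters."""

    def repl(match: re.Match) -> str:
        hex_digits = match.group(1) or match.group(2)
        if hex_digits is None:
            return match.group(0)  # escaped backslash: keep as-is
        code_point = int(hex_digits, 16)
        if code_point < 0x80:
            return match.group(0)  # ASCII: may need to stay escaped
        if code_point > 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
            return match.group(0)  # not a valid Unicode scalar value
        return chr(code_point)

    return _ESCAPE.sub(repl, line)


serdi_line = (
    r"<http://dbpedia.org/resource/Andr\u00E9_\u00C9ric_L\u00E9tourneau> "
    "<http://dbpedia.org/property/wikiPageUsesTemplate> "
    "<http://dbpedia.org/resource/Template:Birth_date_and_age> ."
)
print(unescape_non_ascii(serdi_line))
```

When writing the result to a file, open it with `encoding="utf-8"` so the literal characters are stored as UTF-8.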