Turtle serializer ignores the 'base' parameter #248

Closed
augusto-herrmann opened this issue Jan 23, 2013 · 6 comments
Labels
bug Something isn't working in-resolution

Comments

@augusto-herrmann

Hi,

I found this post by Ed Summers from 2007 [1] reporting exactly this same issue. Looking at the code [2], it seems the 'base' parameter is indeed ignored in the startDocument method. My own experiments back this up: the issue still stands, despite the promise [3] by Gunnar that it would be fixed.

I could fork RDFLib and try to fix it myself, but I believe someone more familiar with RDFLib should be able to do it a lot faster than me.

[1] http://www.mail-archive.com/dev@rdflib.net/msg00158.html
[2] https://github.com/RDFLib/rdflib/blob/master/rdflib/plugins/serializers/turtle.py#L262
[3] http://www.mail-archive.com/dev@rdflib.net/msg00157.html

Cheers,
Augusto Herrmann

@gromgull
Member

Thanks for noticing, again!

I am sure I meant it when I wrote it :)

I WILL fix it - just now I am in over my head with non-rdflib
project-work, I hope to have more time for rdflib in the beginning of feb!

  • Gunnar


@ghost

ghost commented Jan 23, 2013

Drat, I thought I could knock this one off as "low-hanging fruit" but I suspect the modelling in this part of RDFLib may be incomplete.

There is no documentation of the "base" keyword arg for the Graph.serialize() method. At a guess, its parse-side counterpart kwarg is "publicID" (publicID: the logical URI to use as the document base. If None specified the document location is used), at least in the case where there is a document location.

Q1: what is the (intended) expected type of the publicID binding for Graph.parse() - string or rdflib.Namespace? And what does the term "logical URI" mean in this instance? (it has no widely-accepted specific definition that I can find)

Q2: what is the (intended) expected type of the "base" kwarg binding for Graph.serialize() - string or rdflib.Namespace?

I was idly curious: if a publicID is specified on parsing, what happens to that datum when base=someNS is provided for serialization - does that create conflicting semantics? I did have a play but failed to make sense of the results: the base namespace seems to be omitted from all the parse-serialize publicID/base variants that I plucked out of the air for testing, and I don't have time ATM to methodically map out all the permutations.

I've added a test for this explicit failure to aid investigation.

I've made a mental note to spend a rainy Sunday afternoon thrashing this one out in the docs.

@gromgull
Member

Without looking at any spec, I would expect a given base parameter to write relative URI for any URI within the base-URI space, i.e. writing <http://example.org/people/bob> foaf:company <http://example.org/companies/AcmeInc> with base <http://example.org> should write <people/bob> ...

for Q1 and Q2 - I would expect string, URIRef or Namespace to work. Semi-related is the confusion around namespace vs. URIRef causing this: RDFLib/rdflib-jsonld#11

I've thought of ''logical URI'' as the (possibly abstract) URI you want to associate with the graph - i.e. I read the rdf/xml from a file:// uri, but I want to associate it with the uri ``uri:my:random:collection:of:stuff`` - I doubt this notion is well defined anywhere :)

Base on the other hand is a way to specify that you want relative URIs written. From my point of view, sprinkling relative URIs in your rdf files is asking for trouble... but they are generally supported on import.

The n3 code we took from cwm had methods to find a relative URI, relative to a certain base, which I believe I deleted in a recent cleanup :)

@ghost

ghost commented Jan 24, 2013

That's terrific, gromgull, thanks for the clarifications - I'll propagate them to the docs and the tests.

@augusto-herrmann
Author

I assure you I meant it as reaffirming the importance of this issue, rather than as laying blame on you, gromgull. :)

As for the possible interpretations of these parameters:

Q1) I have no idea what publicID is supposed to mean. Perhaps it's the graph URI to be rendered in the TriG/TriX formats?

Q2) As for base, I think it should behave exactly as gromgull describes. All URIs that can't be prefixed in Turtle (and are thus enclosed in <> signs) should be checked to see whether they start with the base URI. If so, render them as relative URIs instead of full absolute URIs. I remember reading somewhere that the base URI shouldn't contain a hash, so that should be checked for too, and an exception should be raised if that's the case.
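
A minimal sketch of the behaviour described in Q2, using only plain string handling. The function name `relativize` and the choice of `ValueError` are assumptions made for illustration; this is not RDFLib's actual implementation.

```python
def relativize(uri, base):
    """Return uri relative to base if it lies in the base-URI space.

    Hypothetical helper: a plain prefix check, as proposed in the
    discussion, not a full RFC 3986 relativisation.
    """
    # Assumption from the thread: a base containing a hash is rejected.
    if "#" in base:
        raise ValueError("base URI must not contain a hash: %r" % base)
    if uri.startswith(base):
        # Strip the base prefix, leaving a relative reference.
        return uri[len(base):]
    # URIs outside the base-URI space are written unchanged.
    return uri


inside = relativize("http://example.org/people/bob", "http://example.org/")
outside = relativize("http://other.example/x", "http://example.org/")
print(inside)   # people/bob
print(outside)  # http://other.example/x
```

With the base written without a trailing slash (`http://example.org`), the same call returns `/people/bob`, which still resolves to the original URI. A bare prefix check is naive, though: a real implementation would probably want to compare scheme and authority properly before stripping anything.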

Regarding types, I agree that string, Namespace or URIRef should work (maybe unicode as well?).

As for the actual implementation, I think the urlparse module should be useful.

I'd also like to add that using both base and relative URIs in Turtle can drastically increase readability and decrease file size in many situations.

@gromgull
Member

I forgot one important thing above, publicID in the parse method is used to resolve any relative URIs in the input!
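
As an illustration of that resolution step, the standard library's `urljoin` (in `urllib.parse` on Python 3, the successor of the `urlparse` module mentioned above) resolves relative references against a base. The base URI and the helper name `resolve_all` are made up for the example, and this is not a claim about how RDFLib's parsers implement it.

```python
from urllib.parse import urljoin


def resolve_all(relative_refs, public_id):
    """Resolve each relative reference against public_id (hypothetical helper)."""
    return [urljoin(public_id, ref) for ref in relative_refs]


# Assumed document base, standing in for the publicID passed to parse().
public_id = "http://example.org/data/"
resolved = resolve_all(["people/bob", "/companies/AcmeInc", "#me"], public_id)
for uri in resolved:
    print(uri)
# http://example.org/data/people/bob
# http://example.org/companies/AcmeInc
# http://example.org/data/#me
```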

I've added support for base to the n3/turtle serializer - and fixed up the tests a bit. Apologies that it took 6 years ;)

One remaining issue is whether the serializer should write an actual ''@base <... > '' directive - this way you can parse the output back with a publicID and get the same triples.

I left it without @base directive now - this way you can quickly and easily 'rebase' some triples.
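
To make the 'rebase' idea concrete, here is a small sketch using the standard library only, not RDFLib itself: a URI is written relative to one base and then resolved against a different one, which is roughly what happens when output without an @base directive is parsed back with another publicID. The function name `rebase` and both base URIs are hypothetical.

```python
from urllib.parse import urljoin


def rebase(uri, old_base, new_base):
    """Move uri from the old_base URI space to new_base (hypothetical helper)."""
    if not uri.startswith(old_base):
        # Outside the old base: stays absolute, so it is unaffected.
        return uri
    relative = uri[len(old_base):]      # what a serializer would write
    return urljoin(new_base, relative)  # what a parser would read back


moved = rebase("http://example.org/people/bob",
               "http://example.org/",
               "http://example.com/v2/")
kept = rebase("http://other.example/x",
              "http://example.org/",
              "http://example.com/v2/")
print(moved)  # http://example.com/v2/people/bob
print(kept)   # http://other.example/x
```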

mamash pushed a commit to TritonDataCenter/pkgsrc-wip that referenced this issue Feb 15, 2014
	2013/12/31 RELEASE 4.1
======================

This is a new minor version of RDFLib, which includes a handful of new features:

* A TriG parser was added (we already had a serializer) - it is
  up-to-date wrt. the newest spec from: http://www.w3.org/TR/trig/

* The Turtle parser was made up to date wrt. the latest Turtle spec.

* Many more tests have been added - RDFLib now has over 2000
  (passing!) tests. This is mainly thanks to the NT, Turtle, TriG,
  NQuads and SPARQL test-suites from W3C. This also included many
  fixes to the nt and nquad parsers.

* ```ConjunctiveGraph``` and ```Dataset``` now support directly adding/removing
  quads with ```add/addN/remove``` methods.

* ```rdfpipe``` command now supports datasets, and reading/writing context
  sensitive formats.

* Optional graph-tracking was added to the Store interface, allowing
  empty graphs to be tracked for Datasets. The DataSet class also saw
  a general clean-up, see: RDFLib/rdflib#309

* After long deprecation, ```BackwardCompatibleGraph``` was removed.

Minor enhancements/bugs fixed:
------------------------------

* Many code samples in the documentation were fixed thanks to @PuckCh

* The new ```IOMemory``` store was optimised a bit

* ```SPARQL(Update)Store``` has been made more generic.

* MD5 sums were never reinitialized in ```rdflib.compare```

* Correct default value for empty prefix in N3
  [#312]RDFLib/rdflib#312

* Fixed tests when running in a non UTF-8 locale
  [#344]RDFLib/rdflib#344

* Prefix in the original turtle have an impact on SPARQL query
  resolution
  [#313]RDFLib/rdflib#313

* Duplicate BNode IDs from N3 Parser
  [#305]RDFLib/rdflib#305

* Use QNames for TriG graph names
  [#330]RDFLib/rdflib#330

* \uXXXX escapes in Turtle/N3 were fixed
  [#335]RDFLib/rdflib#335

* A way to limit the number of triples retrieved from the
  ```SPARQLStore``` was added
  [#346]RDFLib/rdflib#346

* Dots in localnames in Turtle
  [#345]RDFLib/rdflib#345
  [#336]RDFLib/rdflib#336

* ```BNode``` as Graph's public ID
  [#300]RDFLib/rdflib#300

* Introduced ordering of ```QuotedGraphs```
  [#291]RDFLib/rdflib#291

2013/05/22 RELEASE 4.0.1
========================

Following RDFLib tradition, some bugs snuck into the 4.0 release.
This is a bug-fixing release:

* the new URI validation caused lots of problems, but is
  necessary to avoid ''RDF injection'' vulnerabilities. In the
  spirit of ''be liberal in what you accept, but conservative in
  what you produce'', we moved validation to serialisation time.

* the ```rdflib.tools``` package was missing from the
  ```setup.py``` script, and was therefore not included in the
  PyPI tarballs.

* RDF parser choked on empty namespace URI
  [#288](RDFLib/rdflib#288)

* Parsing from ```sys.stdin``` was broken
  [#285](RDFLib/rdflib#285)

* The new IO store had problems with concurrent modifications if
  several graphs used the same store
  [#286](RDFLib/rdflib#286)

* Moved HTML5Lib dependency to the recently released 1.0b1 which
  supports python3

2013/05/16 RELEASE 4.0
======================

This release includes several major changes:

* The new SPARQL 1.1 engine (rdflib-sparql) has been included in
  the core distribution. SPARQL 1.1 queries and updates should
  work out of the box.

  * SPARQL paths are exposed as operators on ```URIRefs```, these can
    then be used with graph.triples and friends:

    ```py
    # List names of friends of Bob:
    g.triples(( bob, FOAF.knows/FOAF.name , None ))

    # All super-classes:
    g.triples(( cls, RDFS.subClassOf * '+', None ))
    ```

  * a new ```graph.update``` method will apply SPARQL update statements

* Several RDF 1.1 features are available:
  * A new ```DataSet``` class
  * ```XMLLiteral``` and ```HTMLLiterals```
  * ```BNode``` (de)skolemization is supported through ```BNode.skolemize```,
    ```URIRef.de_skolemize```, ```Graph.skolemize``` and ```Graph.de_skolemize```

* Handling of Literal equality was split into lexical comparison
  (for normal ```==``` operator) and value space (using new ```Node.eq```
  methods). This introduces some slight backwards incompatible
  changes, but was necessary, as the old version had
  inconsistent hash and equality methods that could lead to
  literals not working correctly in dicts/sets.
  The new way is more in line with how SPARQL 1.1 works.
  For the full details, see:

  https://github.com/RDFLib/rdflib/wiki/Literal-reworking

* Iterating over ```QueryResults``` will generate ```ResultRow``` objects,
  these allow access to variable bindings as attributes or as a
  dict. I.e.

  ```py
  for row in graph.query('select ... ') :
     print row.age, row["name"]
  ```

* "Slicing" of Graphs and Resources as syntactic sugar:
  ([#271](RDFLib/rdflib#271))

  ```py
  graph[bob : FOAF.knows/FOAF.name]
            -> generator over the names of Bobs friends
  ```

* The ```SPARQLStore``` and ```SPARQLUpdateStore``` are now included
  in the RDFLib core

* The documentation has been given a major overhaul, and examples
  for most features have been added.


Minor Changes:
--------------

* String operations on URIRefs return new URIRefs: ([#258](RDFLib/rdflib#258))
  ```py
  >>> URIRef('http://example.org/')+'test'
  rdflib.term.URIRef('http://example.org/test')
  ```

* Parser/Serializer plugins are also found by mime-type, not just
  by plugin name:  ([#277](RDFLib/rdflib#277))
* ```Namespace``` is no longer a subclass of ```URIRef```
* URIRefs and Literal language tags are validated on construction,
  avoiding some "RDF-injection" issues ([#266](RDFLib/rdflib#266))
* A new memory store needs much less memory when loading large
  graphs ([#268](RDFLib/rdflib#268))
* Turtle/N3 serializer now supports the base keyword correctly ([#248](RDFLib/rdflib#248))
* py2exe support was fixed ([#257](RDFLib/rdflib#257))
* Several bugs in the TriG serializer were fixed
* Several bugs in the NQuads parser were fixed