
Dataset cleanup - Store API for graph method #309

Merged 5 commits on Aug 11, 2013
Conversation

gromgull (Member)

Cleaning up Dataset class, adding graph tracking to store API, as
discussed in #307

Summary of changes:

  • added methods add_graph and remove_graph to the Store
    API, implemented for Sleepycat and IOMemory. A flag,
    graph_awareness, is set on the store if these methods are
    supported; the default implementations raise an exception.
  • made the dataset require a store with the graph_awareness
    flag set.
  • removed the graph-state kept in the Dataset class directly.
  • removed dataset.add_quads, remove_quads methods. The
    add/remove methods of ConjunctiveGraph are smart enough
    to work with triples or quads.
  • removed the dataset.graphs method, since it did exactly the
    same as contexts.
  • added a default_union flag to Graphs, ConjunctiveGraph has this set to True, and Dataset to False
  • cleaned up some remaining confusion over whether Graph instances or
    Graph identifiers are passed to store methods. (Think about __iadd__, __isub__ etc. for ConjunctiveGraph #225)
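The store hooks described above can be sketched roughly like this. Only the names add_graph, remove_graph and graph_awareness come from this PR; the toy in-memory model below is illustrative, not the actual Sleepycat/IOMemory implementation:

```python
# Toy sketch of the graph-aware Store API from this PR (illustrative only).
class ToyStore:
    graph_awareness = True  # set when add_graph/remove_graph are supported

    def __init__(self):
        self.graphs = set()   # identifiers of tracked (possibly empty) graphs
        self.quads = set()    # (s, p, o, context_id) tuples

    def add_graph(self, graph_id):
        # track the graph even before any triple is added to it
        self.graphs.add(graph_id)

    def remove_graph(self, graph_id):
        # drop the graph and everything stored in it
        self.graphs.discard(graph_id)
        self.quads = {q for q in self.quads if q[3] != graph_id}


class BaseStore:
    graph_awareness = False  # default: stores are not graph-aware

    def add_graph(self, graph_id):
        raise Exception("Store is not graph-aware")
```

A Dataset can then refuse any store whose graph_awareness flag is False, which is the requirement the second bullet describes.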

Comments:

  • The use-case where a Dataset exposes only some of the graphs that
    exist in the store is not supported.
  • I have not thought about transactions, and how creating or removing
    a graph fits into a transaction. It is somewhat irrelevant, as we do
    not currently have any transaction-aware stores.

Questions:

  • Should dataset.contexts return the default graph?
    dataset.quads returns None as the context for triples in
    the default graph, which would suggest "no". (Currently it does.)
  • Do we really need dataset.graph AND dataset.get_context?
    get_context returns a Graph, but does not create it.
    graph creates the graph AND will also make a new graph
    identified by a skolemized bnode if called without arguments.
  • Should it be possible to disable graph_awareness for a Store?
    The only change would be whether a graph is removed once it is
    empty. I guess no one relies on this, as it is not implemented
    correctly in IOMemory or Sleepycat at the moment :)
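The graph vs. get_context question above can be made concrete with a toy model. All names here are illustrative stand-ins for the behaviour the question describes, not rdflib's real internals:

```python
import uuid

# Toy model of the graph() vs get_context() distinction: get_context
# returns a view without registering anything, while graph() creates and
# tracks the graph (minting a skolemized-BNode-style name when called
# without arguments). Illustrative only.
class ToyDataset:
    def __init__(self):
        self.tracked = set()   # graphs the dataset knows about

    def get_context(self, identifier):
        # just a view onto the store; nothing is recorded
        return identifier

    def graph(self, identifier=None):
        if identifier is None:
            # stand-in for rdflib's skolemized-BNode naming
            identifier = "urn:x-bnode:" + uuid.uuid4().hex
        self.tracked.add(identifier)   # created AND tracked
        return identifier
```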

@gromgull (Member Author)

And ignore the fact that some of the old unit tests fail; if we agree that this is OK, I'll tidy it up.

@uholzer (Contributor)

uholzer commented Jun 27, 2013

Some comments:

Dataset.DEFAULT does not exist anymore, but I want to get the default graph as a context. Can I use DATASET_DEFAULT_GRAPH_ID now, or is there another way? Of course, I considered using the Dataset directly to mess with the default graph, but that way I would need to add several case distinctions in my code. Is there another way to parse into the default graph using Dataset.parse?

The second thing is that I tried to parse an empty graph:

>>> d = rdflib.Dataset()
>>> d.parse(data='', format='turtle')
<Graph identifier=N3759e73365ba47b7b4002c3558ba4f01 (<class 'rdflib.graph.Graph'>)>
>>> for g in d.contexts(): print(g)
... 
<urn:x-rdflib:default> a rdfg:Graph;rdflib:storage [a rdflib:Store;rdfs:label 'IOMemory'].

As you see, it's not there, but since parse returns a graph, I expect that it has been added to the Dataset.
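A minimal model of the symptom: if contexts() is derived from the quads actually present in the store, a freshly created but empty graph never shows up. The names below are illustrative, not rdflib internals:

```python
# Why the empty graph is invisible: contexts derived purely from stored
# quads cannot include a graph that holds no triples. Illustrative only.
quads = set()  # (s, p, o, ctx)

def contexts():
    return {ctx for (_s, _p, _o, ctx) in quads}

new_ctx = "N3759e73365ba47b7b4002c3558ba4f01"  # the graph parse() created
# ...but the document was empty, so no quad mentions it:
assert new_ctx not in contexts()
```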

@gromgull (Member Author)

You can use DATASET_DEFAULT_GRAPH_ID, but I didn't really consider it part of the public API.

Perhaps adding a get_default_graph method?

The parse problem is interesting, I'll have to look at the internals of the parser... and that means each and every parser....

@uholzer (Contributor)

uholzer commented Jun 27, 2013

The parse problem is interesting, I'll have to look at the internals of the parser... and that means each and every parser...

Really? I mean, it DOES return a graph...
Isn't ConjunctiveGraph.parse the problem, since it does not add the graph it has created? The graph only turns up as a context if it is non-empty, because it is backed by the same store...

@gromgull (Member Author)

Hmm, yes, that solves the issue when only a single graph is created. But what if you parse a TriX document with empty graphs (or TriG, if we had a parser)? Those will create several graphs, but each parser is free to decide whether to use addN or get_context to add triples to a particular graph/context.

I can do the simple fix now, and maybe clean up the other one when we solve #283 (I started this in a not-yet-pushed branch).

@iherman (Contributor)

iherman commented Jun 27, 2013

At the moment, what I have (in my internal stuff) is

ds.default_graph_id
ds.default_graph

I also kept Dataset.DEFAULT as a shorthand that a user can use although,
thinking about it further, it is probably superfluous. The difference is that
Dataset.DEFAULT is a symbolic constant, so to say, independent of the Dataset
instance, whereas ds.default_graph_id is an instance variable, initialized by
the Dataset. But that should be ok...

Ivan


@uholzer (Contributor)

uholzer commented Jun 27, 2013

I forgot about ConjunctiveGraph.default_context. Is this one part of the public API?

@iherman (Contributor)

iherman commented Jun 27, 2013

Yep.

Ivan


Add graphs when parsing, so also empty graphs are added.
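The commit above can be sketched as follows: register the target context with the store *before* parsing, so an empty document still leaves a tracked (empty) graph. All names here are illustrative, not the actual rdflib parser plumbing:

```python
# Hedged sketch of the "add graphs when parsing" fix (illustrative names).
class ToyStore:
    def __init__(self):
        self.graphs = set()
        self.quads = []

    def add_graph(self, ctx):
        self.graphs.add(ctx)

    def add(self, triple, ctx):
        self.quads.append(triple + (ctx,))

def parse(store, ctx, triples):
    store.add_graph(ctx)        # track the graph even if the input is empty
    for t in triples:
        store.add(t, ctx)

store = ToyStore()
parse(store, "g1", [])          # empty document...
assert "g1" in store.graphs     # ...yet the graph is tracked
```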
@gromgull (Member Author)

I've made all the tests pass in this branch, and adapted the DAWG tests and the SPARQL engine a tiny bit.

The only remaining question may be whether contexts() should return the default graph.

@gromgull (Member Author)

Note: I've also rebased this onto the latest master; if you had a local copy, you may have to force pull.

@coveralls

Coverage Status

Coverage increased (+0%) when pulling 6c026d0 on graphaware into 8fad4ed on master.

@uholzer (Contributor)

uholzer commented Aug 10, 2013

I have one wish: could you add a Graph.graph_aware flag, set to True by the subclass Dataset? This would be useful for code that makes full use of Dataset but is also able to work with ConjunctiveGraph or a plain Graph. Graph.context_aware and Graph.default_union are already there.
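The wish above amounts to feature-testing: code can degrade gracefully across Dataset, ConjunctiveGraph and plain Graph by checking capability flags, mirroring the existing context_aware and default_union flags. The helper and the stand-in classes below are assumptions for illustration, not rdflib's actual API surface:

```python
# Hypothetical capability-dispatch helper built on the proposed
# graph_aware flag (all names illustrative).
def ensure_graph(g, identifier):
    if getattr(g, "graph_aware", False):
        return g.graph(identifier)        # Dataset: creates and tracks it
    if getattr(g, "context_aware", False):
        return g.get_context(identifier)  # ConjunctiveGraph: a view only
    return g                              # plain Graph: nothing to select

# minimal stand-ins to show the dispatch
class FakeDataset:
    graph_aware = True
    def graph(self, i): return ("tracked", i)

class FakeConjunctive:
    context_aware = True
    def get_context(self, i): return ("view", i)

class FakeGraph:
    pass
```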

@gromgull (Member Author)

It seems we have more or less reached agreement, so I will merge this now; we can sort out the remaining bits in issues of their own!

gromgull added a commit that referenced this pull request Aug 11, 2013
Dataset cleanup - Store API for graph method
@gromgull gromgull merged commit e70451e into master Aug 11, 2013
@gromgull gromgull deleted the graphaware branch August 11, 2013 18:50
mamash pushed a commit to TritonDataCenter/pkgsrc-wip that referenced this pull request Feb 15, 2014
	2013/12/31 RELEASE 4.1
======================

This is a new minor version RDFLib, which includes a handful of new features:

* A TriG parser was added (we already had a serializer); it is
  up-to-date wrt. the newest spec from: http://www.w3.org/TR/trig/

* The Turtle parser was brought up to date wrt. the latest Turtle spec.

* Many more tests have been added - RDFLib now has over 2000
  (passing!) tests. This is mainly thanks to the NT, Turtle, TriG,
  NQuads and SPARQL test-suites from W3C. This also included many
  fixes to the nt and nquad parsers.

* ```ConjunctiveGraph``` and ```Dataset``` now support directly adding/removing
  quads with ```add/addN/remove``` methods.

* ```rdfpipe``` command now supports datasets, and reading/writing context
  sensitive formats.

* Optional graph-tracking was added to the Store interface, allowing
  empty graphs to be tracked for Datasets. The ```Dataset``` class also saw
  a general clean-up, see: RDFLib/rdflib#309

* After long deprecation, ```BackwardCompatibleGraph``` was removed.

Minor enhancements/bugs fixed:
------------------------------

* Many code samples in the documentation were fixed thanks to @PuckCh

* The new ```IOMemory``` store was optimised a bit

* ```SPARQL(Update)Store``` has been made more generic.

* MD5 sums were never reinitialized in ```rdflib.compare```

* Correct default value for empty prefix in N3
  [#312](RDFLib/rdflib#312)

* Fixed tests when running in a non-UTF-8 locale
  [#344](RDFLib/rdflib#344)

* Prefixes in the original Turtle have an impact on SPARQL query
  resolution
  [#313](RDFLib/rdflib#313)

* Duplicate BNode IDs from the N3 parser
  [#305](RDFLib/rdflib#305)

* Use QNames for TriG graph names
  [#330](RDFLib/rdflib#330)

* \uXXXX escapes in Turtle/N3 were fixed
  [#335](RDFLib/rdflib#335)

* A way to limit the number of triples retrieved from the
  ```SPARQLStore``` was added
  [#346](RDFLib/rdflib#346)

* Dots in localnames in Turtle
  [#345](RDFLib/rdflib#345)
  [#336](RDFLib/rdflib#336)

* ```BNode``` as a Graph's public ID
  [#300](RDFLib/rdflib#300)

* Introduced ordering of ```QuotedGraphs```
  [#291](RDFLib/rdflib#291)

2013/05/22 RELEASE 4.0.1
========================

Following RDFLib tradition, some bugs snuck into the 4.0 release.
This is a bug-fixing release:

* the new URI validation caused lots of problems, but is
  necessary to avoid "RDF injection" vulnerabilities. In the
  spirit of "be liberal in what you accept, but conservative in
  what you produce", we moved validation to serialisation time.

* the ```rdflib.tools``` package was missing from the
  ```setup.py``` script, and was therefore not included in the
  PyPI tarballs.

* RDF parser choked on empty namespace URI
  [#288](RDFLib/rdflib#288)

* Parsing from ```sys.stdin``` was broken
  [#285](RDFLib/rdflib#285)

* The new ```IOMemory``` store had problems with concurrent modifications if
  several graphs used the same store
  [#286](RDFLib/rdflib#286)

* Moved the HTML5Lib dependency to the recently released 1.0b1, which
  supports Python 3.

2013/05/16 RELEASE 4.0
======================

This release includes several major changes:

* The new SPARQL 1.1 engine (rdflib-sparql) has been included in
  the core distribution. SPARQL 1.1 queries and updates should
  work out of the box.

  * SPARQL paths are exposed as operators on ```URIRefs```, these can
    then be used with graph.triples and friends:

    ```py
    # List names of friends of Bob:
    g.triples(( bob, FOAF.knows/FOAF.name , None ))

    # All super-classes:
    g.triples(( cls, RDFS.subClassOf * '+', None ))
    ```

  * a new ```graph.update``` method will apply SPARQL update statements

* Several RDF 1.1 features are available:
  * A new ```Dataset``` class
  * ```XMLLiteral``` and ```HTMLLiterals```
  * ```BNode``` (de)skolemization is supported through ```BNode.skolemize```,
    ```URIRef.de_skolemize```, ```Graph.skolemize``` and ```Graph.de_skolemize```

* Handling of Literal equality was split into lexical comparison
  (for the normal ```==``` operator) and value space (using the new ```Node.eq```
  methods). This introduces some slightly backwards-incompatible
  changes, but was necessary, as the old version had
  inconsistent hash and equality methods that could lead to
  literals not working correctly in dicts/sets.
  The new way is more in line with how SPARQL 1.1 works.
  For the full details, see:

  https://github.com/RDFLib/rdflib/wiki/Literal-reworking
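The lexical-vs-value split can be illustrated with a toy model. The class below is a deliberately simplified stand-in (rdflib's real ```Literal``` is far richer), showing only the shape of the change: ```==``` and ```hash``` follow the lexical form plus datatype, while a separate ```eq``` compares in value space:

```python
# Toy model of lexical (==) vs value-space (eq) Literal comparison.
# Illustrative only; not rdflib's actual Literal implementation.
class ToyLiteral:
    def __init__(self, lexical, datatype):
        self.lexical, self.datatype = lexical, datatype

    def __eq__(self, other):
        # lexical comparison: form + datatype, consistent with __hash__
        return (self.lexical, self.datatype) == (other.lexical, other.datatype)

    def __hash__(self):
        return hash((self.lexical, self.datatype))

    def eq(self, other):
        # value-space comparison (numeric datatypes only, for the sketch)
        return float(self.lexical) == float(other.lexical)

a = ToyLiteral("1", "xsd:integer")
b = ToyLiteral("01", "xsd:integer")
assert a != b      # lexically different...
assert a.eq(b)     # ...but equal in value space
```

Keeping ```__eq__``` and ```__hash__``` aligned on the lexical form is what makes such literals safe to use as dict keys and set members.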

* Iterating over ```QueryResults``` will generate ```ResultRow``` objects,
  these allow access to variable bindings as attributes or as a
  dict. I.e.

  ```py
  for row in graph.query('select ... ') :
     print row.age, row["name"]
  ```

* "Slicing" of Graphs and Resources as syntactic sugar:
  ([#271](RDFLib/rdflib#271))

  ```py
  graph[bob : FOAF.knows/FOAF.name]
            -> generator over the names of Bob's friends
  ```

* The ```SPARQLStore``` and ```SPARQLUpdateStore``` are now included
  in the RDFLib core

* The documentation has been given a major overhaul, and examples
  for most features have been added.


Minor Changes:
--------------

* String operations on URIRefs return new URIRefs: ([#258](RDFLib/rdflib#258))
  ```py
  >>> URIRef('http://example.org/')+'test'
  rdflib.term.URIRef('http://example.org/test')
  ```

* Parser/Serializer plugins are also found by mime-type, not just
  by plugin name ([#277](RDFLib/rdflib#277))
* ```Namespace``` is no longer a subclass of ```URIRef```
* URIRefs and Literal language tags are validated on construction,
  avoiding some "RDF-injection" issues ([#266](RDFLib/rdflib#266))
* A new memory store needs much less memory when loading large
  graphs ([#268](RDFLib/rdflib#268))
* Turtle/N3 serializer now supports the base keyword correctly ([#248](RDFLib/rdflib#248))
* py2exe support was fixed ([#257](RDFLib/rdflib#257))
* Several bugs in the TriG serializer were fixed
* Several bugs in the NQuads parser were fixed