This issue was moved to a discussion.

Clarify respective use cases of Store.add vs. Store.addN, and correct use of Graph.parse #357

Closed
pchampin opened this issue Feb 20, 2014 · 3 comments
Labels
documentation (Improvements or additions to documentation), enhancement (New feature or request)
Milestone

Comments

@pchampin
Contributor

The respective use cases of Store.add and Store.addN are not really clear. More precisely, when someone has several triples to write to a store (not necessarily many, nor necessarily available as an iterable), is it

  • (a) acceptable to use multiple calls to Store.add, or
  • (b) should one avoid that and absolutely use Store.addN?

This needs clarification because currently the answer is "it depends on the store". More precisely, "IOMemory" and "Sleepycat" answer (a), as they have a rather efficient add method, while SPARQLUpdateStore and the SqlAlchemy plugin answer (b), since they issue a separate query to the underlying store for each call to add.

There is a rather serious consequence to this lack of clarity, in the use of Graph.parse. All parsers in rdflib.plugins use add rather than addN. As a consequence, it might be very inefficient to write:

# g is a Graph
g.parse(filename, format=f)

depending on the underlying store of g. For "add-inefficient" stores, one has to write:

import rdflib

data = rdflib.Graph()  # temporary in-memory graph
data.parse(filename, format=f)  # the parser's add calls hit the in-memory store
g += data  # uses addN

This breaks object encapsulation, which is bad.

I see two solutions:

  1. Make it explicit in the documentation of Store that multiple uses of Store.add should be considered an exceptional case, and that Store.addN should always be preferred. This makes things easier for Store implementers, but harder for programmers using stores (as they have to handle the "buffering" of triples). In particular, this requires fixing all the parser implementations to comply with this.
  2. Make it explicit in the documentation of Store that multiple uses of Store.add can be a valid use case, and thus encourage Store implementers to consider naive/straightforward implementations of Store.add as a bug rather than a feature.

@gromgull
Member

The idea is that addN should be semantically equivalent to repeated calls to add, but allow stores where transactions matter to override it and do a bulk add.
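A minimal sketch of that contract, using hypothetical toy classes rather than rdflib's actual Store: `LoopingStore` and `BulkStore` are made-up names, and `round_trips` is an assumed stand-in for queries or transactions sent to a backend. Both stores end up with the same content, but only the one that overrides addN pays a single round trip for the batch.

```python
class LoopingStore:
    """Hypothetical store: every add costs one round trip to the backend."""

    def __init__(self):
        self.quads = set()
        self.round_trips = 0

    def add(self, triple, context):
        self.round_trips += 1  # e.g. one query sent per triple
        self.quads.add(triple + (context,))

    def addN(self, quads):
        # Default behaviour: semantically just repeated calls to add.
        for s, p, o, c in quads:
            self.add((s, p, o), c)


class BulkStore(LoopingStore):
    """Hypothetical store that overrides addN to write a batch at once."""

    def addN(self, quads):
        self.round_trips += 1  # one query or transaction for the whole batch
        self.quads.update(quads)


quads = [("s%d" % i, "p", "o", "ctx") for i in range(5)]

looping, bulk = LoopingStore(), BulkStore()
looping.addN(quads)
bulk.addN(quads)

print(looping.quads == bulk.quads)  # True: same resulting content
print(looping.round_trips, bulk.round_trips)  # 5 1
```

For a store like the first one, a parser that calls add once per triple pays the per-call cost for every triple, which is the situation described in the original report.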

I think the fact that parse uses add is more random than a conscious design decision, and could possibly be changed. The flip side is that if you cache all triples in either a list or an in-memory graph before adding, you may run out of memory.

Another problem is that each parser pretty much makes this decision itself, and each calls add in a different way.

The often talked about, long promised, but not imminent reworking of the parsers to allow streaming parsing would have one central Sink object that adds triples to a Graph. Once this is in place we could allow customizing the buffer of this Sink, e.g. do an addN for every 1000 triples.
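As a rough illustration of what such a buffered Sink could look like (a sketch only: `BufferingSink`, its `triple` callback and the `RecordingGraph` stand-in are all hypothetical names, not existing rdflib classes), the sink keeps at most `batch_size` triples in memory and hands each full buffer to the graph through one addN call:

```python
class BufferingSink:
    """Hypothetical sink: collects parsed triples and flushes them in batches."""

    def __init__(self, graph, batch_size=1000):
        self.graph = graph
        self.batch_size = batch_size
        self._buffer = []

    def triple(self, s, p, o):
        # Called by a parser for each triple it produces.
        self._buffer.append((s, p, o, self.graph))
        if len(self._buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        # Hand whatever is buffered to the graph in a single addN call.
        if self._buffer:
            self.graph.addN(self._buffer)
            self._buffer = []


class RecordingGraph:
    """Stand-in for a graph; only records the size of each addN batch."""

    def __init__(self):
        self.batch_sizes = []

    def addN(self, quads):
        self.batch_sizes.append(len(list(quads)))


graph = RecordingGraph()
sink = BufferingSink(graph, batch_size=1000)
for i in range(2500):
    sink.triple("s%d" % i, "p", "o")
sink.flush()  # the parser must flush once at end of input

print(graph.batch_sizes)  # [1000, 1000, 500]
```

Because the buffer is bounded, this avoids the out-of-memory risk of caching a whole file before adding, while still sparing add-inefficient stores one round trip per triple.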

@mwatts15
Contributor

As an active user of RDFLib, I don't see that there's much "clarification" needed. The choice between addN and add is as @gromgull describes, but it is, ultimately, up to the application using RDFLib to decide whether batching is appropriate based on their application needs.

Batching add into addN is pretty trivial, but we can save users the effort of doing it themselves. I've made a context manager and wrapper for Graph that does this.
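To give an idea of the shape of such a helper, here is a minimal sketch. It is not the code from the commit referenced in this issue: `batched_add` and `CountingGraph` are hypothetical names, and the only thing assumed of the target is an `addN` method that accepts an iterable of (s, p, o, context) quads.

```python
from contextlib import contextmanager


@contextmanager
def batched_add(graph, batch_size=1000):
    """Hypothetical helper: yields an add function that batches into graph.addN."""
    buffer = []

    def flush():
        if buffer:
            graph.addN(list(buffer))
            buffer.clear()

    def add(triple):
        buffer.append(tuple(triple) + (graph,))
        if len(buffer) >= batch_size:
            flush()

    yield add
    # Reached only if the with-body finished without raising.
    flush()


class CountingGraph:
    """Stand-in for a graph whose addN is one round trip per call."""

    def __init__(self):
        self.quads = []
        self.addn_calls = 0

    def addN(self, quads):
        self.addn_calls += 1
        self.quads.extend(quads)


g = CountingGraph()
with batched_add(g, batch_size=100) as add:
    for i in range(250):
        add(("s%d" % i, "p", "o"))

print(len(g.quads), g.addn_calls)  # 250 3
```

One design choice worth noting in this sketch: if the with-body raises, the final partial batch is dropped rather than written, although batches already flushed stay in the graph.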

@pchampin
Contributor Author

@mwatts15

The choice between addN and add is (...) ultimately, up to the application using RDFLib

Well, not when they are using graph.parse, which makes this choice for them...

I've made a context manager and wrapper for Graph that does this.

That's a great idea :)

@gromgull

I think the fact that parse uses add is more random than a conscious design decision

So I thought, but that turns out to be unfortunate.

@white-gecko white-gecko added enhancement New feature or request and removed enhancement New feature or request labels Mar 15, 2020
@white-gecko white-gecko added this to the rdflib 5.1.0 milestone Mar 15, 2020
mwatts15 added a commit to mwatts15/rdflib that referenced this issue Mar 21, 2020
@white-gecko white-gecko modified the milestones: rdflib 5.1.0, rdflib 6.0.0 May 1, 2020
@ghost ghost added the documentation Improvements or additions to documentation label Dec 24, 2021
@ghost ghost locked and limited conversation to collaborators Dec 26, 2021
@ghost ghost converted this issue into discussion #1597 Dec 26, 2021
