Clarify respective use cases of `Store.add` vs. `Store.addN`, and correct use of `Graph.parse` #357

pchampin · 2014-02-20T12:00:41Z

The respective use cases of Store.add and Store.addN are not really clear. More precisely, when someone has several triples to write to a store (not necessarily many, nor necessarily available as an iterable), is it

(a) acceptable to use multiple calls to Store.add, or
(b) should one avoid that and absolutely use Store.addN?

This needs clarification because currently the answer is "it depends on the store"; more precisely "IOMemory" and "Sleepycat" answer (a), as they have a rather efficient add method, while SPARQLUpdateStore or the SqlAlchemy plugin answer (b), since they make a single query to the underlying store for each call to add.

There is a rather serious consequence to this lack of clarity, in the use of Graph.parse. All parsers in rdflib.plugins use add rather than addN. As a consequence, it might be very inefficient to write:

# g is a Graph
g.parse(filename, format=f)

depending on the underlying store of g. For "add-inefficient" stores, one has to write:

data = rdflib.Graph() # in memory graph
data.parse(filename, format=f)
g += data  # uses addN

This breaks object encapsulation, which is bad.

I see two solutions:

1. Make it explicit in the documentation of Store that multiple uses of Store.add should be considered an exceptional case, and that Store.addN should always be preferred. This makes things easier for Store implementers, but harder for programmers using stores (as they have to handle the "buffering" of triples). In particular, this requires fixing all the parser implementations to comply with this.
2. Make it explicit in the documentation of Store that multiple uses of Store.add can be a valid use case, and thus encourage Store implementers to consider naive/straightforward implementations of Store.add as a bug rather than a feature.

gromgull · 2014-02-20T12:29:24Z

The idea is that addN should be semantically equivalent to repeated calls to add, but allow stores where transactions matter to override it and do a bulk add.
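The contract described above can be sketched with a toy model. This is not RDFLib code: `ToyStore`, `ToyBulkStore`, and the `round_trips` counter are hypothetical names used only for illustration, and quads are plain tuples of strings. The base class implements addN as a loop over add, while the subclass overrides it with a single bulk write.

```python
from typing import Iterable, List, Tuple

# Plain tuples stand in for RDF terms: (subject, predicate, object, context).
Quad = Tuple[str, str, str, str]


class ToyStore:
    """Hypothetical base store: addN falls back to repeated add calls."""

    def __init__(self) -> None:
        self.quads: List[Quad] = []
        self.round_trips = 0  # stands in for queries or commits

    def add(self, quad: Quad) -> None:
        self.round_trips += 1  # one round trip per quad
        self.quads.append(quad)

    def addN(self, quads: Iterable[Quad]) -> None:
        # Default behaviour: the same effect as calling add for each quad.
        for quad in quads:
            self.add(quad)


class ToyBulkStore(ToyStore):
    """Hypothetical store where transactions matter: addN is one bulk write."""

    def addN(self, quads: Iterable[Quad]) -> None:
        batch = list(quads)
        self.round_trips += 1  # a single round trip for the whole batch
        self.quads.extend(batch)


data = [("s%d" % i, "p", "o", "g") for i in range(100)]

naive, bulk = ToyStore(), ToyBulkStore()
naive.addN(data)
bulk.addN(data)
```

Both stores end up with the same content, so the two addN implementations are interchangeable from the caller's point of view; only the number of round trips differs.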

I think the fact that parse uses add is more random than a conscious design decision, and could possibly be changed. The flip-side of things is that if you cache all triples in either a list or an in memory graph before adding, you may run out of memory.

Another problem is that each parser pretty much makes this decision itself, and each calls add in a different way.

The often talked about, long promised, but not really pending any time very soon reworking of the parsers to allow streaming parsing would have one central Sink object that adds triples to a Graph. Once this is in place we could allow customizing the buffer of this Sink, i.e. do an addN for every 1000 triples, etc.
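A buffering Sink of this kind might look roughly like the sketch below. It is only an illustration of the idea, not an existing RDFLib class: `BufferedSink`, `RecordingGraph`, the `triple` method, and `buffer_size` are all assumed names, and the stand-in graph accepts plain triples in addN to keep the example short.

```python
from typing import Iterable, List, Tuple

Triple = Tuple[str, str, str]


class RecordingGraph:
    """Hypothetical stand-in for a Graph; records the size of each addN batch."""

    def __init__(self) -> None:
        self.triples: List[Triple] = []
        self.batch_sizes: List[int] = []

    def addN(self, triples: Iterable[Triple]) -> None:
        batch = list(triples)
        self.batch_sizes.append(len(batch))
        self.triples.extend(batch)


class BufferedSink:
    """Hypothetical central sink: parsers emit triples here, one at a time."""

    def __init__(self, graph: RecordingGraph, buffer_size: int = 1000) -> None:
        self.graph = graph
        self.buffer_size = buffer_size
        self._buffer: List[Triple] = []

    def triple(self, s: str, p: str, o: str) -> None:
        # Called by a parser for every triple it produces.
        self._buffer.append((s, p, o))
        if len(self._buffer) >= self.buffer_size:
            self.flush()

    def flush(self) -> None:
        # Write out whatever is buffered; must also be called at end of input.
        if self._buffer:
            self.graph.addN(self._buffer)
            self._buffer = []


graph = RecordingGraph()
sink = BufferedSink(graph, buffer_size=1000)
for i in range(2500):
    sink.triple("s%d" % i, "p", "o")
sink.flush()  # push the final, partial batch
```

Because the buffer is bounded by `buffer_size`, memory use stays flat regardless of input size, which addresses the concern about caching all triples before adding them.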

mwatts15 · 2019-08-31T14:20:18Z

As an active user of RDFLib, I don't see that there's much "clarification" needed. The choice between addN and add is as @gromgull describes, but it is, ultimately, up to the application using RDFLib to decide whether batching is appropriate based on their application needs.

Batching add into addN is pretty trivial, but we can save users the effort of doing it themselves. I've made a context manager and wrapper for Graph that does this.
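For readers wondering what such batching looks like, here is a minimal sketch of the general pattern using `contextlib.contextmanager`. It is not the wrapper from the pull request: `batched_add`, `CountingTarget`, and `batch_size` are hypothetical names, and the target only needs an addN method that accepts an iterable.

```python
from contextlib import contextmanager
from typing import Callable, Iterator, List, Tuple

Triple = Tuple[str, str, str]


class CountingTarget:
    """Hypothetical stand-in for a graph; counts how often addN is called."""

    def __init__(self) -> None:
        self.triples: List[Triple] = []
        self.addn_calls = 0

    def addN(self, triples) -> None:
        self.addn_calls += 1
        self.triples.extend(triples)


@contextmanager
def batched_add(target, batch_size: int = 1000) -> Iterator[Callable[[Triple], None]]:
    """Yield an add function that forwards triples to target.addN in batches."""
    buffer: List[Triple] = []

    def add(triple: Triple) -> None:
        buffer.append(triple)
        if len(buffer) >= batch_size:
            target.addN(list(buffer))
            buffer.clear()

    yield add
    # Reached only on a clean exit: if the with-body raises, the partial
    # batch is dropped (a design choice of this sketch).
    if buffer:
        target.addN(list(buffer))
        buffer.clear()


target = CountingTarget()
with batched_add(target, batch_size=10) as add:
    for i in range(25):
        add(("s%d" % i, "p", "o"))
```

The caller keeps writing one triple at a time, while the target sees two full batches and one final partial batch.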

pchampin · 2019-08-31T18:44:58Z

@mwatts15

> The choice between addN and add is (...) ultimately, up to the application using RDFLib

Well, not when they are using graph.parse, which makes this choice for them...

> I've made a context manager and wrapper for Graph that does this.

That's a great idea :)

@gromgull

> I think the fact that parse uses add is more random than a conscious design decision

So I thought, but that turns out to be unfortunate.

pchampin mentioned this issue Feb 20, 2014

Bulk insertion : avoid commiting each time a triple is added RDFLib/rdflib-sqlalchemy#9

Closed

mwatts15 added a commit to mwatts15/rdflib that referenced this issue Aug 31, 2019

Adding a wrapper for batching add() calls to a Graph

128ceb6

- Should address RDFLib#357

mwatts15 mentioned this issue Aug 31, 2019

Adding a wrapper for batching add() calls to a Graph #931

Merged

white-gecko added enhancement New feature or request and removed enhancement New feature or request labels Mar 15, 2020

white-gecko added this to the rdflib 5.1.0 milestone Mar 15, 2020

mwatts15 added a commit to mwatts15/rdflib that referenced this issue Mar 21, 2020

Adding a wrapper for batching add() calls to a Graph

48cd7df

- Should address RDFLib#357

white-gecko modified the milestones: rdflib 5.1.0, rdflib 6.0.0 May 1, 2020

ghost added the documentation label Dec 24, 2021

ghost locked and limited conversation to collaborators Dec 26, 2021

ghost converted this issue into discussion #1597 Dec 26, 2021
