
implement bulk_add for applying several SQLite operations per transaction #3300

Closed
Conversation


khimaros commented Jun 7, 2019

Initially inspired by #1935 and seemingly relevant to #2388

Based on my snakeviz benchmarks in #1935 (comment) and #1935 (comment), dbcore overhead for an import operation on 847 local files was reduced from 95% (153 seconds) to <10% (5 seconds).

I have exercised this patch with a dbcore frontend I built to organize, query, and de-duplicate arbitrary (including non-music) files. To realize these performance benefits in beets, the library import path will need to be modified to make use of this method.

N.B. The degree of batching chosen for bulk_add is a trade-off between buffering delay, CPU time, and peak memory usage. I've found negligible memory usage for single transactions containing tens of thousands of model records.
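For a sense of the shape of the change, here is a minimal sketch of a batched bulk_add. This is a sketch only, assuming a dbcore-style Database with a reentrant transaction() context manager and model objects that insert themselves via obj.add(db); the names and batch size are illustrative, not the exact code in this patch:

# Hypothetical sketch, not the PR's actual code. BATCH_SIZE is the
# buffering/memory trade-off knob described above.
BATCH_SIZE = 10000

def bulk_add(db, objs, batch_size=BATCH_SIZE):
    """Insert an iterable of model objects, one transaction per batch."""
    batch = []
    for obj in objs:
        batch.append(obj)
        if len(batch) >= batch_size:
            _flush(db, batch)
            batch = []
    if batch:
        _flush(db, batch)

def _flush(db, batch):
    # One outer transaction covers the whole batch; any per-object
    # transactions inside obj.add() collapse into it.
    with db.transaction():
        for obj in batch:
            obj.add(db)

Because bulk_add accepts any iterable, callers can feed it a generator and avoid materializing every record in memory at once.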


sampsyo commented Jun 7, 2019

Cool! Thanks for looking into this!!

Here's one thing I don't quite understand about this profiling work: does the overhead come from literally creating SQLite transactions, or from the in-Python cost of looking up the current transaction? We built the transaction system in dbcore to avoid creating redundant transactions. So if you do this:

with db.transaction():
    with db.transaction():
        db.mutate(...)

…only one SQLite transaction should actually be created. More relevantly, if you do this:

with db.transaction():
    for item in items:
        db.add(...)

…then the transaction created inside the add call should be approximately a no-op. Of course, looking up the transaction stack for the current thread is not free! So it's possible the overhead is coming from there.
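For anyone unfamiliar with that machinery, here is a simplified sketch of a reentrant, per-thread transaction stack. The class names are illustrative, not dbcore's actual code, and the real implementation also handles locking:

import sqlite3
import threading

class Database:
    def __init__(self, path):
        # isolation_level=None lets us issue BEGIN/COMMIT explicitly.
        self._connection = sqlite3.connect(path, isolation_level=None)
        self._local = threading.local()

    @property
    def _tx_stack(self):
        # One transaction stack per thread.
        if not hasattr(self._local, 'stack'):
            self._local.stack = []
        return self._local.stack

    def transaction(self):
        return _Transaction(self)

class _Transaction:
    def __init__(self, db):
        self.db = db

    def __enter__(self):
        if not self.db._tx_stack:  # outermost: begin a real transaction
            self.db._connection.execute('BEGIN')
        self.db._tx_stack.append(self)
        return self

    def __exit__(self, exc_type, exc, tb):
        self.db._tx_stack.pop()
        if not self.db._tx_stack:  # outermost: commit or roll back
            self.db._connection.execute(
                'COMMIT' if exc_type is None else 'ROLLBACK')

Every nested transaction() still pays for the thread-local lookup and the list push/pop, which is the in-Python cost in question.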

But just to confirm: did you measure the performance difference between an unwrapped add loop, a transaction-wrapped add loop, and the new bulk_add that doesn't bother creating a transaction at all? The relative differences between these three would be enlightening…
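Something like this sketch would tease the three apart (db, make_items, and bulk_add here are hypothetical stand-ins for the real fixtures, and each variant would want a fresh database in practice):

import time

def bench(label, fn):
    start = time.time()
    fn()
    print('{}: {:.2f}s'.format(label, time.time() - start))

items = make_items(10000)  # hypothetical fixture: build N model objects

def unwrapped():
    for item in items:
        db.add(item)  # one SQLite transaction per add()

def wrapped():
    with db.transaction():  # inner transactions collapse into this one
        for item in items:
            db.add(item)

bench('unwrapped add loop', unwrapped)
bench('transaction-wrapped loop', wrapped)
bench('bulk_add', lambda: db.bulk_add(items))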


khimaros commented Jun 7, 2019

@sampsyo -- indeed, performance seems to be nearly identical with the method you describe here!

I sat on this patch for a very long time, and it seems the landscape may have changed. For a bit of background, when we spoke about this last October, the conclusion was:

Yes, absolutely! A bulk insert would be a great way to do it. You could even imagine letting the bulk_add method accept an iterable (instead of just a list) to let new model objects get generated on the fly without materializing them all in memory.

It's possible that nested transactions were not handled gracefully back then.

If this PR doesn't add any value, let's just close it and I will switch my dbcore clients to the nested-transaction solution; I'll leave the final judgment to you. If it does add value, it should have some tests. I have some staged in my local repo from the performance testing.

Thank you for continuing to build and maintain this excellent tool!


sampsyo commented Jun 7, 2019

Yeah! I read that comment of mine with great interest too. It's constantly surprising how hard it is to reconstruct my own thinking from a previous era. Reading my comments, it looks like I may have been imagining that the overhead came from SQLite execute calls, and that such a bulk insertion would use one big statement? Who can really say what I was thinking, I guess.

If things are just as fast with the explicit transaction approach, I say we probably don't need bulk_add. But it might be worth documenting this phenomenon somewhere, especially if other people are going to be as intrepid as you and write their own dbcore clients!


khimaros commented Jun 8, 2019

There is definitely no performance reason to favor bulk_add over the nested-transaction approach:

$ nosetests -v --with-timer test.test_dbcore_perf
test_dbcore_bulk_add (test.test_dbcore_perf.PerformanceTest) ... ok (34.5063s)
test_dbcore_single_transaction (test.test_dbcore_perf.PerformanceTest) ... ok (33.9157s)

[success] 50.43% test.test_dbcore_perf.PerformanceTest.test_dbcore_bulk_add: 34.5063s
[success] 49.57% test.test_dbcore_perf.PerformanceTest.test_dbcore_single_transaction: 33.9157s
----------------------------------------------------------------------
Ran 2 tests in 68.423s

Closing this PR now, thanks for the feedback!

khimaros closed this Jun 8, 2019