Small performance optimization when fetching items from library. #2798

pprkut · 2018-01-27T17:32:44Z

Fetch flexible attributes once for all relevant items instead of once per item.
For queries with larger result sets this drastically cuts down the number of issued sqlite queries.

In the end, this didn't quite give the performance boost I'd hoped it would, but it feels still slightly faster for larger queries.
Unfortunately the hot spot seems to be somewhere else though :-/

One thing to note, since the list of IDs used in the WHERE..IN clause can be quite large (and reach sqlite's query parameter limit), I resorted to not passing them as parameters.
This should be fine, since these are not values originating from input but coming from the database from a previous query. Not sure how big of an issue this is nonetheless.

sampsyo

Thank you for looking into this! It does seem like a much better idea to fetch everything up front instead of looping over each item and then issuing a separate query. 👍 👍 👍

I have a couple of questions within, including a follow-up on your concerns about the list of IDs in the SELECT query.

Have you run any simple performance comparisons (just time on a query or two) to see what the impact is, even if it didn't solve the main bottleneck?

beets/dbcore/db.py

pprkut · 2018-02-22T19:44:12Z

Finally had some time to rerun my performance tests under at least somewhat untainted circumstances.
I checked with time beet ls &> /dev/null, which for me is 11240 tracks. I have the db on a SSD, not sure that has a lot of impact, but mentioning it just to be aware.

Without the patch

real    0m9.766s
user    0m9.500s
sys     0m0.266s

With the patch

real    0m9.413s
user    0m9.210s
sys     0m0.203s

Both are fastest runs out of 5. Based on this I'd say on average it gives a 4% speedup.

Edit: Forgot to mention, a lof of that time seems to be in the actual output generation. Loading the entire collection in roquette takes about 3s, so about 66% less. Given that, the 0.4s speedup doesn't sound that bad actually.

sampsyo · 2018-02-22T21:03:51Z

Indeed! Thanks for doing the measurements—that's really helpful to know. And it helps point the way to the next bottleneck too (formatting the output).

pprkut · 2018-11-28T10:01:56Z

Finally had time to do a revision on this. I moved the flex attr query to _fetch(), sort of like you already suggested. That way we still only query once, but we can construct a better subquery that no longer has to worry about a length limitation or SQL injection possibilities. I think this looks good now :-)

I did some further measurements, to get the actual raw performance of iterating the entire library (11832 items):

start = time.time()
for item in lib.items(query):
    continue
end = time.time()

On master this takes 3.66s average
Using a single flex attr query this goes down to 2.95s average
I have another optimization potential that I'll look into next to get this further down to 1.41s, but it'll need some more verification.

I also see some unit test failures in travis that I don't really understand (I don't get those locally), so if someone has some pointers on what to do about those I'd really appreciate it.

sampsyo · 2018-12-01T22:27:38Z

Nice!! I agree this is looking good—I'm convinced this is the way to go, and the code changes are nice and clean. Awesome work!!!

I'm also pretty confused by the test failures. I am able to reproduce them locally (for example, by running nose test/test_metasync.py). Looking into it now.

sampsyo

OK, I think I see the problem with those tests. The issue arises when querying for model objects that have no flexattrs. In this case, _get_indexed_flex_attrs does not have an entry for the object (even an empty one). So trying to look up its flexattrs raises a KeyError.

I've proposed an easy fix in an above inline comment. One more complicated fix would be to change _get_indexed_flex_attrs to produce empty entries for other model objects, but that's probably not a good idea.

To reason this comes up in the tests for metasync, for what it's worth, is the way the harness sets up its fixtures:

beets/test/test_metasync.py

Lines 62 to 71 in 17d9882

    
           items[0].title = 'Tessellate' 
        
           items[0].artist = 'alt-J' 
        
           items[0].albumartist = 'alt-J' 
        
           items[0].album = 'An Awesome Wave' 
        
           items[0].itunes_rating = 60 
        
           items[1].title = 'Breezeblocks' 
        
           items[1].artist = 'alt-J' 
        
           items[1].albumartist = 'alt-J' 
        
           items[1].album = 'An Awesome Wave'

There's one item with a flexattr and one without.

beets/dbcore/db.py

pprkut · 2018-12-02T11:21:41Z

Thanks for the explanation. All makes sense now :)
I fixed up the tests so this should be good to go now

pprkut · 2019-03-23T12:15:06Z

Hey! Just checking, this is good to go right? If so I'll go ahead and merge it

sampsyo · 2019-03-23T17:23:40Z

Yes; sorry for not responding! This looks good to go. It's a small change but a very important one! I say merge away.

Fetch flexible attributes once for all items instead of once per item. For queries with larger result sets this drastically cuts down the number of issued sqlite queries.

This resolves the query length and potential security problems, while keeping the performance benefits.

pprkut force-pushed the query_perf branch from ad38bca to 50410c6 Compare January 27, 2018 17:44

sampsyo reviewed Jan 31, 2018

View reviewed changes

beets/dbcore/db.py Outdated Show resolved Hide resolved

beets/dbcore/db.py Outdated Show resolved Hide resolved

beets/dbcore/db.py Show resolved Hide resolved

pprkut force-pushed the query_perf branch from 50410c6 to c570980 Compare November 28, 2018 09:47

sampsyo reviewed Dec 1, 2018

View reviewed changes

beets/dbcore/db.py Outdated Show resolved Hide resolved

pprkut force-pushed the query_perf branch from cad4f30 to cae633c Compare December 2, 2018 11:05

pprkut mentioned this pull request Dec 5, 2018

Performance improvement: Type cast properties on demand #3089

Merged

pprkut added 4 commits March 23, 2019 20:22

Small performance optimization when fetching items from library.

1e1ddd2

Fetch flexible attributes once for all items instead of once per item. For queries with larger result sets this drastically cuts down the number of issued sqlite queries.

Query flexible attributes already in _fetch()

31ec222

This resolves the query length and potential security problems, while keeping the performance benefits.

Improve documentation

5494c8b

Account for items that do not have flexible attributes

7c568aa

pprkut force-pushed the query_perf branch from cae633c to 7c568aa Compare March 23, 2019 19:22

pprkut merged commit 29425ed into beetbox:master Mar 23, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Small performance optimization when fetching items from library. #2798

Small performance optimization when fetching items from library. #2798

pprkut commented Jan 27, 2018

sampsyo left a comment

pprkut commented Feb 22, 2018 •

edited

Loading

sampsyo commented Feb 22, 2018

pprkut commented Nov 28, 2018

sampsyo commented Dec 1, 2018

sampsyo left a comment

pprkut commented Dec 2, 2018

pprkut commented Mar 23, 2019

sampsyo commented Mar 23, 2019

	items[0].title = 'Tessellate'
	items[0].artist = 'alt-J'
	items[0].albumartist = 'alt-J'
	items[0].album = 'An Awesome Wave'
	items[0].itunes_rating = 60

	items[1].title = 'Breezeblocks'
	items[1].artist = 'alt-J'
	items[1].albumartist = 'alt-J'
	items[1].album = 'An Awesome Wave'

Small performance optimization when fetching items from library. #2798

Small performance optimization when fetching items from library. #2798

Conversation

pprkut commented Jan 27, 2018

sampsyo left a comment

Choose a reason for hiding this comment

pprkut commented Feb 22, 2018 • edited Loading

sampsyo commented Feb 22, 2018

pprkut commented Nov 28, 2018

sampsyo commented Dec 1, 2018

sampsyo left a comment

Choose a reason for hiding this comment

pprkut commented Dec 2, 2018

pprkut commented Mar 23, 2019

sampsyo commented Mar 23, 2019

pprkut commented Feb 22, 2018 •

edited

Loading